TRANSCRIPT
ROUGH SET & ITS VARIANTS: VARIABLE
PRECISION ROUGH SET AND FUZZY ROUGH
SET APPROACHES
Presented by
Rajdeep Chatterjee
PLP, MIU ISI Kolkata
Overview
Introduction to Rough Set
Information/Decision Systems
Indiscernibility
Set Approximations of Rough Set
Reducts and Core
Dependency of Attributes
Variable Precision Rough Set (VPRS)
Set Approximations (VPRS)
Fuzzy Rough Set (FRS)
Set Approximations and Dependency (FRS)
Observations
R Chatterjee, PLP, MIU ISI Kolkata
Introduction
Often, information on the surrounding
world is
◦ Imprecise
◦ Incomplete
◦ Uncertain.
We should be able to process uncertain
and/or incomplete information.
Introduction
“Rough set theory” was developed by
Zdzislaw Pawlak in the early 1980’s.
Representative Publications:
◦ Z. Pawlak, “Rough Sets”, International Journal of
Computer and Information Sciences, Vol.11, 341-
356 (1982).
◦ Z. Pawlak, Rough Sets: Theoretical Aspects of
Reasoning about Data, Kluwer Academic
Publishers (1991).
Information Systems
Age LEMS
X1 16-30 50
X2 16-30 0
X3 31-45 1-25
X4 31-45 1-25
X5 46-60 26-49
X6 16-30 26-49
X7 46-60 26-49
An information system IS is a pair (U, A), where
U is a non-empty finite set of objects and
A is a non-empty finite set of attributes such that
$a : U \to V_a$ for every $a \in A$;
$V_a$ is called the value set of a.
Decision Systems
Age LEMS Walk
X1 16-30 50 Yes
X2 16-30 0 No
X3 31-45 1-25 No
X4 31-45 1-25 Yes
X5 46-60 26-49 No
X6 16-30 26-49 Yes
X7 46-60 26-49 No
A decision system is a pair $T = (U, A \cup \{d\})$, where
$d \notin A$ is the decision
attribute (instead of
one we can consider
more decision
attributes).
The elements of A
are called the
condition attributes.
In the table above, Age and LEMS are the condition
attributes and Walk is the decision attribute.
Indiscernibility
An equivalence relation is a binary relation
$R \subseteq X \times X$ which is
reflexive (xRx for any object x),
symmetric (if xRy then yRx), and
transitive (if xRy and yRz then xRz).
The equivalence class $[x]_R$ of an element $x \in X$
consists of all objects $y \in X$ such that xRy.
Indiscernibility
Let IS = (U, A) be an information system; then with any $B \subseteq A$ there is an associated equivalence relation:
$IND_{IS}(B) = \{(x, x') \in U^2 \mid \forall a \in B,\ a(x) = a(x')\},$
where $IND_{IS}(B)$ is called the B-indiscernibility relation.
If $(x, x') \in IND_{IS}(B)$, then objects x and x' are indiscernible from each other by attributes from B.
The equivalence classes of the B-indiscernibility relation are denoted by $[x]_B$.
An Example of Indiscernibility
Age LEMS Walk
X1 16-30 50 Yes
X2 16-30 0 No
X3 31-45 1-25 No
X4 31-45 1-25 Yes
X5 46-60 26-49 No
X6 16-30 26-49 Yes
X7 46-60 26-49 No
The non-empty subsets of the condition attributes are {Age}, {LEMS}, and {Age, LEMS}.
IND({Age}) = {{x1,x2,x6}, {x3,x4}, {x5,x7}}
IND({LEMS}) = {{x1}, {x2}, {x3,x4}, {x5,x6,x7}}
IND({Age,LEMS}) = {{x1}, {x2}, {x3,x4}, {x5,x7}, {x6}}.
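The partitions above can be reproduced mechanically. A minimal sketch in Python, with the slide's table hard-coded as a dictionary (the variable and helper names are mine, not from any library):

```python
# Sketch: compute the B-indiscernibility classes of the Walk table.
from collections import defaultdict

table = {  # object -> (Age, LEMS)
    "x1": ("16-30", "50"),    "x2": ("16-30", "0"),
    "x3": ("31-45", "1-25"),  "x4": ("31-45", "1-25"),
    "x5": ("46-60", "26-49"), "x6": ("16-30", "26-49"),
    "x7": ("46-60", "26-49"),
}
ATTRS = {"Age": 0, "LEMS": 1}

def ind(B):
    """Partition U into the equivalence classes of IND(B)."""
    classes = defaultdict(set)
    for x, row in table.items():
        # objects with identical values on all attributes in B share a key
        classes[tuple(row[ATTRS[a]] for a in sorted(B))].add(x)
    return sorted(classes.values(), key=sorted)

print([sorted(c) for c in ind({"Age"})])
# [['x1', 'x2', 'x6'], ['x3', 'x4'], ['x5', 'x7']]
```

Finer attribute sets refine the partition: `ind({"Age", "LEMS"})` yields the five classes listed on the slide.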
Set Approximation
Let T = (U, A), $B \subseteq A$, and $X \subseteq U$.
We can approximate X using only the
information contained in B by
constructing the B-lower and B-upper
approximations of X, denoted $\underline{B}X$ and
$\overline{B}X$ respectively, where
$\underline{B}X = \{x \mid [x]_B \subseteq X\},$
$\overline{B}X = \{x \mid [x]_B \cap X \neq \emptyset\}.$
Set Approximation
The B-boundary region of X,
$BN_B(X) = \overline{B}X - \underline{B}X,$
consists of those objects that we cannot
decisively classify into X on the basis of B.
The B-outside region of X,
$U - \overline{B}X,$
consists of those objects that can be with
certainty classified as not belonging to X.
A set is said to be "rough" if its boundary
region is non-empty; otherwise the set is crisp.
An Example of Set Approximation
Let W = {x | Walk(x) = Yes}.
The decision class Walk is rough, since the boundary region is not empty:
$\underline{A}W = \{x1, x6\},$
$\overline{A}W = \{x1, x3, x4, x6\},$
$BN_A(W) = \{x3, x4\},$
$U - \overline{A}W = \{x2, x5, x7\}.$
Age LEMS Walk
X1 16-30 50 Yes
X2 16-30 0 No
X3 31-45 1-25 No
X4 31-45 1-25 Yes
X5 46-60 26-49 No
X6 16-30 26-49 Yes
X7 46-60 26-49 No
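These four sets can be checked with a short sketch, reusing the partition U/IND({Age, LEMS}) computed earlier (the variable names are mine):

```python
# Sketch: B-lower and B-upper approximation of W = {x | Walk(x) = Yes}.
U = {"x1", "x2", "x3", "x4", "x5", "x6", "x7"}
partition = [{"x1"}, {"x2"}, {"x3", "x4"}, {"x5", "x7"}, {"x6"}]
W = {"x1", "x4", "x6"}  # objects with Walk = Yes

lower = {x for E in partition if E <= W for x in E}  # classes inside W
upper = {x for E in partition if E & W for x in E}   # classes meeting W
boundary = upper - lower
outside = U - upper

assert lower == {"x1", "x6"}
assert upper == {"x1", "x3", "x4", "x6"}
assert boundary == {"x3", "x4"}
assert outside == {"x2", "x5", "x7"}
```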
A Pictorial Depiction of Set Approximation
yes: {{x1}, {x6}} (the lower approximation $\underline{A}W$)
yes/no: {{x3, x4}} (the boundary region)
no: {{x2}, {x5, x7}} (outside the upper approximation $\overline{A}W$)
Lower & Upper Approximations
Lower approximation:
$\underline{R}X = \bigcup \{Y \in U/R : Y \subseteq X\}$
Upper approximation:
$\overline{R}X = \bigcup \{Y \in U/R : Y \cap X \neq \emptyset\}$
Properties of Approximations
1. $\underline{B}(X) \subseteq X \subseteq \overline{B}(X)$
2. $\underline{B}(\emptyset) = \overline{B}(\emptyset) = \emptyset,\ \underline{B}(U) = \overline{B}(U) = U$
3. $\overline{B}(X \cup Y) = \overline{B}(X) \cup \overline{B}(Y)$
4. $\underline{B}(X \cap Y) = \underline{B}(X) \cap \underline{B}(Y)$
5. $X \subseteq Y$ implies $\underline{B}(X) \subseteq \underline{B}(Y)$
6. $X \subseteq Y$ implies $\overline{B}(X) \subseteq \overline{B}(Y)$
Properties of Approximations
7. $\underline{B}(X \cup Y) \supseteq \underline{B}(X) \cup \underline{B}(Y)$
8. $\overline{B}(X \cap Y) \subseteq \overline{B}(X) \cap \overline{B}(Y)$
9. $\underline{B}(-X) = -\overline{B}(X)$
10. $\overline{B}(-X) = -\underline{B}(X)$
11. $\underline{B}(\underline{B}(X)) = \overline{B}(\underline{B}(X)) = \underline{B}(X)$
12. $\overline{B}(\overline{B}(X)) = \underline{B}(\overline{B}(X)) = \overline{B}(X)$
where -X denotes U - X.
Four Basic Classes of Rough Sets
X is roughly B-definable iff $\underline{B}(X) \neq \emptyset$ and $\overline{B}(X) \neq U$.
X is internally B-undefinable iff $\underline{B}(X) = \emptyset$ and $\overline{B}(X) \neq U$.
X is externally B-undefinable iff $\underline{B}(X) \neq \emptyset$ and $\overline{B}(X) = U$.
X is totally B-undefinable iff $\underline{B}(X) = \emptyset$ and $\overline{B}(X) = U$.
Accuracy of Approximation
$\alpha_B(X) = \dfrac{|\underline{B}(X)|}{|\overline{B}(X)|},$
where |X| denotes the cardinality of X.
Obviously $0 \leq \alpha_B(X) \leq 1$.
If $\alpha_B(X) = 1$, X is crisp with respect to B.
If $\alpha_B(X) < 1$, X is rough with respect to B.
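For the Walk example above this accuracy works out directly (a sketch; the two sets are taken from the approximation slide):

```python
# Accuracy of approximation: alpha_B(X) = |lower| / |upper|.
lower = {"x1", "x6"}                 # A-lower approximation of W
upper = {"x1", "x3", "x4", "x6"}     # A-upper approximation of W
alpha = len(lower) / len(upper)
print(alpha)  # 0.5, so W is rough (alpha < 1) with respect to {Age, LEMS}
```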
Issues in the Decision Table
The same or indiscernible objects may be
represented several times.
Some of the attributes may be
superfluous (redundant).
That is, their removal cannot worsen the
classification.
Reduct
Keep only those attributes that preserve
the indiscernibility relation and,
consequently, set approximation.
There are usually several such subsets of
attributes and those which are minimal
are called reducts.
Dispensable & Indispensable Attributes
Let $c \in C$. Attribute c is dispensable in T
if $POS_C(D) = POS_{(C - \{c\})}(D)$; otherwise
attribute c is indispensable in T.
The C-positive region of D:
$POS_C(D) = \bigcup_{X \in U/D} \underline{C}X.$
Independent
T = (U, C, D) is independent
if all $c \in C$ are indispensable in T.
Reduct & Core
The set of attributes $R \subseteq C$ is called a
reduct of C if T' = (U, R, D) is independent
and $POS_R(D) = POS_C(D)$.
The set of all the condition attributes
indispensable in T is denoted by CORE(C):
$CORE(C) = \bigcap RED(C),$
where RED(C) is the set of all reducts of C.
Discernibility Matrix
Let T = (U, C, D) be a decision table, with $U = \{u_1, u_2, \ldots, u_n\}$.
By a discernibility matrix of T, denoted M(T), we mean the $n \times n$ matrix $(c_{ij})$, for i, j = 1, 2, ..., n, where
$c_{ij}$ is the set of all the condition attributes that classify objects $u_i$ and $u_j$ into different decision classes.
Discernibility Function
A discernibility function $f_{IS}$ for an information
system IS is a boolean function of m boolean
variables $a_1^*, a_2^*, \ldots, a_m^*$ (corresponding to the attributes
$a_1, a_2, \ldots, a_m$) defined as follows:
$f_{IS}(a_1^*, \ldots, a_m^*) = \bigwedge \{\bigvee c_{ij}^* \mid 1 \leq j < i \leq n,\ c_{ij} \neq \emptyset\},$
where $c_{ij}^* = \{a^* \mid a \in c_{ij}\}$. The set of all prime
implicants of $f_{IS}$ determines the set of all reducts
of IS.
Examples of Discernibility Matrix
     a   b   c   d
u1   a0  b1  c1  yes
u2   a1  b1  c0  no
u3   a0  b2  c1  no
u4   a1  b1  c1  yes
C = {a, b, c}, D = {d}.
In order to discern the equivalence classes of
the decision attribute d, we must preserve the
conditions described by the discernibility
matrix for this table:
      u1    u2    u3
u2    a,c
u3    b
u4          c     a,b
The discernibility function is
$f = (a \vee c) \wedge b \wedge c \wedge (a \vee b) = b \wedge c,$
so Reduct = {b, c}.
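The matrix and the reduct can be derived mechanically. A brute-force sketch (the helper names are mine, and the minimal-hitting-set search is naive, fine for four objects):

```python
# Sketch: build the discernibility matrix for the u1..u4 table and find
# the minimal attribute subsets that hit every matrix entry (the reducts).
from itertools import combinations

rows = {  # object -> (a, b, c, decision)
    "u1": ("a0", "b1", "c1", "yes"),
    "u2": ("a1", "b1", "c0", "no"),
    "u3": ("a0", "b2", "c1", "no"),
    "u4": ("a1", "b1", "c1", "yes"),
}
conds = ("a", "b", "c")

matrix = {}  # c_ij, only for object pairs with different decisions
for (i, ri), (j, rj) in combinations(rows.items(), 2):
    if ri[3] != rj[3]:
        matrix[(i, j)] = {a for k, a in enumerate(conds) if ri[k] != rj[k]}

def hits_all(S):  # S satisfies the discernibility function
    return all(S & c_ij for c_ij in matrix.values())

reducts = []
for n in range(1, len(conds) + 1):
    for S in combinations(conds, n):
        S = set(S)
        # keep only minimal hitting sets (no reduct already contained in S)
        if hits_all(S) and not any(r <= S for r in reducts):
            reducts.append(S)

assert matrix[("u1", "u2")] == {"a", "c"}
assert reducts == [{"b", "c"}]
```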
Dependency of Attributes
A set of attributes D depends totally on a set
of attributes C, denoted $C \Rightarrow D$, if all
values of attributes from D are uniquely
determined by values of attributes from C.
Dependency of Attributes
Let D and C be subsets of A. We will say
that D depends on C in a degree k
$(0 \leq k \leq 1)$, denoted by $C \Rightarrow_k D$, if
$k = \gamma(C, D) = \dfrac{|POS_C(D)|}{|U|},$
where $POS_C(D) = \bigcup_{X \in U/D} \underline{C}(X)$ is called the C-positive region of D.
Dependency of Attributes
Obviously
$k = \gamma(C, D) = \sum_{X \in U/D} \dfrac{|\underline{C}(X)|}{|U|}.$
If k = 1, we say that D depends totally on C.
If k < 1, we say that D depends partially
(in a degree k) on C.
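Applied to the Walk table, the degree of dependency of {Walk} on {Age, LEMS} can be sketched as follows (the partitions are taken from the earlier slides; the helper name is mine):

```python
# Sketch: k = |POS_C(D)| / |U| for C = {Age, LEMS}, D = {Walk}.
U_IND_C = [{"x1"}, {"x2"}, {"x3", "x4"}, {"x5", "x7"}, {"x6"}]  # U/IND(C)
U_D = [{"x1", "x4", "x6"}, {"x2", "x3", "x5", "x7"}]            # U/IND(D)

def lower(partition, X):
    """C-lower approximation of X: union of classes contained in X."""
    return {x for E in partition if E <= X for x in E}

pos = set()  # C-positive region of D
for X in U_D:
    pos |= lower(U_IND_C, X)

k = len(pos) / 7
print(sorted(pos), k)  # 5 objects out of 7, so k = 5/7: partial dependency
```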
Variable Precision Rough Set
A generalized model of rough sets, called the
variable precision rough set model (VPRS), aimed at
modeling classification problems involving
uncertain or imprecise information, was
presented by Wojciech Ziarko in 1993.
This extended rough set model is able to
allow some degree of misclassification in
an otherwise largely correct classification.
Variable Precision Rough Set
The measure c(X, Y) of the relative degree of misclassification of the set X with respect to the set Y is defined as
$c(X, Y) = 1 - \dfrac{card(X \cap Y)}{card(X)}$ if $card(X) > 0$, and $c(X, Y) = 0$ if $card(X) = 0$,
where card denotes set cardinality.
The quantity c(X, Y) will be referred to as the relative classification error.
The actual number of misclassified elements is given by the product c(X, Y) * card(X), which is referred to as the absolute classification error.
β-majority (VPRS)
β, known as the admissible classification error, must be within the range $0 \leq \beta < 0.5$.
More than $(1 - \beta) \cdot 100\%$ of the elements of X should be common with Y; then X is said to be β-majority included in Y, i.e. $c(X, Y) \leq \beta$.
Let X1 = {x1, x2, x3, x4},
X2 = {x1, x2, x5},
Y = {x1, x2, x3, x8}.
Then $c(X1, Y) = 0.25$ and $c(X2, Y) \approx 0.33$.
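A one-function sketch of c(X, Y) reproduces these numbers:

```python
# Sketch: relative classification error c(X, Y) from the VPRS slide.
def c(X, Y):
    # 1 - |X ∩ Y| / |X| when X is non-empty, 0 otherwise
    return 1 - len(X & Y) / len(X) if X else 0.0

X1 = {"x1", "x2", "x3", "x4"}
X2 = {"x1", "x2", "x5"}
Y = {"x1", "x2", "x3", "x8"}

print(c(X1, Y))            # 0.25
print(round(c(X2, Y), 2))  # 0.33
```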
Set Approximations in VPRS
Let A = (U, R) consist of a non-empty,
finite universe U and an
equivalence relation R on U. The
equivalence relation R, referred to as an
indiscernibility relation, corresponds to a
partitioning of the universe U into a
collection of equivalence classes or
elementary sets $R^* = \{E_1, E_2, \ldots, E_n\}$.
Lower & Upper Approximation (VPRS)
Lower approximation:
$\underline{R}_{\beta} X = \bigcup \{E \in R^* : E$ is $\beta$-majority included in $X\}$
or, equivalently,
$\underline{R}_{\beta} X = \bigcup \{E \in R^* : c(E, X) \leq \beta\}.$
Upper approximation:
$\overline{R}_{\beta} X = \bigcup \{E \in R^* : c(E, X) < 1 - \beta\}.$
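A sketch of these β-approximations on a made-up partition (the elementary sets and the β value below are illustrative, not from the lecture):

```python
# Sketch: VPRS beta-lower and beta-upper approximations.
def c(E, X):  # relative classification error of E with respect to X
    return 1 - len(E & X) / len(E) if E else 0.0

def vprs(R_star, X, beta):
    low = {x for E in R_star if c(E, X) <= beta for x in E}
    up = {x for E in R_star if c(E, X) < 1 - beta for x in E}
    return low, up

R_star = [{"x1", "x2"}, {"x3", "x4", "x5"}, {"x6"}]
X = {"x1", "x2", "x3", "x6"}

low0, up0 = vprs(R_star, X, beta=0.0)  # classical rough set case
low4, up4 = vprs(R_star, X, beta=0.4)  # tolerate up to 40% error

assert low0 == {"x1", "x2", "x6"}
assert up0 == {"x1", "x2", "x3", "x4", "x5", "x6"}
assert low4 == up4 == {"x1", "x2", "x6"}  # c({x3,x4,x5}, X) = 2/3 >= 0.6
```

With β = 0 the definitions collapse to the classical lower and upper approximations; raising β shrinks the upper approximation and can enlarge the lower one.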
Boundary & Negative Region (VPRS)
Boundary region:
$BNR_{\beta} X = \bigcup \{E \in R^* : \beta < c(E, X) < 1 - \beta\}$
Negative region:
$NEG_{\beta} X = \bigcup \{E \in R^* : c(E, X) \geq 1 - \beta\}$
Theoretical aspect of Approximation
(VPRS)
The lower approximation of the set X can be interpreted as the collection of all those elements of U which can be classified into X with a classification error not greater than β.
The negative region of the set X is the collection of all those elements of U which can be classified into the complement of X, -X, with a classification error not greater than β.
Theoretical aspect of Approximation
(VPRS)
The boundary region of the set X consists of
those elements which can be classified neither
into X nor into -X with a classification error
not greater than β.
The upper approximation of the set X
includes all those elements of U which
cannot be classified into -X with an error
not greater than β.
Fuzzy-Rough Sets (FRS)
One particular use of RST is attribute reduction
in datasets. Given a dataset with discretized attribute
values, it is possible to find a subset of the original
attributes that is the most informative (termed a
reduct).
However, it is most often the case that attribute
values are real-valued and cannot be handled by
the traditional rough set.
Some discretization is possible, but it in turn
incurs loss of information.
To deal with vagueness and noisy data in the dataset,
the fuzzy rough set was introduced (by Dubois and Prade)
and developed for attribute reduction by Richard Jensen.
Fuzzification for Conditional Features
    a     b     c     q
1   -0.4  -0.3  -0.5  No
2   -0.4  0.2   -0.1  Yes
3   -0.3  -0.4  -0.3  No
4   0.3   -0.3  0     Yes
5   0.2   -0.3  0     Yes
6   0.2   0     0     No
A fuzzy-rough set is defined by two
fuzzy sets: the fuzzy lower and upper
approximations, obtained by
extending the corresponding
crisp rough set notions.
In the crisp case, elements that
belong to the lower
approximation (i.e., have a
membership of 1) are said to
belong to the approximated set
with absolute certainty. In the fuzzy-
rough case, elements may have a
membership in the range [0, 1],
allowing greater flexibility in
handling uncertainty.
Membership values from MFs of linguistic labels
    a          b          c          q
    Na   Za   Nb   Zb   Nc   Zc   {1,3,6}  {2,4,5}
1   0.8  0.2  0.6  0.4  1.0  0.0  1.0      0.0
2   0.8  0.2  0.0  0.6  0.2  0.8  0.0      1.0
3   0.6  0.4  0.8  0.2  0.6  0.4  1.0      0.0
4   0.0  0.4  0.6  0.4  0.0  1.0  0.0      1.0
5   0.0  0.6  0.6  0.4  0.0  1.0  0.0      1.0
6   0.0  0.6  0.0  1.0  0.0  1.0  1.0      0.0
Lower & Upper Approximations (FRS)
Lower approximation:
$\mu_{\underline{P}X}(F) = \inf_{y \in U} \max\{1 - \mu_F(y),\ \mu_X(y)\}$, for each $F \in U/P$.
Upper approximation:
$\mu_{\overline{P}X}(F) = \sup_{y \in U} \min\{\mu_F(y),\ \mu_X(y)\}$, for each $F \in U/P$.
Positive Region & Dependency Measure (FRS)
The membership of an object x belonging to the fuzzy
positive region can be defined by
$\mu_{POS_P(Q)}(x) = \sup_{X \in U/Q} \mu_{\underline{P}X}(x).$
Using the definition of the fuzzy positive region, the new
dependency function can be defined as follows:
$\gamma'_P(Q) = \dfrac{\sum_{x \in U} \mu_{POS_P(Q)}(x)}{|U|}.$
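A toy sketch of the fuzzy lower approximation, positive region, and dependency degree in the min/max style of these definitions (the universe and all membership values below are made up for illustration, not taken from the lecture's table):

```python
# Sketch: fuzzy lower approximation, fuzzy positive region, and the
# dependency degree gamma', using inf/max and sup/min as on the slide.
U = ["y1", "y2", "y3"]
# fuzzy equivalence classes F in U/P (membership of each object in each F)
U_P = [{"y1": 1.0, "y2": 0.8, "y3": 0.0},
       {"y1": 0.0, "y2": 0.2, "y3": 1.0}]
# crisp decision classes in U/Q, written as fuzzy sets
U_Q = [{"y1": 1.0, "y2": 1.0, "y3": 0.0},
       {"y1": 0.0, "y2": 0.0, "y3": 1.0}]

def mu_lower(F, X):
    # mu_{P_lower X}(F) = inf_y max(1 - mu_F(y), mu_X(y))
    return min(max(1 - F[y], X[y]) for y in U)

def mu_pos(x):
    # sup over classes F and decision classes X of min(mu_F(x), mu_lower)
    return max(min(F[x], mu_lower(F, X)) for F in U_P for X in U_Q)

gamma = sum(mu_pos(x) for x in U) / len(U)
print(round(gamma, 4))  # about 0.8667 for these toy memberships
```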
Observations
Evaluation of the importance of particular attributes and elimination of redundant attributes from the decision table.
Construction of a minimal subset of independent attributes ensuring the same quality of classification as the whole set, i.e. reducts of the set of attributes.
Intersection of these reducts giving a core of attributes, which cannot be eliminated without disturbing the ability to approximate the classification.
Generation of logical rules from the reduced decision table.