
Page 1:

Rough Sets in Data Mining

CSE5610 Intelligent Software Systems

Semester 1, 2006

Page 2:

Lecture Outline

• Rough Sets
– Major Concepts
– Running Example
• Rough Sets: Identifying Significant Attributes in Data
– Performing pre-processing
• Concluding Remarks
– Beyond Pre-processing to Data Mining

• References/Resources

Page 3:

Rough Sets

• Zdzisław Pawlak, 1982
• Extension of traditional set theory
• Classification and analysis of data tables
• Handling uncertainty in data:
– Missing data
– Noisy data
– Ambiguity in semantics
• Produce an inexact or rough classification of data

Page 4:

Rough Sets Membership

[Figure: regions of a rough set — negative region, upper approximation, boundary region, lower approximation]

Page 5:

Information System

• Information System S = {U, A, V, f}
• U: non-empty, finite set of objects called the Universe, U = {x1, x2, …, xn}
• A: finite, non-empty set of attributes, with A = C ∪ D and C ∩ D = ∅, where C is the set of condition attributes and D the set of decision attributes
• V: set of domains of all attributes of S (i.e., Va is the domain of attribute a)
• f: U × A → V is a function such that f(x, a) ∈ Va for every a ∈ A and x ∈ U

Page 6:

Example: Information Systems

U | a b c d e
1 | 1 0 2 2 0
2 | 0 1 1 1 2
3 | 2 0 0 1 1
4 | 1 1 0 2 2
5 | 1 0 2 0 1
6 | 2 2 0 1 1
7 | 2 1 1 1 2
8 | 0 1 1 0 1

Page 7:

Equivalence Classes

• xi, xj ∈ U are indiscernible with respect to a given set of attributes B (B ⊆ A) if they have the same value on every attribute in B: a(xi) = a(xj) for all a ∈ B.
• Indiscernible objects are elements of the same equivalence class, written [x]B.
• The set U/IND(B) is the set of all equivalence classes of the indiscernibility relation IND(B).
• The indiscernibility relation itself is defined as:

IND(B) = {(xi, xj) ∈ U × U : a(xi) = a(xj) for every a ∈ B}
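Since the slides name no implementation language, here is a minimal Python sketch (our choice) of computing U/IND(B) for the running example table. The names TABLE, ATTRS and ind_classes are purely illustrative, not from the slides.

# Minimal sketch: U/IND(B) for the example information system.
# TABLE maps each object to its values on attributes a, b, c, d, e.
TABLE = {
    1: (1, 0, 2, 2, 0), 2: (0, 1, 1, 1, 2), 3: (2, 0, 0, 1, 1),
    4: (1, 1, 0, 2, 2), 5: (1, 0, 2, 0, 1), 6: (2, 2, 0, 1, 1),
    7: (2, 1, 1, 1, 2), 8: (0, 1, 1, 0, 1),
}
ATTRS = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4}

def ind_classes(B):
    """Partition the universe into equivalence classes of IND(B)."""
    classes = {}
    for x, row in TABLE.items():
        signature = tuple(row[ATTRS[a]] for a in B)  # x's values on B
        classes.setdefault(signature, set()).add(x)
    return list(classes.values())

print(ind_classes(["a", "b", "c"]))
# -> the six classes {1,5}, {2,8}, {3}, {4}, {6}, {7}, as on the next slide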

Page 8:

Example: Information Systems

U | a b c d e
1 | 1 0 2 2 0
2 | 0 1 1 1 2
3 | 2 0 0 1 1
4 | 1 1 0 2 2
5 | 1 0 2 0 1
6 | 2 2 0 1 1
7 | 2 1 1 1 2
8 | 0 1 1 0 1

Let B = {a, b, c}. U/IND(B) = {{1,5}, {2,8}, {3}, {4}, {6}, {7}}

Page 9:

Approximation Space

• Central concept for dealing with uncertainty & vagueness

• Specifies boundaries for classifying objects

• Lower approximation - objects that can be classified with certainty as elements of X (where X ⊆ U), according to the attribute set B (B ⊆ A)

• Upper approximation - objects that can be classified as possibly being elements of X - can neither be accepted nor rejected with certainty

Page 10:

Approximation Space

• S = {U, A, V, f}; let X ⊆ U be a set of objects and B ⊆ A a set of attributes.
• The lower approximation of X with respect to B is: B̲X = {x ∈ U | [x]B ⊆ X}
• The upper approximation of X with respect to B is: B̄X = {x ∈ U | [x]B ∩ X ≠ ∅}
• The boundary region of X is BNB(X) = B̄X − B̲X.
• An object is a strong member of X if it is part of the lower approximation, and a weak member if it is part of the boundary region.

Page 11:

Example: Approximation Space

• Let X = {1, 2, 3, 4, 5} and B = {a, b, c}
• U/IND(B) = {{1,5}, {2,8}, {3}, {4}, {6}, {7}}
• Object 1 belongs to the equivalence class {1,5}. This class is a subset of X, so object 1 belongs to the lower approximation.
• Object 2 belongs to the equivalence class {2,8}. This class is not a subset of X (since 8 does not belong to X), so object 2 is not in the lower approximation. However, object 2 belongs to the upper approximation, since {2,8} ∩ X is not empty.
• The lower and upper approximations for the example are:

B̲X = {1, 3, 4, 5}
B̄X = {1, 2, 3, 4, 5, 8}
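Continuing the Python sketch from the equivalence-class slide (TABLE and ind_classes as defined there), both approximations are one-liners; lower_approx and upper_approx are our own illustrative names.

def lower_approx(B, X):
    """Union of the B-classes wholly contained in X."""
    return set().union(*(c for c in ind_classes(B) if c <= X))

def upper_approx(B, X):
    """Union of the B-classes that intersect X."""
    return set().union(*(c for c in ind_classes(B) if c & X))

X = {1, 2, 3, 4, 5}
print(sorted(lower_approx(["a", "b", "c"], X)))  # [1, 3, 4, 5]
print(sorted(upper_approx(["a", "b", "c"], X)))  # [1, 2, 3, 4, 5, 8]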

Page 12:

Dispensability

• For an information system S = {U, A, V, f}, an attribute a is said to be dispensable or superfluous in a given subset of attributes B ⊆ A if IND(B) = IND(B − {a})

(Note: a ∈ B; IND is the indiscernibility relation.)

Page 13:

Reduct

• A reduct of B is a set of attributes B′ ⊆ B such that all attributes a ∈ B − B′ are dispensable and IND(B′) = IND(B).
• A reduct:
– contains only non-superfluous attributes
– maintains the indiscernibility relation of the original attribute subset
• There can be several reducts for a given subset of attributes B.
• It is relatively simple to compute a single reduct.
• Finding all reducts, however, is NP-hard in general (see the sketch below).
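To illustrate why this is expensive, a brute-force search over all attribute subsets (fine for five attributes, hopeless at scale) can enumerate every reduct. This builds on ind_classes from the earlier sketch; partition and reducts are our own helper names.

from itertools import combinations

def partition(B):
    """U/IND(B) in a canonical, comparable form."""
    return frozenset(frozenset(c) for c in ind_classes(B))

def reducts(B):
    """All minimal subsets of B preserving IND(B) -- exponential search."""
    target, found = partition(B), []
    for r in range(1, len(B) + 1):
        for sub in map(set, combinations(B, r)):
            # keep sub only if it preserves the partition and does not
            # contain a reduct already found at a smaller size
            if partition(sorted(sub)) == target and \
                    not any(f <= sub for f in found):
                found.append(sub)
    return found

print(reducts(["a", "b", "c", "d", "e"]))  # {'a','b','d'} is among them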

Page 14:

Core

• The set of elements that are common to all the reducts.

• Core can be computed in a straightforward manner using a tabular representation concept developed by Skowron [SkR92], known as the Discernibility Matrix.

• D-core & D-reducts : Core and Reducts relative to the decision attributes

Page 16:

Positive Region

• A positive region for one partition is defined with respect to another
• Let C and D be two sets of attributes over a universe U, each inducing an equivalence relation.
• The C-positive region of D, denoted POSC(D), is: the set of all objects of the universe U that can be classified with certainty into the classes of U/IND(D) on the basis of the knowledge expressed by C
• This is expressed as follows:

POSC(D) = ∪ {C̲X : X ∈ U/IND(D)}

Page 17:

Example: Positive Region

Let C = {a, b, c} and D = {d, e} (matching the degree-of-dependency example later).

U/IND(D) = {{1}, {2, 7}, {3, 6}, {4}, {5, 8}}

Let us name the equivalence classes in U/IND(D) as X1, X2, X3, X4, X5 as follows:

X1 = {1}, X2 = {2, 7}, X3 = {3, 6}, X4 = {4}, X5 = {5, 8}

U/IND(C) = {{1,5}, {2,8}, {3}, {4}, {6}, {7}}

Let us name the equivalence classes in U/IND(C) as Y1, Y2, Y3, Y4, Y5, Y6 as follows:

Y1 = {1, 5}, Y2 = {2, 8}, Y3 = {3}, Y4 = {4}, Y5 = {6}, Y6 = {7}

Page 18:

Example: Positive Region

Let us now compute POSC(D) as follows.

For each class X ∈ U/IND(D), determine the objects that can be classified into X with certainty using C (the C-lower approximations):

C̲X1 = ∅
C̲X2 = {7}
C̲X3 = {3, 6}
C̲X4 = {4}
C̲X5 = ∅

The positive region is computed as the union of the lower approximations:

POSC(D) = C̲X1 ∪ C̲X2 ∪ C̲X3 ∪ C̲X4 ∪ C̲X5 = {3, 4, 6, 7}
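The same computation in the running Python sketch, reusing ind_classes and lower_approx from earlier; positive_region is our illustrative name.

def positive_region(C, D):
    """POS_C(D): union of the C-lower approximations of all D-classes."""
    return set().union(*(lower_approx(C, X) for X in ind_classes(D)))

print(sorted(positive_region(["a", "b", "c"], ["d", "e"])))  # [3, 4, 6, 7]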

Page 19:

Degree of Dependency

• The degree of dependency k between two sets of attributes C and D (where C, D ⊆ A) is measured using the concept of positive region as follows:

k(C, D) = card(POSC(D)) / card(U)

• k(C, D) takes values 0 ≤ k ≤ 1.
• The higher the value of k, the greater the dependency between the two sets of attributes.

Page 20:

Example: Degree of Dependency

• We can compute the degree of dependency between the attributes C = {a, b, c} and D = {d, e} as follows:
• We know the positive region POSC(D) = {3, 4, 6, 7}
• k(C, D) = |{3, 4, 6, 7}| / |{1, 2, 3, 4, 5, 6, 7, 8}| = 4/8 = 0.5

Page 21:

Significance of Attributes

• The significance of an attribute a is:

SGF(a) = k(C ∪ {a}, D) − k(C, D)

• It measures the extent to which the attribute alters the degree of dependency between C and D
• If an attribute is "important" in discerning/determining the decision attribute, its SGF value will be closer to 1.
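Both measures fall out of positive_region from the earlier sketch; dependency and significance are our illustrative names.

def dependency(C, D):
    """k(C, D) = |POS_C(D)| / |U|."""
    return len(positive_region(C, D)) / len(TABLE)

def significance(a, C, D):
    """SGF(a) = k(C + {a}, D) - k(C, D)."""
    return dependency(C + [a], D) - dependency(C, D)

print(dependency(["a", "b", "c"], ["d", "e"]))    # 0.5, as computed above
print(significance("c", ["a", "b"], ["d", "e"]))  # 0.0: once a, b are known,
                                                  # c adds nothing -- it is
                                                  # dispensable here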

Page 22:

Back to CSE3212 - Preprocessing

CSE5610 Intelligent Software Systems

Semester 1, 2006

Page 23:

Pre-processing

• A Refresher…
• Data Reduction
– Why?
– How?
  - Aggregation
  - Dimensionality Reduction
  - Numerosity Reduction
  - Discretisation
• Dimensionality Reduction
– Feature/Attribute Selection
– Different techniques, including Rough Sets

Page 24:

Dimensionality Reduction

• Feature selection (i.e., attribute subset selection):
– Select a minimum set of attributes such that the probability distribution of the different classes, given the values of those attributes, is as close as possible to the original distribution given the values of all features
– Reduces size and makes the result easier to understand
• A number of heuristic methods (due to the exponential number of subsets):
– step-wise forward selection
– step-wise backward elimination
– combining forward selection and backward elimination
– decision-tree induction

Page 25:

Let's Try & Work This

• Step-wise forward selection (see the sketch after this list)
• Step-wise backward elimination
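As one plausible way to wire the rough-set measures into step-wise forward selection (the slides fix no concrete algorithm), the greedy loop below adds whichever attribute raises k(C, D) the most, reusing dependency from the earlier sketch; forward_select is our own name.

def forward_select(candidates, decision):
    """Greedy forward selection driven by the dependency gain (SGF)."""
    selected = []
    while True:
        best, best_gain = None, 0.0
        base = dependency(selected, decision)
        for a in candidates:
            if a in selected:
                continue
            gain = dependency(selected + [a], decision) - base
            if gain > best_gain:
                best, best_gain = a, gain
        if best is None:          # no attribute improves k any further
            return selected
        selected.append(best)

print(forward_select(["a", "b", "c"], ["d", "e"]))
# -> ['b', 'a']: {a, b} already reaches k = 0.5, so c is never added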

Page 26:

Rough Sets: Bigger Picture

• Used for Data Mining

• Several Algorithms for Learning

• Mostly Classification

• Deals with real-world data

• Noisy and Missing Values

• And many more applications …

Page 27:

References

• Sever, H., Raghavan, V. V., and Johnsten, T. D. (1998), "The Status of Research on Rough Sets for Knowledge Discovery in Databases", Proceedings of the Second International Conference on Nonlinear Problems in Aviation and Aerospace (ICNPAA98), Daytona Beach, Florida, USA, Apr-May, Vol. 2, pp. 673-680.

• Komorowski, J., Pawlak, Z., Polkowski, L., and Skowron, A. (1998), "Rough Sets: A Tutorial", in Rough-Fuzzy Hybridization: A New Trend in Decision Making, (eds) S. K. Pal and A. Skowron, Springer Verlag, pp. 3-98.

• Pawlak, Z. (1992), "Rough Sets: Theoretical Aspects of Reasoning about Data", Kluwer Academic Publishers, London, UK.