
Page 1:

Rough Sets in Data Mining

CSE5610 Intelligent Software Systems

Semester 1, 2006

Page 2:

Lecture Outline

• Rough Sets
– Major Concepts
– Running Example
• Rough Sets: Identifying Significant Attributes in Data
– Performing pre-processing
• Concluding Remarks
– Beyond Pre-processing to Data Mining

• References/Resources

Page 3:

Rough Sets

• Zdzisław Pawlak, 1982
• Extension of traditional set theory
• Classification and analysis of data tables
• Handling uncertainty in data:
– Missing data
– Noisy data
– Ambiguity in semantics
• Produce an inexact or rough classification of data

Page 4:

Rough Sets Membership

[Figure: regions of a rough set — negative region, upper approximation, boundary region, lower approximation]

Page 5:

Information System

• Information System S = {U, A, V, f}
• U: non-empty, finite set of objects called the Universe, U = {x1, x2, …, xn}
• A: finite, non-empty set of attributes, with A = C ∪ D and C ∩ D = ∅, where C is the set of condition attributes and D the set of decision attributes
• V: set of domains of all attributes of S (i.e., Va is the domain of attribute a)
• f: U × A → V is a function such that f(x, a) ∈ Va for every a ∈ A and x ∈ U

Page 6:

Example: Information Systems

U | a b c d e
1 | 1 0 2 2 0
2 | 0 1 1 1 2
3 | 2 0 0 1 1
4 | 1 1 0 2 2
5 | 1 0 2 0 1
6 | 2 2 0 1 1
7 | 2 1 1 1 2
8 | 0 1 1 0 1

Page 7:

Equivalence Classes

• xi, xj ∈ U are indiscernible with respect to a given set of attributes B (B ⊆ A) if they have the same value on every attribute in B: a(xi) = a(xj) for all a ∈ B.
• Indiscernible objects are elements of the same equivalence class, written [x]B.
• The set U/IND(B) is the set of all equivalence classes of the indiscernibility relation IND(B).
• The indiscernibility relation itself is defined as:

IND(B) = {(xi, xj) ∈ U × U : a(xi) = a(xj) for every a ∈ B}
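Since the slides name no implementation language, here is a minimal Python sketch (our choice) of computing U/IND(B) for the running example table. The names TABLE, ATTRS and ind_classes are purely illustrative, not from the slides.

# Minimal sketch: U/IND(B) for the example information system.
# TABLE maps each object to its values on attributes a, b, c, d, e.
TABLE = {
    1: (1, 0, 2, 2, 0), 2: (0, 1, 1, 1, 2), 3: (2, 0, 0, 1, 1),
    4: (1, 1, 0, 2, 2), 5: (1, 0, 2, 0, 1), 6: (2, 2, 0, 1, 1),
    7: (2, 1, 1, 1, 2), 8: (0, 1, 1, 0, 1),
}
ATTRS = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4}

def ind_classes(B):
    """Partition the universe into equivalence classes of IND(B)."""
    classes = {}
    for x, row in TABLE.items():
        signature = tuple(row[ATTRS[a]] for a in B)  # x's values on B
        classes.setdefault(signature, set()).add(x)
    return list(classes.values())

print(ind_classes(["a", "b", "c"]))
# -> the six classes {1,5}, {2,8}, {3}, {4}, {6}, {7}, as on the next slide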

Page 8:

Example: Information Systems

U | a b c d e
1 | 1 0 2 2 0
2 | 0 1 1 1 2
3 | 2 0 0 1 1
4 | 1 1 0 2 2
5 | 1 0 2 0 1
6 | 2 2 0 1 1
7 | 2 1 1 1 2
8 | 0 1 1 0 1

Let B = {a, b, c}. U/IND(B) = {{1,5}, {2,8}, {3}, {4}, {6}, {7}}

Page 9:

Approximation Space

• Central concept for dealing with uncertainty & vagueness

• Specifies boundaries for classifying objects

• Lower approximation - objects that can be classified with certainty as elements of X (where X ⊆ U), according to the attribute set B (B ⊆ A)

• Upper approximation - objects that can be classified as possibly being elements of X - can neither be accepted nor rejected with certainty

Page 10:

Approximation Space

• S = {U, A, V, f}; let X ⊆ U be a set of objects and B ⊆ A a set of attributes.
• The lower approximation of X with respect to B is: B̲X = {x ∈ U | [x]B ⊆ X}
• The upper approximation of X with respect to B is: B̄X = {x ∈ U | [x]B ∩ X ≠ ∅}
• The boundary region of X is BNB(X) = B̄X − B̲X.
• An object is a strong member of X if it is part of the lower approximation, and a weak member if it is part of the boundary region.

Page 11:

Example: Approximation Space

• Let X = {1, 2, 3, 4, 5} and B = {a, b, c}
• U/IND(B) = {{1,5}, {2,8}, {3}, {4}, {6}, {7}}
• Object 1 belongs to the equivalence class {1,5}. This class is a subset of X, so object 1 belongs to the lower approximation.
• Object 2 belongs to the equivalence class {2,8}. This class is not a subset of X (since 8 does not belong to X), so object 2 is not in the lower approximation. However, object 2 belongs to the upper approximation, since {2,8} ∩ X is not empty.
• The lower and upper approximations for the example are:

B̲X = {1, 3, 4, 5}
B̄X = {1, 2, 3, 4, 5, 8}
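Continuing the Python sketch from the equivalence-class slide (TABLE and ind_classes as defined there), both approximations are one-liners; lower_approx and upper_approx are our own illustrative names.

def lower_approx(B, X):
    """Union of the B-classes wholly contained in X."""
    return set().union(*(c for c in ind_classes(B) if c <= X))

def upper_approx(B, X):
    """Union of the B-classes that intersect X."""
    return set().union(*(c for c in ind_classes(B) if c & X))

X = {1, 2, 3, 4, 5}
print(sorted(lower_approx(["a", "b", "c"], X)))  # [1, 3, 4, 5]
print(sorted(upper_approx(["a", "b", "c"], X)))  # [1, 2, 3, 4, 5, 8]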

Page 12:

Dispensability

• For an information system S = {U, A, V, f}, an attribute a is said to be dispensable or superfluous in a given subset of attributes B ⊆ A if IND(B) = IND(B − {a})

(Note: a ∈ B; IND is the indiscernibility relation.)

Page 13:

Reduct

• A reduct of B is a set of attributes B′ ⊆ B such that all attributes a ∈ B − B′ are dispensable and IND(B′) = IND(B).
• A reduct:
– contains only non-superfluous attributes
– maintains the indiscernibility relation of the original attribute subset
• There can be several reducts for a given subset of attributes B.
• It is relatively simple to compute a single reduct.
• Finding all reducts, however, is NP-hard in general (see the sketch below).
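To illustrate why this is expensive, a brute-force search over all attribute subsets (fine for five attributes, hopeless at scale) can enumerate every reduct. This builds on ind_classes from the earlier sketch; partition and reducts are our own helper names.

from itertools import combinations

def partition(B):
    """U/IND(B) in a canonical, comparable form."""
    return frozenset(frozenset(c) for c in ind_classes(B))

def reducts(B):
    """All minimal subsets of B preserving IND(B) -- exponential search."""
    target, found = partition(B), []
    for r in range(1, len(B) + 1):
        for sub in map(set, combinations(B, r)):
            # keep sub only if it preserves the partition and does not
            # contain a reduct already found at a smaller size
            if partition(sorted(sub)) == target and \
                    not any(f <= sub for f in found):
                found.append(sub)
    return found

print(reducts(["a", "b", "c", "d", "e"]))  # {'a','b','d'} is among them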

Page 14:

Core

• The set of elements that are common to all the reducts.

• Core can be computed in a straightforward manner using a tabular representation concept developed by Skowron [SkR92], known as the Discernibility Matrix.

• D-core & D-reducts : Core and Reducts relative to the decision attributes

Page 16:

Positive Region

• A positive region for one partition is defined with respect to another
• Let C and D be two sets of attributes over a universe U, each inducing an equivalence relation.
• The C-positive region of D, denoted POSC(D), is: the set of all objects of the universe U that can be classified with certainty into the classes of U/IND(D) on the basis of the knowledge expressed by C
• This is expressed as follows:

POSC(D) = ∪ {C̲X : X ∈ U/IND(D)}

Page 17:

Example: Positive Region

Let C = {a, b, c} and D = {d, e} (matching the degree-of-dependency example later).

U/IND(D) = {{1}, {2, 7}, {3, 6}, {4}, {5, 8}}

Let us name the equivalence classes in U/IND(D) as X1, X2, X3, X4, X5 as follows:

X1 = {1}, X2 = {2, 7}, X3 = {3, 6}, X4 = {4}, X5 = {5, 8}

U/IND(C) = {{1,5}, {2,8}, {3}, {4}, {6}, {7}}

Let us name the equivalence classes in U/IND(C) as Y1, Y2, Y3, Y4, Y5, Y6 as follows:

Y1 = {1, 5}, Y2 = {2, 8}, Y3 = {3}, Y4 = {4}, Y5 = {6}, Y6 = {7}

Page 18:

Example: Positive Region

Let us now compute POSC(D) as follows.

For each class X ∈ U/IND(D), determine the objects that can be classified into X with certainty using C (the C-lower approximations):

C̲X1 = ∅
C̲X2 = {7}
C̲X3 = {3, 6}
C̲X4 = {4}
C̲X5 = ∅

The positive region is computed as the union of the lower approximations:

POSC(D) = C̲X1 ∪ C̲X2 ∪ C̲X3 ∪ C̲X4 ∪ C̲X5 = {3, 4, 6, 7}
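The same computation in the running Python sketch, reusing ind_classes and lower_approx from earlier; positive_region is our illustrative name.

def positive_region(C, D):
    """POS_C(D): union of the C-lower approximations of all D-classes."""
    return set().union(*(lower_approx(C, X) for X in ind_classes(D)))

print(sorted(positive_region(["a", "b", "c"], ["d", "e"])))  # [3, 4, 6, 7]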

Page 19:

Degree of Dependency

• The degree of dependency k between two sets of attributes C and D (where C, D ⊆ A) is measured using the concept of positive region as follows:

k(C, D) = card(POSC(D)) / card(U)

• k(C, D) takes values 0 ≤ k ≤ 1.
• The higher the value of k, the greater the dependency between the two sets of attributes.

Page 20:

Example: Degree of Dependency

• We can compute the degree of dependency between the attributes C = {a, b, c} and D = {d, e} as follows:
• We know the positive region POSC(D) = {3, 4, 6, 7}
• k(C, D) = |{3, 4, 6, 7}| / |{1, 2, 3, 4, 5, 6, 7, 8}| = 4/8 = 0.5

Page 21:

Significance of Attributes

• The significance of an attribute a is:

SGF(a) = k(C ∪ {a}, D) − k(C, D)

• It measures the extent to which the attribute alters the degree of dependency between C and D
• If an attribute is "important" in discerning/determining the decision attribute, its SGF value will be closer to 1.
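Both measures fall out of positive_region from the earlier sketch; dependency and significance are our illustrative names.

def dependency(C, D):
    """k(C, D) = |POS_C(D)| / |U|."""
    return len(positive_region(C, D)) / len(TABLE)

def significance(a, C, D):
    """SGF(a) = k(C + {a}, D) - k(C, D)."""
    return dependency(C + [a], D) - dependency(C, D)

print(dependency(["a", "b", "c"], ["d", "e"]))    # 0.5, as computed above
print(significance("c", ["a", "b"], ["d", "e"]))  # 0.0: once a, b are known,
                                                  # c adds nothing -- it is
                                                  # dispensable here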

Page 22:

Back to CSE3212 - Preprocessing

CSE5610 Intelligent Software Systems

Semester 1, 2006

Page 23:

Pre-processing

• A Refresher…
• Data Reduction
– Why?
– How?
  - Aggregation
  - Dimensionality Reduction
  - Numerosity Reduction
  - Discretisation
• Dimensionality Reduction
– Feature/Attribute Selection
– Different techniques, including Rough Sets

Page 24:

Dimensionality Reduction

• Feature selection (i.e., attribute subset selection):
– Select a minimum set of attributes such that the probability distribution of the different classes, given the values of those attributes, is as close as possible to the original distribution given the values of all features
– Reduces size and makes the result easier to understand
• A number of heuristic methods (due to the exponential number of subsets):
– step-wise forward selection
– step-wise backward elimination
– combining forward selection and backward elimination
– decision-tree induction

Page 25:

Let's Try & Work This

• Step-wise forward selection (see the sketch after this list)
• Step-wise backward elimination
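As one plausible way to wire the rough-set measures into step-wise forward selection (the slides fix no concrete algorithm), the greedy loop below adds whichever attribute raises k(C, D) the most, reusing dependency from the earlier sketch; forward_select is our own name.

def forward_select(candidates, decision):
    """Greedy forward selection driven by the dependency gain (SGF)."""
    selected = []
    while True:
        best, best_gain = None, 0.0
        base = dependency(selected, decision)
        for a in candidates:
            if a in selected:
                continue
            gain = dependency(selected + [a], decision) - base
            if gain > best_gain:
                best, best_gain = a, gain
        if best is None:          # no attribute improves k any further
            return selected
        selected.append(best)

print(forward_select(["a", "b", "c"], ["d", "e"]))
# -> ['b', 'a']: {a, b} already reaches k = 0.5, so c is never added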

Page 26:

Rough Sets: Bigger Picture

• Used for Data Mining

• Several Algorithms for Learning

• Mostly Classification

• Deals with real-world data

• Noisy and Missing Values

• And many more applications …

Page 27:

References

• Sever, H., Raghavan, V. V., and Johnsten, T. D. (1998), "The Status of Research on Rough Sets for Knowledge Discovery in Databases", Proceedings of the Second International Conference on Nonlinear Problems in Aviation and Aerospace (ICNPAA98), Daytona Beach, Florida, USA, Apr-May, Vol. 2, pp. 673-680.

• Komorowski, J., Pawlak, Z., Polkowski, L., and Skowron, A. (1998), "Rough Sets: A Tutorial", in Rough-Fuzzy Hybridization: A New Trend in Decision Making, (eds) S. K. Pal and A. Skowron, Springer Verlag, pp. 3-98.

• Pawlak, Z. (1992), "Rough Sets: Theoretical Aspects of Reasoning about Data", Kluwer Academic Publishers, London, UK.