A DISCRETIZATION ALGORITHM BASED ON CLASS-ATTRIBUTE CONTINGENCY COEFFICIENT
Professor: Dr. Kim
Presenter: Sukumar

Upload: luke-mccoy

Post on 13-Dec-2015

A DISCRETIZATION ALGORITHM BASED ON

CLASS-ATTRIBUTE CONTINGENCY COEFFICIENT

Professor: Dr. Kim

Presenter: Sukumar

CONTENT

Discretization

Procedure

Goal

CACC

How to overcome the problem?

Pseudo code

New Method

Discretization

• With the rapid development of information technology, storage devices are widely used to save data.

• People are often unable to extract useful knowledge from such huge datasets.

• Discretization plays an important role in data mining and knowledge discovery.

• It summarizes continuous attributes so that the data is easier to understand and learning becomes more accurate and faster.

Procedure

Proposed discretization algorithms can be divided into:

Top-down / Bottom-up

Supervised / Unsupervised

(Examples: IEM, CAIM, Equal Width, Equal Frequency)

Among these, CAIM maintains the highest interdependence between the target class and the discretized attributes, and thereby attains the highest classification accuracy.

Supervised methods discretize attributes with consideration of the class information, whereas unsupervised methods do not.

Top-down methods start with an empty list of cut points and add new ones in each step, whereas bottom-up methods start with the complete list of all continuous values as cut points and remove some of them by merging intervals in each step.
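
As a small illustration of the unsupervised family named earlier (Equal Width and Equal Frequency), the following Python sketch computes cut points without looking at any class labels. The function names and the sample values are hypothetical and chosen only for this example; a supervised method such as CAIM would additionally use the class of each sample.

```python
def equal_width_cuts(values, n_intervals):
    """Cut points that split [min, max] into n_intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    return [lo + k * width for k in range(1, n_intervals)]


def equal_frequency_cuts(values, n_intervals):
    """Cut points that put roughly the same number of samples in each interval."""
    ordered = sorted(values)
    size = len(ordered) / n_intervals
    cuts = []
    for k in range(1, n_intervals):
        idx = int(round(k * size))
        # Place the cut midway between two neighbouring sorted values.
        cuts.append((ordered[idx - 1] + ordered[idx]) / 2)
    return cuts


# Hypothetical attribute values with one outlier.
values = [1, 2, 3, 4, 5, 6, 7, 8, 100]
width_cuts = equal_width_cuts(values, 3)      # dominated by the outlier
freq_cuts = equal_frequency_cuts(values, 3)   # three samples per interval
print(width_cuts, freq_cuts)
```

With the outlier present, equal width leaves almost all samples in the first interval, while equal frequency spreads them evenly. Neither approach considers the target class.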

Goal

Two drawbacks of CAIM are:

CAIM usually generates a simple discretization scheme in which the number of intervals is very close to the number of target classes.

For each interval, CAIM considers only the class with the most samples and ignores all the other target classes. This can decrease the quality of the produced discretization scheme.
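
The second drawback can be made concrete with a short Python sketch. It assumes the usual form of the CAIM criterion, the average over intervals of max_r² / M_+r, where max_r is the largest class count in interval r and M_+r is the number of samples in that interval. The function name `caim` and both quanta matrices are hypothetical, made up for illustration only.

```python
def caim(quanta):
    """CAIM value of a quanta matrix (rows = classes, columns = intervals).

    Assumes the usual definition: mean over intervals of max_r**2 / M_+r.
    """
    n = len(quanta[0])  # number of intervals
    total = 0.0
    for r in range(n):
        column = [row[r] for row in quanta]
        # Only the largest class count in the interval enters the criterion.
        total += max(column) ** 2 / sum(column)
    return total / n


# Two hypothetical schemes: the first interval holds 5 samples of class 1 in
# both, but the remaining 3 samples belong to different classes.
scheme_a = [[5, 1],
            [3, 1],
            [0, 6]]
scheme_b = [[5, 1],
            [0, 1],
            [3, 6]]

print(caim(scheme_a), caim(scheme_b))
```

Both schemes receive the same CAIM value even though the non-majority samples are distributed differently, which is the behaviour the slide describes.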

CACC

A good discretization algorithm should generate a scheme that maintains a high interdependence between the target class and the discretized attribute.

To represent the original data distribution exactly, the 15 samples in the example could be discretized into 15 intervals, but that would over-fit.

Our discretization formula takes care to avoid over-fitting and to account for the distribution of all samples, in order to obtain an ideal scheme.

M is the total number of samples.

n is the number of intervals.

q_ir is the number of samples with class i that fall within the interval (d_{r-1}, d_r].

r iterates through all intervals, i.e. r = 1, 2, ..., n.

M_i+ is the total number of samples belonging to class i.

M_+r is the total number of continuous values of attribute F that are within the interval (d_{r-1}, d_r].
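
The formula these symbols belong to did not survive in the transcript. Assuming the usual contingency-coefficient form scaled by log(n), it can be written as below; the symbol S (the number of target classes) and the name y' for the scaled value are introduced here for illustration and do not appear on the slide.

```latex
y = M \left[ \sum_{i=1}^{S} \sum_{r=1}^{n} \frac{q_{ir}^{2}}{M_{i+}\, M_{+r}} - 1 \right],
\qquad
y' = \frac{y}{\log(n)},
\qquad
\mathrm{cacc} = \sqrt{\frac{y'}{y' + M}}
```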

How To Overcome The Problem?

We divide y by log(n) for two reasons:

a) To speed up the discretization process.

b) A discretization scheme containing too many intervals could suffer from an overfitting problem.

However, in CAIM the summed value is divided by the number of intervals n, which drives the number of intervals very close to the number of target classes. Hence we use log(n) in the equation instead of n to reduce its influence.
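
The effect of the log(n) divisor can be sketched in Python. This is a minimal illustration, not the presenter's implementation: it assumes cacc = sqrt(y' / (y' + M)) with y' = y / log(n) and y = M [ Σ q_ir² / (M_i+ · M_+r) − 1 ], uses the natural logarithm, and requires at least two intervals with no empty class or interval. The function name and the quanta matrices are hypothetical.

```python
import math


def cacc(quanta):
    """CACC value of a quanta matrix (rows = classes, columns = intervals).

    Assumed form: sqrt(y' / (y' + M)) with y' = y / log(n).
    """
    n = len(quanta[0])                                   # number of intervals
    if n < 2:
        raise ValueError("log(n) is zero for a single interval")
    class_totals = [sum(row) for row in quanta]          # M_i+
    interval_totals = [sum(row[r] for row in quanta)     # M_+r
                       for r in range(n)]
    M = sum(class_totals)                                # total samples
    s = sum(quanta[i][r] ** 2 / (class_totals[i] * interval_totals[r])
            for i in range(len(quanta)) for r in range(n))
    y = M * (s - 1)
    y_scaled = y / math.log(n)       # penalise schemes with many intervals
    return math.sqrt(y_scaled / (y_scaled + M))


perfect = [[5, 0],
           [0, 5]]             # each interval holds exactly one class
over_split = [[3, 2, 0, 0],
              [0, 0, 3, 2]]    # same separation, twice as many intervals
independent = [[5, 5],
               [5, 5]]         # intervals carry no class information

print(cacc(perfect), cacc(over_split), cacc(independent))
```

The over-split scheme separates the classes just as well as the two-interval scheme, yet it scores lower because of the log(n) divisor; the scheme with no class information scores zero.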


Thank You….