Data Mining: Implementation of Data Mining Techniques using RapidMiner software Prepared by Mohammed Kharma

Upload: mohammed-kharma

Post on 14-Jun-2015



TRANSCRIPT

Page 1: Data Mining: Implementation of Data Mining Techniques using RapidMiner software

Data Mining: Implementation of Data Mining Techniques using RapidMiner software

Prepared by Mohammed Kharma

Page 2: Data Mining: Implementation of Data Mining Techniques using RapidMiner software

Definitions review

• Cluster: a collection of data objects
– similar (or related) to one another within the same group
– dissimilar (or unrelated) to the objects in other groups
• Cluster analysis
– Finding similarities between data according to the characteristics found in the data and grouping similar data objects into clusters

Page 3: Data Mining: Implementation of Data Mining Techniques using RapidMiner software

Clustering Methods

• Partitioning:
– Unsupervised learning algorithms that construct various partitions and then evaluate them by some criterion, e.g., minimizing the sum of squared errors
– Typical methods: k-means, k-medoids
• Hierarchical:
– Create a hierarchical decomposition of the set of data (or objects) using some criterion
– Typical methods: DIANA, AGNES, BIRCH, ROCK, CHAMELEON

Page 4: Data Mining: Implementation of Data Mining Techniques using RapidMiner software

Illustration and comparison of two clustering techniques using the RapidMiner tool and a Java application

Page 5: Data Mining: Implementation of Data Mining Techniques using RapidMiner software

Illustration of two clustering techniques using the RapidMiner tool and Java

• K-means algorithm: we performed two tests

1. Using a Java program: program parameters K = 2; Data: 22 2123 2024 2225 33 2

Page 6: Data Mining: Implementation of Data Mining Techniques using RapidMiner software


K-means Clustering

• Input: the number of clusters k and the collection of n instances
• Output: a set of k clusters that minimizes the squared error criterion
• Method:
– Arbitrarily choose k instances as the initial cluster centers
– Repeat
  • (Re)assign each instance to the cluster to which the instance is the most similar, based on the mean value of the instances in the cluster
  • Update cluster means (compute mean value of the instances for each cluster)
– Until no change in the assignment
• Squared Error Criterion
– $E = \sum_{i=1}^{k} \sum_{p \in C_i} |p - m_i|^2$
– where $m_i$ are the cluster means and $p$ are the points in cluster $C_i$
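The k-means loop and the squared error criterion can be sketched in Java roughly as follows. This is a minimal illustration, not the Java program used for the test in this deck: the class and method names are hypothetical, the sample points are made up, and the "arbitrary" initial centers are assumed to be simply the first k instances.

```java
import java.util.Arrays;

// Illustrative k-means sketch; class and method names are hypothetical.
final class KMeansSketch {

    // Squared Euclidean distance |a - b|^2.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Index of the center nearest to p (the lowest index wins ties).
    static int nearest(double[] p, double[][] centers) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (squaredDistance(p, centers[c]) < squaredDistance(p, centers[best])) {
                best = c;
            }
        }
        return best;
    }

    // Mean of the instances in each cluster; an empty cluster keeps its fallback center.
    static double[][] means(double[][] points, int[] assignment, double[][] fallback) {
        int k = fallback.length;
        int dim = points[0].length;
        double[][] result = new double[k][dim];
        int[] counts = new int[k];
        for (int i = 0; i < points.length; i++) {
            counts[assignment[i]]++;
            for (int d = 0; d < dim; d++) {
                result[assignment[i]][d] += points[i][d];
            }
        }
        for (int c = 0; c < k; c++) {
            for (int d = 0; d < dim; d++) {
                result[c][d] = counts[c] == 0 ? fallback[c][d] : result[c][d] / counts[c];
            }
        }
        return result;
    }

    // Returns the cluster index of every instance.
    // Assumption: the first k instances are used as the initial centers.
    static int[] cluster(double[][] points, int k) {
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = points[c].clone();
        }
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        boolean changed = true;
        while (changed) { // repeat until no change in the assignment
            changed = false;
            for (int i = 0; i < points.length; i++) {
                int best = nearest(points[i], centers);
                if (best != assignment[i]) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            centers = means(points, assignment, centers); // update cluster means
        }
        return assignment;
    }

    // Squared error criterion E: sum over all points of |p - m_i|^2.
    static double squaredError(double[][] points, int[] assignment, int k) {
        double[][] m = means(points, assignment, new double[k][points[0].length]);
        double e = 0.0;
        for (int i = 0; i < points.length; i++) {
            e += squaredDistance(points[i], m[assignment[i]]);
        }
        return e;
    }

    public static void main(String[] args) {
        // Made-up sample data, not the data set from the slides.
        double[][] points = {{1, 1}, {1, 2}, {2, 1}, {8, 8}, {9, 8}, {8, 9}};
        int[] assignment = cluster(points, 2);
        System.out.println(Arrays.toString(assignment));
        System.out.println(squaredError(points, assignment, 2));
    }
}
```

On this made-up data the first three points end up in one cluster and the last three in the other, and E comes out at 8/3. A different choice of initial centers can lead to a different result in general, since k-means only finds a local minimum of E.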

Page 7: Data Mining: Implementation of Data Mining Techniques using RapidMiner software

The result of K-Means-Java program

Page 8: Data Mining: Implementation of Data Mining Techniques using RapidMiner software

The result of K-Means-RapidMiner

Page 9: Data Mining: Implementation of Data Mining Techniques using RapidMiner software

The result of K-Means-RapidMiner

Page 10: Data Mining: Implementation of Data Mining Techniques using RapidMiner software

Continued-The result of K-Means-RapidMiner

Page 11: Data Mining: Implementation of Data Mining Techniques using RapidMiner software


K-medoids

• Input: the number of clusters k and the collection of n instances
• Output: a set of k clusters that minimizes the sum of the dissimilarities of all the instances to their nearest medoids
• Method:
– Arbitrarily choose k instances as the initial medoids
– Repeat
  • (Re)assign each remaining instance to the cluster with the nearest medoid
  • Randomly select a non-medoid instance, Or
  • Compute the total cost, S, of swapping a current medoid Oj with Or
  • If S < 0 then swap Oj with Or to form the new set of k medoids
– Until no change
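The swap step can be sketched in Java along the following lines. This is a hedged illustration rather than RapidMiner's implementation: the names are hypothetical, the data is made up, Manhattan distance is assumed as the dissimilarity, the first k instances are assumed as initial medoids, and instead of picking Or at random the sketch tries every non-medoid in turn so that the run is deterministic.

```java
import java.util.Arrays;

// Illustrative k-medoids (PAM-style) sketch; names are hypothetical.
final class KMedoidsSketch {

    // Assumption: Manhattan distance is used as the dissimilarity measure.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            sum += Math.abs(a[d] - b[d]);
        }
        return sum;
    }

    // Sum of the dissimilarities of all instances to their nearest medoid.
    static double totalCost(double[][] points, int[] medoids) {
        double cost = 0.0;
        for (double[] p : points) {
            double nearest = Double.MAX_VALUE;
            for (int m : medoids) {
                nearest = Math.min(nearest, distance(p, points[m]));
            }
            cost += nearest;
        }
        return cost;
    }

    static boolean isMedoid(int[] medoids, int index) {
        for (int m : medoids) {
            if (m == index) return true;
        }
        return false;
    }

    // Returns the indices of the final medoids.
    // Assumptions: the first k instances are the initial medoids, and every
    // non-medoid Or is tried in turn instead of being selected at random.
    static int[] findMedoids(double[][] points, int k) {
        int[] medoids = new int[k];
        for (int j = 0; j < k; j++) {
            medoids[j] = j;
        }
        boolean changed = true;
        while (changed) { // repeat until no swap improves the cost
            changed = false;
            for (int j = 0; j < k; j++) {
                for (int r = 0; r < points.length; r++) {
                    if (isMedoid(medoids, r)) continue;
                    double before = totalCost(points, medoids);
                    int oj = medoids[j];
                    medoids[j] = r; // tentatively swap Oj with Or
                    double s = totalCost(points, medoids) - before;
                    if (s < 0) {
                        changed = true; // keep the swap
                    } else {
                        medoids[j] = oj; // undo the swap
                    }
                }
            }
        }
        return medoids;
    }

    // Position (in the medoids array) of the nearest medoid for each instance.
    static int[] assign(double[][] points, int[] medoids) {
        int[] assignment = new int[points.length];
        for (int i = 0; i < points.length; i++) {
            int best = 0;
            for (int j = 1; j < medoids.length; j++) {
                if (distance(points[i], points[medoids[j]])
                        < distance(points[i], points[medoids[best]])) {
                    best = j;
                }
            }
            assignment[i] = best;
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Made-up sample data, not the data set from the slides.
        double[][] points = {{1, 1}, {1, 2}, {2, 1}, {8, 8}, {9, 8}, {8, 9}};
        int[] medoids = findMedoids(points, 2);
        System.out.println(Arrays.toString(medoids));
        System.out.println(Arrays.toString(assign(points, medoids)));
    }
}
```

Unlike a cluster mean, each medoid returned here is an index into the input, so every cluster center is an actual instance of the data set.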

Page 12: Data Mining: Implementation of Data Mining Techniques using RapidMiner software

The result of k-medoids-RapidMiner

Page 13: Data Mining: Implementation of Data Mining Techniques using RapidMiner software

The result of k-medoids-RapidMiner

Page 14: Data Mining: Implementation of Data Mining Techniques using RapidMiner software

Java Live Demo: http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html

Page 15: Data Mining: Implementation of Data Mining Techniques using RapidMiner software

Comparison

• The results of both algorithms are the same
• Both require K to be specified in the input
• K-medoids is less influenced by outliers in the data
• Both methods assign each instance to exactly one cluster
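The point about outliers can be illustrated with a small one-dimensional Java sketch that compares a cluster's mean (the k-means center) with its medoid before and after one extreme value is added. The class name, method names, and values are made up for illustration, and absolute difference is assumed as the dissimilarity.

```java
// Illustrative comparison of mean and medoid; names and values are hypothetical.
final class OutlierSketch {

    // Arithmetic mean, as used for a k-means cluster center.
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Medoid: the actual value with the smallest total absolute distance
    // to all other values (the first such value wins ties).
    static double medoid(double[] values) {
        double best = values[0];
        double bestCost = Double.MAX_VALUE;
        for (double candidate : values) {
            double cost = 0.0;
            for (double v : values) {
                cost += Math.abs(candidate - v);
            }
            if (cost < bestCost) {
                bestCost = cost;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] clean = {1, 2, 3, 4};
        double[] withOutlier = {1, 2, 3, 4, 100}; // 100 is the outlier
        System.out.println(mean(clean) + " -> " + mean(withOutlier));
        System.out.println(medoid(clean) + " -> " + medoid(withOutlier));
    }
}
```

With these values the mean moves from 2.5 to 22.0 once the outlier is added, while the medoid only moves from 2.0 to 3.0 and stays inside the bulk of the data.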

Page 16: Data Mining: Implementation of Data Mining Techniques using RapidMiner software

Thank you