weka data mining tool

WEKA: A Data Mining Tool

By Susan L. Miertschin

1

Data Mining

Task Types

Classification

Clustering

Discovering Association Rules

Discovering Sequential Patterns – Sequence Analysis

Regression

Detecting Deviations from Normal

Numerous Algorithms

C4.5 Decision Tree

K-Means Clustering

2

http://www.cs.waikato.ac.nz/ml/weka/

WEKA can be freely downloaded by visiting the Web site

3

WEKA – Data Mining Software

Developed by the Machine Learning Group, University of Waikato, New Zealand

Vision: Build state-of-the-art software for developing machine learning (ML) techniques and apply them to real-world data-mining problems

Developed in Java

4

WEKA’s Collection of Machine Learning Algorithms

Algorithms for data mining tasks

WEKA is open source software issued under the GNU General Public License

Tools for:

Data pre-processing

Classification

Regression

Clustering

Association rules

Visualization

5

After Installing - Start WEKA

6

WEKA Main Interface

7

WEKA Sample Files

C:\Program Files\weka\data

WEKA formatted files (.arff)

Open the contact-lenses file

8

Example – Contact Lens Data

How many data instances are in the file?

How many attributes?

Numerical attributes?

Categorical attributes?
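The questions above can also be answered by reading the .arff file as text. The sketch below is a minimal reader that assumes a simple ARFF file with unquoted values. `SAMPLE_ARFF` is a trimmed, hypothetical excerpt written for illustration and is not the full contact-lenses file, and `summarize_arff` is a made-up helper name, not part of WEKA.

```python
# Minimal ARFF summary sketch: counts instances and labels each attribute.
# SAMPLE_ARFF is an illustrative excerpt, not the real sample file.
SAMPLE_ARFF = """\
% Hypothetical, shortened example in ARFF layout
@relation contact-lenses
@attribute age {young, pre-presbyopic, presbyopic}
@attribute astigmatism {no, yes}
@attribute tear-prod-rate {reduced, normal}
@attribute contact-lenses {soft, hard, none}
@data
young,no,reduced,none
young,no,normal,soft
presbyopic,yes,normal,hard
"""


def summarize_arff(text):
    """Return (attributes, instances) for a simple, unquoted ARFF text."""
    attributes = []  # list of (name, kind) pairs
    instances = []   # list of value lists, one per data row
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if in_data:
            instances.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                kind = "categorical"
            elif spec.lower() in ("numeric", "real", "integer"):
                kind = "numerical"
            else:
                kind = "other"  # e.g. string or date attributes
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return attributes, instances


attributes, instances = summarize_arff(SAMPLE_ARFF)
print(len(instances), "instances,", len(attributes), "attributes")
for name, kind in attributes:
    print(name, "->", kind)
```

To try it on the real sample file, read the file's text with `open(path).read()` and pass it to `summarize_arff`; the WEKA Preprocess tab reports the same counts.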

9

Example – Contact Lens Data

Can you think of problems that might be solved with this data?

10

Example – Contact Lens Data

If supervised learning were to be done, which would be the output attribute, do you think?

11

Example – Contact Lens Data

12

Example – Contact Lens Data

13

Example – Contact Lens Data

14

Example – Classify - Contact Lens Data

15

Example – Classify - Contact Lens Data

16

Example – Classify - Contact Lens Data

Select the rule generator named PART from the list that shows up after you select Choose

17

Example – Classify - Contact Lens Data

18

Example – Classify - Contact Lens Data

19

10-Fold Cross-Validation

Data is partitioned into 10 equally (or nearly equally) sized segments or folds

10 iterations of training and validation are completed

In each iteration a different fold of the data is held out for validation, with the remaining 9 folds used for learning
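The partitioning described above can be sketched in a few lines. This is a plain, unstratified illustration of the idea and not WEKA's own implementation. The function name `k_fold_splits`, the fixed seed, and the dataset size of 24 are assumptions chosen for the example.

```python
import random


def k_fold_splits(n_instances, k=10, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Fold i takes every k-th shuffled index starting at position i,
    # so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# Assumed size of 24 instances, used only to show nearly equal folds.
splits = list(k_fold_splits(24, k=10))
for number, (training, validation) in enumerate(splits, start=1):
    print("iteration", number, "train on", len(training), "validate on", len(validation))
```

Each instance lands in exactly one validation fold, so every instance is used for validation once and for learning in the other 9 iterations.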

http://www.public.asu.edu/~ltang9/papers/ency-cross-validation.pdf

20

Example – Classify - Contact Lens Data

21

Example – Classify - Contact Lens Data

IF tear-prod-rate = reduced THEN contact-lenses = none

IF astigmatism = no THEN contact-lenses = soft
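One way to read the two rules above is as an ordered list, where the first rule whose condition matches decides the class. The sketch below assumes that reading. `RULES` and `classify` are illustrative names, and only the two rules shown on the slide are encoded, so an instance that matches neither gets no prediction.

```python
# The two rules from the slide, tried in order (assumed ordering).
RULES = [
    ({"tear-prod-rate": "reduced"}, "none"),
    ({"astigmatism": "no"}, "soft"),
]


def classify(instance, rules=RULES):
    """Return the prediction of the first rule whose conditions all match."""
    for conditions, prediction in rules:
        if all(instance.get(attr) == value for attr, value in conditions.items()):
            return prediction
    return None  # neither rule fired


print(classify({"tear-prod-rate": "reduced", "astigmatism": "no"}))
print(classify({"tear-prod-rate": "normal", "astigmatism": "no"}))
print(classify({"tear-prod-rate": "normal", "astigmatism": "yes"}))
```

Because the rules are tried in order, the second rule only ever sees instances that the first rule did not cover.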

22

Example – Classify - Contact Lens Data

Coverage = 12

23

Example – Classify - Contact Lens Data

IF tear-prod-rate = reduced THEN contact-lenses = none

IF astigmatism = no THEN contact-lenses = soft

24

Example – Classify - Contact Lens Data

Coverage = 6

Misclassification = 1

Accuracy = 5/6 = 83.3%
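The arithmetic above can be reproduced with a small helper. This is a sketch under stated assumptions. The rows in `remaining` are illustrative placeholders built to match the counts on the slide (6 covered, 1 misclassified) and are not copied from the data file. `rule_stats` is a hypothetical function name.

```python
def rule_stats(instances, conditions, prediction, class_attr="contact-lenses"):
    """Return (coverage, misclassified, accuracy) for a single rule."""
    covered = [
        inst for inst in instances
        if all(inst.get(attr) == value for attr, value in conditions.items())
    ]
    misclassified = sum(1 for inst in covered if inst[class_attr] != prediction)
    coverage = len(covered)
    accuracy = (coverage - misclassified) / coverage if coverage else None
    return coverage, misclassified, accuracy


# Placeholder rows: 5 covered and correct, 1 covered but wrong, 2 not covered.
remaining = (
    [{"astigmatism": "no", "contact-lenses": "soft"}] * 5
    + [{"astigmatism": "no", "contact-lenses": "none"}]
    + [{"astigmatism": "yes", "contact-lenses": "hard"}] * 2
)

coverage, misclassified, accuracy = rule_stats(remaining, {"astigmatism": "no"}, "soft")
print("Coverage =", coverage)
print("Misclassification =", misclassified)
print("Accuracy = {:.1%}".format(accuracy))
```

Coverage counts the instances that satisfy the rule's condition, and accuracy is the share of those covered instances whose class matches the rule's prediction.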

25

Example – Classify - Contact Lens Data

26

WEKA: A Data Mining Tool

By Susan L. Miertschin

27