

Page 1: WEKA Data Mining Tool - University of Houston (smiertsc/4397cis/WEKA_Data_Mining_Tool.pdf)

WEKA: A Data Mining Tool

By Susan L. Miertschin

Page 2

Data Mining

Task Types: Classification, Clustering, Discovering Association Rules, Discovering Sequential Patterns (Sequence Analysis), Regression, Detecting Deviations from Normal

Numerous Algorithms: C4.5 Decision Tree, K-Means Clustering

Page 3

http://www.cs.waikato.ac.nz/ml/weka/

WEKA can be freely downloaded by visiting the Web site.

Page 4

WEKA – Data Mining Software

Developed by the Machine Learning Group, University of Waikato, New Zealand

Vision: Build state-of-the-art software for developing machine learning (ML) techniques and apply them to real-world data-mining problems

Developed in Java

Page 5

WEKA: a collection of machine learning algorithms for data mining tasks

WEKA is open source software issued under the GNU General Public License

Tools for: data pre-processing, classification, regression, clustering, association rules, visualization

Page 6

After Installing - Start WEKA

Page 7

WEKA Main Interface

Page 8

WEKA Sample Files

C:\Program Files\weka\data

WEKA-formatted files (.arff)

Open the contact-lenses file

Page 9

Example – Contact Lens Data

How many data instances are in the file? How many attributes? Numerical attributes? Categorical attributes?
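These questions can be answered from the Explorer's Preprocess panel or by reading the file directly. The following is a minimal, stdlib-only Java sketch, not WEKA's own ARFF loader, that counts data instances and numeric versus nominal (categorical) attributes. It assumes a simple dense ARFF file: one `@attribute` declaration per line, attribute names without spaces or quotes, nominal values listed in braces, and one instance per line after `@data`. The class name `ArffSummary` is hypothetical; inside WEKA itself, `weka.core.Instances` provides `numInstances()` and `numAttributes()` for the same counts.

```java
import java.util.Locale;

/** Hypothetical helper: summarizes a simple, dense ARFF file (not WEKA's parser). */
final class ArffSummary {
    int instances;
    int attributes;
    int numericAttributes;
    int nominalAttributes;

    static ArffSummary parse(String arffText) {
        ArffSummary s = new ArffSummary();
        boolean inData = false;
        for (String raw : arffText.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // blank lines and % comments carry no data
            }
            String lower = line.toLowerCase(Locale.ROOT);
            if (inData) {
                s.instances++; // assumption: one dense instance per line
            } else if (lower.startsWith("@data")) {
                inData = true;
            } else if (lower.startsWith("@attribute")) {
                s.attributes++;
                String[] parts = line.split("\\s+", 3); // "@attribute", name, type
                String type = parts.length == 3 ? parts[2].toLowerCase(Locale.ROOT) : "";
                if (type.startsWith("{")) {
                    s.nominalAttributes++; // nominal values are listed in braces
                } else if (type.startsWith("numeric") || type.startsWith("real")
                        || type.startsWith("integer")) {
                    s.numericAttributes++;
                }
                // other types (string, date, relational) are counted only in 'attributes'
            }
        }
        return s;
    }
}
```

For example, `ArffSummary.parse(java.nio.file.Files.readString(java.nio.file.Path.of("contact-lenses.arff")))` would summarize the sample file, assuming it has been copied into the working directory.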

Page 10

Example – Contact Lens Data

Can you think of problems that might be solved with this data?

Page 11

Example – Contact Lens Data

If supervised learning were to be done, which would be the output attribute, do you think?

Page 12

Example – Contact Lens Data

Page 13

Example – Contact Lens Data

Page 14

Example – Contact Lens Data

Page 15

Example – Classify - Contact Lens Data

Page 16

Example – Classify - Contact Lens Data

Page 17

Example – Classify - Contact Lens Data

Select the rule generator named PART (listed under rules) from the list that shows up after you select Choose.

Page 18

Example – Classify - Contact Lens Data

Page 19

Example – Classify - Contact Lens Data

Page 20

10-Fold Cross-Validation

Data is partitioned into 10 equally (or nearly equally) sized segments, or folds.

10 iterations of training and validation are completed.

In each iteration, a different fold of the data is held out for validation, with the remaining 9 folds used for learning, so every instance is used for validation exactly once.

http://www.public.asu.edu/~ltang9/papers/ency-cross-validation.pdf
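The partitioning step can be sketched in a few lines of Java. This is an illustration under stated assumptions, not WEKA's implementation: instance indices are shuffled with a fixed seed and dealt round-robin into k folds, so fold sizes differ by at most one. WEKA's own cross-validation additionally stratifies the folds by class when the class attribute is nominal, which this sketch omits. The names `KFold`, `folds`, and `trainingIndices` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of k-fold cross-validation index splitting (not WEKA's code). */
final class KFold {
    /** Splits indices 0..n-1 into k folds whose sizes differ by at most one. */
    static List<List<Integer>> folds(int n, int k, long seed) {
        if (k < 2 || k > n) {
            throw new IllegalArgumentException("need 2 <= k <= n");
        }
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            order.add(i);
        }
        Collections.shuffle(order, new Random(seed)); // randomize before partitioning
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < n; i++) {
            folds.get(i % k).add(order.get(i)); // deal indices round-robin into folds
        }
        return folds;
    }

    /** Training indices for iteration f: every fold except the held-out fold f. */
    static List<Integer> trainingIndices(List<List<Integer>> folds, int f) {
        List<Integer> train = new ArrayList<>();
        for (int g = 0; g < folds.size(); g++) {
            if (g != f) {
                train.addAll(folds.get(g));
            }
        }
        return train;
    }
}
```

Iteration f of the 10 would learn from `trainingIndices(folds, f)` and validate on `folds.get(f)`. For example, with n = 24 and k = 10, the folds hold 3, 3, 3, 3, 2, 2, 2, 2, 2, 2 instances.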

Page 21

Example – Classify - Contact Lens Data

Page 22

Example – Classify - Contact Lens Data

IF tear-prod-rate = reduced THEN contact-lenses = none

IF astigmatism = no THEN contact-lenses = soft

PART's rules form an ordered decision list: the second rule is applied only to instances that the first rule does not cover.
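How an ordered rule list classifies an instance can be sketched as follows. This is a minimal Java illustration, not WEKA's PART code: each rule tests a single attribute value, the first rule that matches decides the class, and an instance that no rule matches receives a caller-supplied default. Only the two rules shown on the slide are encoded; the default class is a placeholder standing in for PART's remaining rules, which are not reproduced here. The names `DecisionList`, `Rule`, `SLIDE_RULES`, and `classify` are hypothetical.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an ordered decision list (first matching rule wins). */
final class DecisionList {
    /** IF attribute = value THEN class = prediction. */
    static final class Rule {
        final String attribute;
        final String value;
        final String prediction;

        Rule(String attribute, String value, String prediction) {
            this.attribute = attribute;
            this.value = value;
            this.prediction = prediction;
        }

        boolean covers(Map<String, String> instance) {
            return value.equals(instance.get(attribute));
        }
    }

    /** The two rules from the slide, in the order they are listed. */
    static final List<Rule> SLIDE_RULES = List.of(
            new Rule("tear-prod-rate", "reduced", "none"),
            new Rule("astigmatism", "no", "soft"));

    /** Returns the prediction of the first covering rule, or defaultClass if none covers. */
    static String classify(List<Rule> rules, Map<String, String> instance, String defaultClass) {
        for (Rule rule : rules) {
            if (rule.covers(instance)) {
                return rule.prediction; // later rules are never consulted
            }
        }
        return defaultClass;
    }
}
```

An instance with tear-prod-rate = reduced and astigmatism = no is classified as none, because the first rule fires before the second is checked.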

Page 23

Example – Classify - Contact Lens Data

Coverage = 12

Page 24

Example – Classify - Contact Lens Data

IF tear-prod-rate = reduced THEN contact-lenses = none

IF astigmatism = no THEN contact-lenses = soft

Page 25

Example – Classify - Contact Lens Data

Coverage = 6

Misclassification = 1

Accuracy = 5/6 = 83.3%
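The arithmetic generalizes: a rule's coverage is the number of instances it is applied to, and its accuracy is (coverage − misclassified) / coverage, here (6 − 1) / 6. In an ordered decision list, an instance counts toward only the first rule that covers it. The Java sketch below computes both numbers per rule; it is an illustration, not WEKA's evaluation code, and the names `RuleStats`, `Rule`, `evaluate`, and `accuracy` are hypothetical.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: per-rule coverage and accuracy for an ordered rule list. */
final class RuleStats {
    /** IF attribute = value THEN class = prediction. */
    static final class Rule {
        final String attribute;
        final String value;
        final String prediction;

        Rule(String attribute, String value, String prediction) {
            this.attribute = attribute;
            this.value = value;
            this.prediction = prediction;
        }
    }

    /** Fraction of covered instances that the rule classifies correctly. */
    static double accuracy(int coverage, int misclassified) {
        if (coverage == 0) {
            throw new IllegalArgumentException("rule covers no instances");
        }
        return (coverage - misclassified) / (double) coverage;
    }

    /**
     * Returns {coverage, misclassified} for each rule. Each instance is credited
     * only to the first rule that covers it, as in a decision list.
     */
    static int[][] evaluate(List<Rule> rules, List<Map<String, String>> instances, String classAttribute) {
        int[][] stats = new int[rules.size()][2];
        for (Map<String, String> instance : instances) {
            for (int r = 0; r < rules.size(); r++) {
                Rule rule = rules.get(r);
                if (rule.value.equals(instance.get(rule.attribute))) {
                    stats[r][0]++; // coverage
                    if (!rule.prediction.equals(instance.get(classAttribute))) {
                        stats[r][1]++; // misclassification
                    }
                    break; // later rules never see this instance
                }
            }
        }
        return stats;
    }
}
```

For the slide's second rule, `accuracy(6, 1)` evaluates to 5/6, about 0.833.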

Page 26

Example – Classify - Contact Lens Data
