WEKA: A Data Mining Tool
By Susan L. Miertschin
Data Mining
Task Types
Classification
Clustering
Discovering Association Rules
Discovering Sequential Patterns – Sequence Analysis
Regression
Detecting Deviations from Normal
Numerous Algorithms
C4.5 Decision Tree
K-Means Clustering
http://www.cs.waikato.ac.nz/ml/weka/
WEKA can be freely downloaded by visiting the Web site
WEKA – Data Mining Software
Developed by the Machine Learning Group, University of Waikato, New Zealand
Vision: Build state-of-the-art software for developing machine learning (ML) techniques and apply them to real-world data-mining problems
Developed in Java
WEKA's Collection of Machine Learning Algorithms
Algorithms for data mining tasks
WEKA is open source software issued under the GNU General Public License
Tools for: data pre-processing, classification, regression, clustering, association rules, visualization
After Installing - Start WEKA
WEKA Main Interface
WEKA Sample Files
C:\Program Files\weka\data
WEKA formatted files (.arff)
Open the contact-lenses file
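
To get a feel for what a WEKA-formatted (.arff) file contains before opening one in the GUI, the following is a minimal Python sketch of a reader for the simplest ARFF layout. The sample text is a hand-written illustration, not the full contact-lenses file, and `parse_arff` is a hypothetical helper, not part of WEKA. It ignores quoted names, sparse rows, and missing values.

```python
# Illustrative ARFF text in the layout WEKA uses; the relation name, the
# attribute subset, and the three rows are a hand-written sample only.
SAMPLE_ARFF = """\
% comment lines start with a percent sign
@relation contact-lenses-sample

@attribute age {young, pre-presbyopic, presbyopic}
@attribute tear-prod-rate {reduced, normal}
@attribute contact-lenses {soft, hard, none}

@data
young,reduced,none
young,normal,soft
presbyopic,reduced,none
"""


def parse_arff(text):
    """Return (attributes, rows) from simple ARFF text.

    attributes is a list of (name, kind) pairs, where kind is a list of
    nominal values or a lowercase type keyword such as 'numeric'.
    """
    attributes, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                values = [v.strip() for v in spec.strip("{}").split(",")]
                attributes.append((name, values))
            else:
                attributes.append((name, spec.lower()))
        elif lower.startswith("@data"):
            in_data = True
    return attributes, rows


attributes, rows = parse_arff(SAMPLE_ARFF)
categorical = [name for name, kind in attributes if isinstance(kind, list)]
other = [name for name, kind in attributes if not isinstance(kind, list)]
print(len(rows), "instances,", len(attributes), "attributes")
print("categorical:", categorical, "other:", other)
```

Run against a real .arff file's text, the same counts correspond to the instance and attribute totals that the WEKA interface reports after a file is opened.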
Example – Contact Lens Data
How many data instances are in the file?
How many attributes?
Numerical attributes?
Categorical attributes?
Example – Contact Lens Data
Can you think of problems that might be solved with this data?
Example – Contact Lens Data
If supervised learning were to be done, which would be the output attribute, do you think?
Example – Classify - Contact Lens Data
Select the rule generator named PART from the list that shows up after you select Choose
10-Fold Cross-Validation
Data is partitioned into 10 equally (or nearly equally) sized segments or folds
10 iterations of training and validation are completed
In each iteration a different fold of the data is held out for validation, with the remaining 9 folds used for learning
http://www.public.asu.edu/~ltang9/papers/ency-cross-validation.pdf
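
The partitioning described on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not WEKA's own procedure, which may differ (for example, by stratifying folds by class). The function names, the seed, and the dataset size of 24 are all choices made for the example.

```python
import random


def k_fold_indices(n_instances, k=10, seed=0):
    """Split range(n_instances) into k folds whose sizes differ by at most 1."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Dealing indices round-robin gives equally (or nearly equally) sized folds.
    return [indices[i::k] for i in range(k)]


def cross_validation_splits(n_instances, k=10, seed=0):
    """Yield (train, validation) index lists, one pair per iteration."""
    folds = k_fold_indices(n_instances, k, seed)
    for held_out in range(k):
        validation = folds[held_out]
        # The remaining k - 1 folds are pooled for learning.
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, validation


# Assumed dataset size of 24 instances, used only to show uneven fold sizes.
splits = list(cross_validation_splits(24, k=10))
fold_sizes = [len(validation) for _, validation in splits]
print(fold_sizes)
```

Every instance lands in exactly one validation fold, so each is used for validation once and for learning in the other nine iterations.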
Example – Classify - Contact Lens Data
IF tear-prod-rate = reduced THEN contact-lenses = none
IF astigmatism = no THEN contact-lenses = soft
Example – Classify - Contact Lens Data
Coverage = 12
Example – Classify - Contact Lens Data
IF tear-prod-rate = reduced THEN contact-lenses = none
IF astigmatism = no THEN contact-lenses = soft
Example – Classify - Contact Lens Data
Coverage = 6
Misclassification = 1
Accuracy = 5/6 = 83.3%
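
The coverage, misclassification, and accuracy figures for a single IF-THEN rule can be reproduced with a short Python sketch. The rows below are hypothetical, chosen only so that the counts match the slide (6 covered, 1 misclassified), and `rule_stats` is an illustrative helper rather than anything WEKA provides. In a decision list, each rule is normally assessed on the instances left over after earlier rules have fired; the toy rows simply stand in for those remaining instances.

```python
def rule_stats(instances, condition, predicted_class, class_attr):
    """Return (coverage, misclassified, accuracy) for one IF-THEN rule."""
    covered = [row for row in instances if condition(row)]
    wrong = sum(1 for row in covered if row[class_attr] != predicted_class)
    coverage = len(covered)
    accuracy = (coverage - wrong) / coverage if coverage else 0.0
    return coverage, wrong, accuracy


# Hypothetical rows: 5 match the rule and its class, 1 matches the rule but
# has a different class, and 3 are not covered by the rule at all.
toy = (
    [{"astigmatism": "no", "contact-lenses": "soft"}] * 5
    + [{"astigmatism": "no", "contact-lenses": "none"}]
    + [{"astigmatism": "yes", "contact-lenses": "hard"}] * 3
)

coverage, wrong, accuracy = rule_stats(
    toy,
    condition=lambda row: row["astigmatism"] == "no",
    predicted_class="soft",
    class_attr="contact-lenses",
)
print(f"Coverage = {coverage}, Misclassification = {wrong}, "
      f"Accuracy = {coverage - wrong}/{coverage} = {accuracy:.1%}")
```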