Data Mining with WEKA

Upload: tommy96

Post on 03-Nov-2014


WEKA
- Machine learning/data mining software written in Java
- Used for research, education, and applications
- Complements “Data Mining” by Witten & Frank
- Main features
  - Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
  - Graphical user interfaces (incl. data visualization)
  - Environment for comparing learning algorithms


Data Files

@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
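
To make the file layout concrete, here is a minimal sketch of reading such a file in plain Python. It is not WEKA's own loader: `parse_arff` and `SAMPLE` are hypothetical names, and the sketch only handles the subset of ARFF shown on the slide (numeric and nominal attributes, `?` for missing values, no quoting or sparse data).

```python
# Illustrative ARFF reader for the subset of the format shown on the slide.
# Assumption: no quoted values, no sparse rows, no date/string attributes.
SAMPLE = """@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""


def parse_arff(text):
    """Return (relation, attributes, rows) for a small ARFF string."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and ARFF comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = {}
            for (name, kind), value in zip(attributes, values):
                if value == "?":            # "?" marks a missing value
                    row[name] = None
                elif kind == "numeric":
                    row[name] = float(value)
                else:                       # nominal: keep the label as text
                    row[name] = value
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):        # nominal: list of allowed labels
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE)
```

Each row comes back as a dictionary keyed by attribute name, with the missing cholesterol value in the fourth instance stored as `None`.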


Explorer: pre-processing
- Source
  - Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
  - Data can also be read from a URL or from an SQL database (using JDBC)
- Pre-processing tools
  - Called “filters”
  - Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
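
As a rough illustration of what two of these filters do, the sketch below applies min-max normalization and equal-width discretization to the three known cholesterol values from the data file slide. The function names are hypothetical and this is plain Python, not WEKA's filter classes; equal-width binning is only one of several possible discretization strategies.

```python
def normalize(values):
    """Rescale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant attribute: nothing to scale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, bins):
    """Equal-width binning: map each value to a bin index 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would land in bin `bins`, so clamp it to the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


# Known cholesterol values from the example data (the missing one is left out).
cholesterol = [233, 286, 229]
scaled = normalize(cholesterol)
binned = discretize(cholesterol, 3)
```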


Explorer: building “classifiers”
- Classifiers in WEKA are models for predicting nominal or numeric quantities
- Implemented learning schemes include:
  - Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
- “Meta”-classifiers include:
  - Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
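
To give a feel for the simplest scheme in that list, here is a minimal sketch of an instance-based (k-nearest-neighbour) classifier in plain Python. It is an illustration only, not WEKA's implementation: `knn_predict` is a hypothetical name, the features are the unscaled (age, cholesterol) pairs of the three complete rows from the data file slide, and plain Euclidean distance is assumed.

```python
import math
from collections import Counter


def knn_predict(train, query, k=1):
    """Return the majority label among the k training instances nearest to query.

    train: list of (feature_vector, label) pairs.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# (age, cholesterol) -> class, taken from the complete rows of the example data.
train = [
    ((63, 233), "not_present"),
    ((67, 286), "present"),
    ((67, 229), "present"),
]

near_second_row = knn_predict(train, (66, 280), k=1)
near_first_row = knn_predict(train, (62, 235), k=1)
```

With unscaled features the cholesterol column dominates the distance, which is one reason a normalization filter is usually applied first.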


Explorer: clustering data
- WEKA contains “clusterers” for finding groups of similar instances in a dataset
- Implemented schemes are:
  - k-Means, EM, Cobweb, X-means, FarthestFirst
- Clusters can be visualized and compared to “true” clusters (if given)
- Evaluation based on log-likelihood if clustering scheme produces a probability distribution
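
The first scheme in that list, k-Means, can be sketched in a few lines of plain Python. This is a textbook version under stated assumptions, not a description of WEKA's clusterer: the name `k_means` is hypothetical, the centroids are naively initialized to the first k points so the run is deterministic, and the toy points are made up for illustration.

```python
import math


def k_means(points, k, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = [points[i] for i in range(k)]  # naive init: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:                     # keep an empty cluster's centroid in place
                new_centroids.append(centroids[c])
        if new_centroids == centroids:  # assignments stopped changing
            break
        centroids = new_centroids
    return centroids, clusters


# Made-up 2-d points forming two visually obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, clusters = k_means(points, k=2)
```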


Explorer: finding associations
- WEKA contains an implementation of the Apriori algorithm for learning association rules
  - Works only with discrete data
- Can identify statistical dependencies between groups of attributes:
  - milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)
- Apriori can compute all rules that have a given minimum support and exceed a given confidence
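
The two numbers attached to a rule can be computed directly from transaction data. The sketch below shows support (as a count, matching the slide's usage) and confidence for the milk/butter rule on a made-up list of five transactions. It does not implement Apriori's candidate generation, and the function names and data are hypothetical.

```python
def support(transactions, items):
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing the antecedent that also
    contain the consequent."""
    covered = support(transactions, antecedent)
    if covered == 0:                  # rule never applies
        return 0.0
    return support(transactions, antecedent | consequent) / covered


# Made-up shopping baskets for illustration.
transactions = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "bread"},
    {"eggs"},
]

antecedent = {"milk", "butter"}
consequent = {"bread", "eggs"}
rule_support = support(transactions, antecedent | consequent)
rule_confidence = confidence(transactions, antecedent, consequent)
```

Here the rule holds in 2 of the 3 baskets that contain both milk and butter, so its support is 2 and its confidence is 2/3.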


Explorer: attribute selection
- Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
- Attribute selection methods contain two parts:
  - A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  - An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
- Very flexible: WEKA allows (almost) arbitrary combinations of these two
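
One such combination, ranking as the search method with information gain as the evaluation method, can be sketched as follows. This is plain Python over the nominal attributes of the four example instances from the data file slide; the function names are hypothetical and nothing here is WEKA's API.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in class entropy obtained by splitting on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in set(r[attribute] for r in rows):
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def rank_attributes(rows, attributes, target):
    """Evaluate each attribute on its own and sort by decreasing gain."""
    return sorted(attributes, key=lambda a: information_gain(rows, a, target), reverse=True)


# Nominal attributes of the four instances shown on the data file slide.
rows = [
    {"sex": "male", "chest_pain_type": "typ_angina", "exercise_induced_angina": "no", "class": "not_present"},
    {"sex": "male", "chest_pain_type": "asympt", "exercise_induced_angina": "yes", "class": "present"},
    {"sex": "male", "chest_pain_type": "asympt", "exercise_induced_angina": "yes", "class": "present"},
    {"sex": "female", "chest_pain_type": "non_anginal", "exercise_induced_angina": "no", "class": "not_present"},
]
ranking = rank_attributes(rows, ["sex", "chest_pain_type", "exercise_induced_angina"], "class")
```

On four instances the result means little statistically; the point is only the split between the evaluator (`information_gain`) and the search (`rank_attributes`).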


Explorer: data visualization
- Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem
- WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
  - To do: rotating 3-d visualizations (Xgobi-style)
- Color-coded class values
- “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
- “Zoom-in” function


Performing experiments
- Experimenter makes it easy to compare the performance of different learning schemes
- For classification and regression problems
- Results can be written into file or database
- Evaluation options: cross-validation, learning curve, hold-out
- Can also iterate over different parameter settings
- Significance-testing built in!
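
Two of the ingredients above, cross-validation splits and a significance test on per-fold scores, can be sketched in plain Python. The fold assignment, the function names, and the accuracy figures are all made up for illustration, and the statistic is an ordinary paired t-statistic, which is not claimed to be the exact test the Experimenter applies.

```python
import math
import statistics


def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for k folds over n instances."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


def paired_t_statistic(scores_a, scores_b):
    """t-statistic of the per-fold differences between two schemes.

    Assumes the differences are not all identical (non-zero deviation).
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return mean / (sd / math.sqrt(len(diffs)))


folds = list(k_fold_indices(10, 5))

# Made-up per-fold accuracies for two hypothetical learning schemes.
scheme_a = [0.80, 0.82, 0.78, 0.81, 0.79]
scheme_b = [0.75, 0.78, 0.74, 0.77, 0.73]
t = paired_t_statistic(scheme_a, scheme_b)
```

The resulting statistic would then be compared against a t-distribution with (number of folds - 1) degrees of freedom to decide whether the difference is significant.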


The Knowledge Flow GUI
- New graphical user interface for WEKA
- Java-Beans-based interface for setting up and running machine learning experiments
- Data sources, classifiers, etc. are beans and can be connected graphically
- Data “flows” through components: e.g., “data source” -> “filter” -> “classifier” -> “evaluator”
- Layouts can be saved and loaded again later
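
The flow idea can be mimicked in plain Python by chaining small components, each consuming the output of the previous one. This is only an analogy for the data source -> filter -> classifier -> evaluator chain: the component names are hypothetical, the "classifier" is a fixed hand-written threshold rule rather than a learned model, and none of this reflects the actual bean interfaces.

```python
def data_source():
    """Source component: yields (cholesterol, class) instances from the example data."""
    yield from [
        (233, "not_present"),
        (286, "present"),
        (229, "present"),
        (None, "not_present"),   # instance with a missing cholesterol value
    ]


def drop_missing(instances):
    """Filter component: passes on only instances without missing values."""
    return (inst for inst in instances if inst[0] is not None)


def threshold_classifier(instances, threshold=250):
    """Classifier component: a fixed rule standing in for a learned model.

    The threshold of 250 is an arbitrary illustrative choice.
    """
    for value, actual in instances:
        predicted = "present" if value > threshold else "not_present"
        yield predicted, actual


def evaluator(predictions):
    """Evaluator component: fraction of predictions that match the actual class."""
    pairs = list(predictions)
    correct = sum(1 for predicted, actual in pairs if predicted == actual)
    return correct / len(pairs)


# Connect the components in the same order as the flow on the slide.
accuracy = evaluator(threshold_classifier(drop_missing(data_source())))
```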


Conclusion: try it yourself!
- WEKA is available at http://www.cs.waikato.ac.nz/ml/weka
  - Also has a list of projects based on WEKA
- WEKA contributors:
  Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang