weka: a machine machine learning with wekatwiki.di.uniroma1.it/pub/apprauto/annoacc0708/weka.pdf ·...

173
Department of Computer Science, University of Waikato, New Zealand Eibe Frank WEKA: A Machine Learning Toolkit The Explorer Classification and Regression Clustering Association Rules Attribute Selection Data Visualization The Experimenter The Knowledge Flow GUI Conclusions Machine Learning with WEKA

Upload: others

Post on 25-May-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Department of Computer Science,University of Waikato, New Zealand

Eibe Frank

� WEKA: A MachineLearning Toolkit

� The Explorer• Classification and

Regression• Clustering• Association Rules• Attribute Selection• Data Visualization

� The Experimenter� The Knowledge

Flow GUI� Conclusions

Machine Learning withWEKA

10/19/2007 University of Waikato 2

WEKA: the bird

Copyright: Martin Kramer ([email protected])

10/19/2007 University of Waikato 3

WEKA: the software� Machine learning/data mining software written in

Java (distributed under the GNU Public License)� Used for research, education, and applications� Complements “Data Mining” by Witten & Frank� Main features:

� Comprehensive set of data pre-processing tools,learning algorithms and evaluation methods

� Graphical user interfaces (incl. data visualization)� Environment for comparing learning algorithms

10/19/2007 University of Waikato 4

WEKA: versions� There are several versions of WEKA:

� WEKA 3.0: “book version” compatible withdescription in data mining book

� WEKA 3.2: “GUI version” adds graphical userinterfaces (book version is command-line only)

� WEKA 3.3: “development version” with lots ofimprovements

� This talk is based on the latest snapshot of WEKA3.3 (soon to be WEKA 3.4)

10/19/2007 University of Waikato 5

@relation heart-disease-simplified

@attribute age numeric@attribute sex { female, male}@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}@attribute cholesterol numeric@attribute exercise_induced_angina { no, yes}@attribute class { present, not_present}

@data63,male,typ_angina,233,no,not_present67,male,asympt,286,yes,present67,male,asympt,229,yes,present38,female,non_anginal,?,no,not_present...

WEKA only deals with “flat” files

10/19/2007 University of Waikato 6

@relation heart-disease-simplified

@attribute age numeric@attribute sex { female, male}@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}@attribute cholesterol numeric@attribute exercise_induced_angina { no, yes}@attribute class { present, not_present}

@data63,male,typ_angina,233,no,not_present67,male,asympt,286,yes,present67,male,asympt,229,yes,present38,female,non_anginal,?,no,not_present...

WEKA only deals with “flat” files

10/19/2007 University of Waikato 7

10/19/2007 University of Waikato 8

10/19/2007 University of Waikato 9

10/19/2007 University of Waikato 10

Explorer: pre-processing the data� Data can be imported from a file in various

formats: ARFF, CSV, C4.5, binary� Data can also be read from a URL or from an SQL

database (using JDBC)� Pre-processing tools in WEKA are called “filters”� WEKA contains filters for:

� Discretization, normalization, resampling, attributeselection, transforming and combining attributes, …

10/19/2007 University of Waikato 11

10/19/2007 University of Waikato 12

10/19/2007 University of Waikato 13

10/19/2007 University of Waikato 14

10/19/2007 University of Waikato 15

10/19/2007 University of Waikato 16

10/19/2007 University of Waikato 17

10/19/2007 University of Waikato 18

10/19/2007 University of Waikato 19

10/19/2007 University of Waikato 20

10/19/2007 University of Waikato 21

10/19/2007 University of Waikato 22

10/19/2007 University of Waikato 23

10/19/2007 University of Waikato 24

10/19/2007 University of Waikato 25

10/19/2007 University of Waikato 26

10/19/2007 University of Waikato 27

10/19/2007 University of Waikato 28

10/19/2007 University of Waikato 29

10/19/2007 University of Waikato 30

10/19/2007 University of Waikato 31

10/19/2007 University of Waikato 32

Explorer: building “classifiers”� Classifiers in WEKA are models for predicting

nominal or numeric quantities� Implemented learning schemes include:

� Decision trees and lists, instance-based classifiers,support vector machines, multi-layer perceptrons,logistic regression, Bayes’ nets, …

� “Meta”-classifiers include:� Bagging, boosting, stacking, error-correcting output

codes, locally weighted learning, …

10/19/2007 University of Waikato 33

10/19/2007 University of Waikato 34

10/19/2007 University of Waikato 35

10/19/2007 University of Waikato 36

10/19/2007 University of Waikato 37

10/19/2007 University of Waikato 38

10/19/2007 University of Waikato 39

10/19/2007 University of Waikato 40

10/19/2007 University of Waikato 41

10/19/2007 University of Waikato 42

10/19/2007 University of Waikato 43

10/19/2007 University of Waikato 44

10/19/2007 University of Waikato 45

10/19/2007 University of Waikato 46

10/19/2007 University of Waikato 47

10/19/2007 University of Waikato 48

10/19/2007 University of Waikato 49

10/19/2007 University of Waikato 50

10/19/2007 University of Waikato 51

10/19/2007 University of Waikato 52

10/19/2007 University of Waikato 53

10/19/2007 University of Waikato 54

10/19/2007 University of Waikato 55

10/19/2007 University of Waikato 56

10/19/2007 University of Waikato 57

10/19/2007 University of Waikato 58

10/19/2007 University of Waikato 59

10/19/2007 University of Waikato 60

10/19/2007 University of Waikato 61

10/19/2007 University of Waikato 62

10/19/2007 University of Waikato 63

10/19/2007 University of Waikato 64

10/19/2007 University of Waikato 65QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture.

10/19/2007 University of Waikato 66QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture.

10/19/2007 University of Waikato 67QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture.

10/19/2007 University of Waikato 68

10/19/2007 University of Waikato 69

10/19/2007 University of Waikato 70

10/19/2007 University of Waikato 71

10/19/2007 University of Waikato 72

10/19/2007 University of Waikato 73

10/19/2007 University of Waikato 74

10/19/2007 University of Waikato 75

QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture.

10/19/2007 University of Waikato 76

10/19/2007 University of Waikato 77

10/19/2007 University of Waikato 78

10/19/2007 University of Waikato 79

10/19/2007 University of Waikato 80

QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture.

10/19/2007 University of Waikato 81

QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture.

10/19/2007 University of Waikato 82

10/19/2007 University of Waikato 83

QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture.

10/19/2007 University of Waikato 84

10/19/2007 University of Waikato 85

10/19/2007 University of Waikato 86

10/19/2007 University of Waikato 87

10/19/2007 University of Waikato 88

10/19/2007 University of Waikato 89

10/19/2007 University of Waikato 90

10/19/2007 University of Waikato 91

10/19/2007 University of Waikato 92

Explorer: clustering data� WEKA contains “clusterers” for finding groups of

similar instances in a dataset� Implemented schemes are:

� k-Means, EM, Cobweb, X-means, FarthestFirst� Clusters can be visualized and compared to “true”

clusters (if given)� Evaluation based on loglikelihood if clustering

scheme produces a probability distribution

10/19/2007 University of Waikato 93

10/19/2007 University of Waikato 94

10/19/2007 University of Waikato 95

10/19/2007 University of Waikato 96

10/19/2007 University of Waikato 97

10/19/2007 University of Waikato 98

10/19/2007 University of Waikato 99

10/19/2007 University of Waikato 100

10/19/2007 University of Waikato 101

10/19/2007 University of Waikato 102

10/19/2007 University of Waikato 103

10/19/2007 University of Waikato 104

10/19/2007 University of Waikato 105

10/19/2007 University of Waikato 106

10/19/2007 University of Waikato 107

10/19/2007 University of Waikato 108

Explorer: finding associations� WEKA contains an implementation of the Apriori

algorithm for learning association rules� Works only with discrete data

� Can identify statistical dependencies betweengroups of attributes:� milk, butter ⇒ bread, eggs (with confidence 0.9 and

support 2000)� Apriori can compute all rules that have a given

minimum support and exceed a given confidence

10/19/2007 University of Waikato 109

10/19/2007 University of Waikato 110

10/19/2007 University of Waikato 111

10/19/2007 University of Waikato 112

10/19/2007 University of Waikato 113

10/19/2007 University of Waikato 114

10/19/2007 University of Waikato 115

10/19/2007 University of Waikato 116

Explorer: attribute selection� Panel that can be used to investigate which

(subsets of) attributes are the most predictive ones� Attribute selection methods contain two parts:

� A search method: best-first, forward selection,random, exhaustive, genetic algorithm, ranking

� An evaluation method: correlation-based, wrapper,information gain, chi-squared, …

� Very flexible: WEKA allows (almost) arbitrarycombinations of these two

10/19/2007 University of Waikato 117

10/19/2007 University of Waikato 118

10/19/2007 University of Waikato 119

10/19/2007 University of Waikato 120

10/19/2007 University of Waikato 121

10/19/2007 University of Waikato 122

10/19/2007 University of Waikato 123

10/19/2007 University of Waikato 124

10/19/2007 University of Waikato 125

Explorer: data visualization� Visualization very useful in practice: e.g. helps to

determine difficulty of the learning problem� WEKA can visualize single attributes (1-d) and

pairs of attributes (2-d)� To do: rotating 3-d visualizations (Xgobi-style)

� Color-coded class values� “Jitter” option to deal with nominal attributes (and

to detect “hidden” data points)� “Zoom-in” function

10/19/2007 University of Waikato 126

10/19/2007 University of Waikato 127

10/19/2007 University of Waikato 128

10/19/2007 University of Waikato 129

10/19/2007 University of Waikato 130

10/19/2007 University of Waikato 131

10/19/2007 University of Waikato 132

10/19/2007 University of Waikato 133

10/19/2007 University of Waikato 134

10/19/2007 University of Waikato 135

10/19/2007 University of Waikato 136

10/19/2007 University of Waikato 137

10/19/2007 University of Waikato 138

Performing experiments� Experimenter makes it easy to compare the

performance of different learning schemes� For classification and regression problems� Results can be written into file or database� Evaluation options: cross-validation, learning

curve, hold-out� Can also iterate over different parameter settings� Significance-testing built in!

10/19/2007 University of Waikato 139

10/19/2007 University of Waikato 140

10/19/2007 University of Waikato 141

10/19/2007 University of Waikato 142

10/19/2007 University of Waikato 143

10/19/2007 University of Waikato 144

10/19/2007 University of Waikato 145

10/19/2007 University of Waikato 146

10/19/2007 University of Waikato 147

10/19/2007 University of Waikato 148

10/19/2007 University of Waikato 149

10/19/2007 University of Waikato 150

10/19/2007 University of Waikato 151

10/19/2007 University of Waikato 152

The Knowledge Flow GUI� New graphical user interface for WEKA� Java-Beans-based interface for setting up and

running machine learning experiments� Data sources, classifiers, etc. are beans and can

be connected graphically� Data “flows” through components: e.g.,

“data source” -> “filter” -> “classifier” -> “evaluator”� Layouts can be saved and loaded again later

10/19/2007 University of Waikato 153

10/19/2007 University of Waikato 154

10/19/2007 University of Waikato 155

10/19/2007 University of Waikato 156

10/19/2007 University of Waikato 157

10/19/2007 University of Waikato 158

10/19/2007 University of Waikato 159

10/19/2007 University of Waikato 160

10/19/2007 University of Waikato 161

10/19/2007 University of Waikato 162

10/19/2007 University of Waikato 163

10/19/2007 University of Waikato 164

10/19/2007 University of Waikato 165

10/19/2007 University of Waikato 166

10/19/2007 University of Waikato 167

10/19/2007 University of Waikato 168

10/19/2007 University of Waikato 169

10/19/2007 University of Waikato 170

10/19/2007 University of Waikato 171

10/19/2007 University of Waikato 172

10/19/2007 University of Waikato 173

Conclusion: try it yourself!� WEKA is available at

http://www.cs.waikato.ac.nz/ml/weka� Also has a list of projects based on WEKA� WEKA contributors:

Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, BernhardPfahringer , Brent Martin, Peter Flach, Eibe Frank ,Gabi Schmidberger ,Ian H.Witten , J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio deSouza Coelho, Malcolm Ware, Mark Hall ,Remco Bouckaert , RichardKirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle,Xin Xu, Yong Wang, Zhihai Wang