Department of Computer Science,

University of Waikato, New Zealand

Eibe Frank

◼ WEKA: A Machine Learning Toolkit
◼ The Explorer
  • Classification and Regression
  • Clustering
  • Association Rules
  • Attribute Selection
  • Data Visualization
◼ The Experimenter
◼ The Knowledge Flow GUI
◼ Conclusions

Machine Learning with WEKA

10/15/2018 University of Waikato 2

WEKA: the bird

Copyright: Martin Kramer (mkramer@wxs.nl)

WEKA: the software

◼ Machine learning/data mining software written in Java (distributed under the GNU General Public License)
◼ Used for research, education, and applications
◼ Complements “Data Mining” by Witten & Frank
◼ Main features:
  ◆ Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
  ◆ Graphical user interfaces (incl. data visualization)
  ◆ Environment for comparing learning algorithms

WEKA: versions

◼ There are several versions of WEKA:
  ◆ WEKA 3.0: “book version” compatible with description in data mining book
  ◆ WEKA 3.2: “GUI version” adds graphical user interfaces (book version is command-line only)
  ◆ WEKA 3.3: “development version” with lots of improvements
◼ This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)

@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...

WEKA only deals with “flat” files
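
To make the structure of the file above concrete, here is a minimal Python sketch of a reader for this flat format. It is not WEKA's own loader: the function name `parse_arff` is hypothetical, and the sketch assumes only numeric and nominal attributes, comma-separated values without quoting, and `?` for a missing value.

```python
def parse_arff(text):
    """Parse a small subset of ARFF: numeric and nominal attributes, '?' = missing."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):      # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (name, kind), value in zip(attributes, values):
                if value == "?":
                    row.append(None)              # missing value
                elif kind == "numeric":
                    row.append(float(value))
                else:
                    row.append(value)             # nominal value kept as a string
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                # Nominal attribute: store the list of allowed values.
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()               # e.g. "numeric"
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


SAMPLE = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""

relation, attributes, rows = parse_arff(SAMPLE)
```

Each data row becomes one list with one entry per declared attribute, which is what “flat” means here: a single table, no nested or relational structure.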

Explorer: Exploring the data

◼ Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
◼ Data can also be read from a URL or from an SQL database (using JDBC)
◼ Pre-processing tools in WEKA are called “filters”
◼ WEKA contains filters for:
  ◆ Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
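
As a rough illustration of what two of these filters do, the sketch below normalizes and discretizes one numeric column in plain Python. The function names are hypothetical and the choices (min-max scaling to [0, 1], equal-width bins) are assumptions made for the example; WEKA's own filters offer more options and may behave differently.

```python
def normalize(values):
    """Rescale numeric values linearly to [0, 1]; missing values (None) stay missing."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    span = hi - lo
    return [None if v is None else (0.0 if span == 0 else (v - lo) / span)
            for v in values]


def discretize(values, bins):
    """Equal-width binning: map each numeric value to a bin index in 0..bins-1."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins
    result = []
    for v in values:
        if v is None:
            result.append(None)                    # keep missing values missing
        elif width == 0:
            result.append(0)                       # all values identical: one bin
        else:
            # The maximum value would land in bin `bins`; clamp it into the last bin.
            result.append(min(int((v - lo) / width), bins - 1))
    return result


# Cholesterol column from the heart-disease example, including the missing value.
cholesterol = [233.0, 286.0, 229.0, None]
scaled = normalize(cholesterol)
binned = discretize(cholesterol, 3)
```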

Explorer: building “classifiers”

◼ Classifiers in WEKA are models for predicting nominal or numeric quantities
◼ Implemented learning schemes include:
  ◆ Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
◼ “Meta”-classifiers include:
  ◆ Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
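
The difference between a learning scheme and a “meta”-classifier can be sketched in a few lines of Python. Below, a 1-nearest-neighbour learner stands in for an instance-based classifier, and a bagging wrapper trains any learner on bootstrap samples and combines the models by majority vote. All names and the toy data are hypothetical; this is a sketch of the general techniques, not of WEKA's implementations.

```python
import random
from collections import Counter


def nearest_neighbour(train):
    """Learner: returns a model predicting the label of the closest training instance."""
    def predict(x):
        def squared_distance(item):
            features, _ = item
            return sum((a - b) ** 2 for a, b in zip(features, x))
        return min(train, key=squared_distance)[1]
    return predict


def bagging(train, learner, rounds=10, seed=0):
    """Meta-learner: train `learner` on bootstrap samples, predict by majority vote."""
    rng = random.Random(seed)
    # Each bootstrap sample draws len(train) instances with replacement.
    models = [learner([rng.choice(train) for _ in train]) for _ in range(rounds)]

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]
    return predict


# Toy training set: (features, class label) pairs.
train = [((0.0, 0.0), "a"), ((0.5, 0.2), "a"), ((5.0, 5.0), "b"), ((5.5, 4.8), "b")]
one_nn = nearest_neighbour(train)
bagged = bagging(train, nearest_neighbour, rounds=5, seed=1)
```

The wrapper only needs a function that maps training data to a model, which is why a meta-classifier can be combined with any base scheme.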

Explorer: clustering data

◼ WEKA contains “clusterers” for finding groups of similar instances in a dataset
◼ Implemented schemes are:
  ◆ k-Means, EM, Cobweb, X-means, FarthestFirst
◼ Clusters can be visualized and compared to “true” clusters (if given)
◼ Evaluation based on log-likelihood if clustering scheme produces a probability distribution
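
For readers unfamiliar with the first scheme in that list, here is a minimal sketch of plain k-means in Python. It assumes numeric instances given as tuples, squared Euclidean distance, and caller-supplied starting centroids; the function name is hypothetical and WEKA's clusterer handles more cases (nominal attributes, random initialization, and so on).

```python
def k_means(points, centroids, iterations=10):
    """Plain k-means on tuples of numbers, starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins its closest centroid.
        assignment = [
            min(range(len(centroids)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centres, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```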

Explorer: finding associations

◼ WEKA contains an implementation of the Apriori algorithm for learning association rules
  ◆ Works only with discrete data
◼ Can identify statistical dependencies between groups of attributes:
  ◆ milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)
◼ Apriori can compute all rules that have a given minimum support and exceed a given confidence
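
The two numbers attached to a rule can be computed directly. The Python sketch below evaluates support (as an absolute count, matching the “support 2000” style above) and confidence for a single rule over a list of transactions. It does not implement Apriori's candidate generation; the function names and the five toy baskets are hypothetical.

```python
def support(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also
    contain the consequent. Returns 0.0 if the antecedent never occurs."""
    covered = support(transactions, antecedent)
    if covered == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / covered


baskets = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "bread"},
    {"eggs"},
]

# Rule: milk, butter => bread, eggs
rule_support = support(baskets, {"milk", "butter", "bread", "eggs"})
rule_confidence = confidence(baskets, {"milk", "butter"}, {"bread", "eggs"})
```

Here three baskets contain milk and butter, and two of those also contain bread and eggs, so the rule has support 2 and confidence 2/3.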

Explorer: attribute selection

◼ Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
◼ Attribute selection methods contain two parts:
  ◆ A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  ◆ An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
◼ Very flexible: WEKA allows (almost) arbitrary combinations of these two
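
One such combination, ranking as the search method with information gain as the evaluation method, can be sketched as follows. The sketch assumes nominal attributes with no missing values; the function names and the four-instance dataset are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in class entropy obtained by splitting on a nominal attribute."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [label for x, label in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def rank_attributes(columns, labels):
    """Evaluation = information gain, search = ranking: sort names, best first."""
    return sorted(columns,
                  key=lambda name: information_gain(columns[name], labels),
                  reverse=True)


labels = ["present", "present", "not_present", "not_present"]
columns = {
    "sex": ["male", "female", "male", "female"],
    "exercise_induced_angina": ["yes", "yes", "no", "no"],
}
ranking = rank_attributes(columns, labels)
```

In this toy data `exercise_induced_angina` separates the classes perfectly (gain of 1 bit) while `sex` carries no information (gain of 0), so the former is ranked first.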

Explorer: data visualization

◼ Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem
◼ WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
  ◆ To do: rotating 3-d visualizations (Xgobi-style)
◼ Color-coded class values
◼ “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
◼ “Zoom-in” function
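
The idea behind jitter is simple enough to sketch: instances with the same nominal value are plotted at exactly the same coordinate and hide each other, so a small random offset is added before plotting. The Python below is an illustration only; the function name, the uniform noise, and the default amount are assumptions, not WEKA's actual settings.

```python
import random


def jitter(coordinates, amount=0.2, seed=0):
    """Add uniform noise in [-amount, amount] to each plotted coordinate."""
    rng = random.Random(seed)
    return [c + rng.uniform(-amount, amount) for c in coordinates]


# Four instances sharing nominal value index 1 would overlap in a scatter plot.
plotted = [1.0, 1.0, 1.0, 1.0]
spread = jitter(plotted)
```

After jittering, each point stays close to its original position, but stacked points are drawn side by side and can be counted.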

Performing experiments

◼ Experimenter makes it easy to compare the performance of different learning schemes
◼ For classification and regression problems
◼ Results can be written into file or database
◼ Evaluation options: cross-validation, learning curve, hold-out
◼ Can also iterate over different parameter settings
◼ Significance-testing built in!
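
Two ingredients of such a comparison, splitting data into cross-validation folds and testing whether a difference in scores is significant, can be sketched in Python. The fold assignment (by index modulo k) and the standard paired t statistic are assumptions made for this example; the slides do not say which test the Experimenter applies, and the scores below are made up.

```python
import math


def k_folds(n, k):
    """Yield (train_indices, test_indices) for k-fold cross-validation over n instances."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


def paired_t_statistic(scores_a, scores_b):
    """Standard paired t statistic over per-fold score differences.
    Assumes at least two folds and differences that are not all identical."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(variance / n)


folds = list(k_folds(6, 3))

# Hypothetical per-fold accuracies (in percent) of two learning schemes.
scheme_a = [90.0, 80.0, 70.0]
scheme_b = [80.0, 60.0, 60.0]
t = paired_t_statistic(scheme_a, scheme_b)
```

The resulting statistic is compared against a t distribution with n - 1 degrees of freedom to decide whether the two schemes differ significantly.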

The Knowledge Flow GUI

◼ New graphical user interface for WEKA
◼ Java-Beans-based interface for setting up and running machine learning experiments
◼ Data sources, classifiers, etc. are beans and can be connected graphically
◼ Data “flows” through components: e.g., “data source” -> “filter” -> “classifier” -> “evaluator”
◼ Layouts can be saved and loaded again later
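
The flow of data through connected components can be mimicked with ordinary Python functions, each consuming the output of the previous one. This is only an analogy for the “data source” -> “filter” -> “classifier” -> “evaluator” chain: every component below is hypothetical, the classifier is a trivial majority-class predictor, and the evaluation is on the training data.

```python
from collections import Counter


def data_source():
    """Hypothetical source: yields (features, label) pairs."""
    yield from [((1.0,), "a"), ((2.0,), "a"), ((3.0,), "a"), ((10.0,), "b")]


def scale_filter(instances, factor=0.1):
    """Filter: rescales every feature and passes the label through unchanged."""
    for features, label in instances:
        yield tuple(f * factor for f in features), label


def majority_classifier(instances):
    """Classifier: always predicts the most frequent training label."""
    data = list(instances)
    majority = Counter(label for _, label in data).most_common(1)[0][0]
    return data, (lambda features: majority)


def evaluator(data, model):
    """Evaluator: fraction of instances the model labels correctly (training data)."""
    correct = sum(1 for features, label in data if model(features) == label)
    return correct / len(data)


# data source -> filter -> classifier -> evaluator
accuracy = evaluator(*majority_classifier(scale_filter(data_source())))
```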

Conclusion: try it yourself!

◼ WEKA is available at http://www.cs.waikato.ac.nz/ml/weka
  ▪ Also has a list of projects based on WEKA
  ▪ WEKA contributors: Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang
