Weka: Brief Introduction

Features Covered in this Lecture

- Preprocessing: examining datasets and using filters.
- Classification: selecting and running classifiers.
- Visualization tools: brief exposure.

Explorer: Preprocessing the data

- Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary.
- Data can also be read from a URL or from an SQL database (using JDBC).
- Preprocessing tools in WEKA are called “filters”.
- WEKA contains filters for:
  discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
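To show what two of these filters do to a column of values, here is a minimal Python sketch of min-max normalization and equal-width discretization. It is similar in spirit to WEKA's unsupervised Normalize and Discretize filters but is not their code or API. The function names are hypothetical, and missing values (None, written "?" in ARFF) are simply passed through.

```python
from typing import List, Optional

Value = Optional[float]  # None stands for a missing value ("?" in ARFF)


def normalize(column: List[Value]) -> List[Value]:
    """Min-max scale known values into [0, 1]; missing values stay missing."""
    known = [v for v in column if v is not None]
    if not known:
        return list(column)
    lo, hi = min(known), max(known)
    span = hi - lo
    # A constant column has no spread, so every known value maps to 0.0.
    return [None if v is None else (0.0 if span == 0 else (v - lo) / span)
            for v in column]


def discretize_equal_width(column: List[Value], bins: int) -> List[Optional[int]]:
    """Replace each known value by the index (0..bins-1) of its equal-width bin."""
    known = [v for v in column if v is not None]
    if not known:
        return [None] * len(column)
    lo, hi = min(known), max(known)
    width = (hi - lo) / bins
    result: List[Optional[int]] = []
    for v in column:
        if v is None:
            result.append(None)
        elif width == 0:
            result.append(0)
        else:
            # min(...) keeps the maximum value inside the last bin.
            result.append(min(int((v - lo) / width), bins - 1))
    return result


if __name__ == "__main__":
    cholesterol = [233.0, 286.0, 229.0, None]  # values from the ARFF example
    print(normalize(cholesterol))
    print(discretize_equal_width(cholesterol, bins=3))
```

Both functions compute their minimum and maximum from the known values only. A missing value therefore neither shifts the scale nor receives a bin.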

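WEKA only deals with “flat” files: the whole dataset is a single table, and each instance is one row of attribute values. In the ARFF file below, the header declares the relation and its attributes (numeric, or nominal with the allowed values in braces). The @data section lists one instance per line, and “?” marks a missing value.

    @relation heart-disease-simplified

    @attribute age numeric
    @attribute sex { female, male}
    @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
    @attribute cholesterol numeric
    @attribute exercise_induced_angina { no, yes}
    @attribute class { present, not_present}

    @data
    63,male,typ_angina,233,no,not_present
    67,male,asympt,286,yes,present
    67,male,asympt,229,yes,present
    38,female,non_anginal,?,no,not_present
    ...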
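The following minimal Python sketch makes the flat ARFF layout concrete. It reads text like the heart-disease example into three parts: a relation name, a list of attribute declarations, and rows of values. It is not WEKA's ArffLoader. It assumes unquoted names and values, dense rows, and only numeric or nominal attributes, and `parse_arff` is a hypothetical name.

```python
from typing import List, Tuple, Union

Cell = Union[float, str, None]      # None stands for the ARFF missing value "?"
AttrType = Union[str, List[str]]    # "numeric", or the list of nominal labels
NUMERIC_TYPES = ("numeric", "real", "integer")


def parse_arff(text: str) -> Tuple[str, List[Tuple[str, AttrType]], List[List[Cell]]]:
    """Return (relation name, attribute declarations, data rows)."""
    relation = ""
    attributes: List[Tuple[str, AttrType]] = []
    rows: List[List[Cell]] = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):      # skip blank and comment lines
            continue
        lower = line.lower()                      # ARFF keywords are case-insensitive
        if lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):              # nominal: { label1, label2, ... }
                labels = [s.strip() for s in spec.strip("{} ").split(",")]
                attributes.append((name, labels))
            else:
                attributes.append((name, spec.lower()))
        elif lower.startswith("@data"):
            in_data = True
        elif in_data:
            cells = [c.strip() for c in line.split(",")]
            if len(cells) != len(attributes):
                raise ValueError(f"expected {len(attributes)} values, got {len(cells)}")
            row: List[Cell] = []
            for (name, kind), cell in zip(attributes, cells):
                if cell == "?":
                    row.append(None)
                elif isinstance(kind, str) and kind in NUMERIC_TYPES:
                    row.append(float(cell))
                elif isinstance(kind, list) and cell in kind:
                    row.append(cell)
                else:
                    raise ValueError(f"unsupported value {cell!r} for attribute {name}")
            rows.append(row)
    return relation, attributes, rows


SAMPLE = """@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""

if __name__ == "__main__":
    name, attrs, data = parse_arff(SAMPLE)
    print(name, [a for a, _ in attrs])
    for r in data:
        print(r)
```

The parser rejects a value that is not among the declared nominal labels. This mirrors the ARFF header's role as a schema for every row.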

Explorer: building “classifiers”

- Classifiers in WEKA are models for predicting nominal or numeric quantities.
- Implemented learning schemes include:
  decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
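As one example of an instance-based classifier, here is a minimal k-nearest-neighbour sketch in Python. It follows the same idea as WEKA's IBk but is not its code. Assumptions (not taken from WEKA): a numeric gap is divided by that attribute's range in the training data, a nominal mismatch counts as 1, and a missing value on either side counts as the maximum difference 1. The four training rows are taken from the ARFF example. They are far too few for a real model and serve only to exercise the code.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Union

Cell = Union[float, str, None]  # None = missing value


def attribute_ranges(train: Sequence[Sequence[Cell]]) -> List[Optional[float]]:
    """Range (max - min) of each numeric attribute; None for nominal ones."""
    ranges: List[Optional[float]] = []
    for j in range(len(train[0])):
        nums = [row[j] for row in train if isinstance(row[j], float)]
        ranges.append(max(nums) - min(nums) if nums else None)
    return ranges


def distance(a: Sequence[Cell], b: Sequence[Cell], ranges: List[Optional[float]]) -> float:
    """Euclidean distance over per-attribute differences in [0, 1]."""
    total = 0.0
    for x, y, rng in zip(a, b, ranges):
        if x is None or y is None:
            d = 1.0                                  # assumption: missing = max difference
        elif isinstance(x, float) and isinstance(y, float):
            d = abs(x - y) / rng if rng else 0.0     # range-normalised numeric gap
        else:
            d = 0.0 if x == y else 1.0               # nominal: match or mismatch
        total += d * d
    return math.sqrt(total)


def knn_predict(train: Sequence[Sequence[Cell]], labels: Sequence[str],
                query: Sequence[Cell], k: int = 1) -> str:
    """Majority label among the k training rows closest to `query`."""
    ranges = attribute_ranges(train)
    order = sorted(range(len(train)), key=lambda i: distance(train[i], query, ranges))
    return Counter(labels[i] for i in order[:k]).most_common(1)[0][0]


TRAIN = [
    [63.0, "male", "typ_angina", 233.0, "no"],
    [67.0, "male", "asympt", 286.0, "yes"],
    [67.0, "male", "asympt", 229.0, "yes"],
    [38.0, "female", "non_anginal", None, "no"],
]
LABELS = ["not_present", "present", "present", "not_present"]

if __name__ == "__main__":
    print(knn_predict(TRAIN, LABELS, [66.0, "male", "asympt", 250.0, "yes"], k=3))
```

The model has no training phase: it stores the instances and defers all work to prediction time. This is what distinguishes instance-based learners from, for example, decision trees.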

- “Meta”-classifiers include:
  bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
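A meta-classifier wraps other classifiers. The minimal Python sketch below illustrates bagging: it trains one base model per bootstrap sample (n rows drawn with replacement) and combines the models by majority vote. WEKA's Bagging meta-classifier can wrap any base classifier. Here the base learner is a hypothetical one-attribute “stump”, chosen only to keep the example short. All names are illustrative, not WEKA's API.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence

Row = Sequence[object]
Model = Callable[[Row], str]
Learner = Callable[[List[Row], List[str]], Model]


def train_stump(rows: List[Row], labels: List[str], attr: int = 2) -> Model:
    """Hypothetical base learner: majority label among rows sharing row[attr]."""
    overall = Counter(labels).most_common(1)[0][0]
    by_value: Dict[object, Counter] = {}
    for row, label in zip(rows, labels):
        by_value.setdefault(row[attr], Counter())[label] += 1

    def predict(row: Row) -> str:
        counts = by_value.get(row[attr])
        return counts.most_common(1)[0][0] if counts else overall  # unseen value: fall back
    return predict


def bagging(rows: List[Row], labels: List[str], learner: Learner = train_stump,
            n_models: int = 10, seed: int = 1) -> Model:
    """Train n_models base models on bootstrap samples; predict by majority vote."""
    rng = random.Random(seed)
    n = len(rows)
    models: List[Model] = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        models.append(learner([rows[i] for i in idx], [labels[i] for i in idx]))

    def predict(row: Row) -> str:
        return Counter(m(row) for m in models).most_common(1)[0][0]  # majority vote
    return predict


ROWS = [
    [63.0, "male", "typ_angina", 233.0, "no"],
    [67.0, "male", "asympt", 286.0, "yes"],
    [67.0, "male", "asympt", 229.0, "yes"],
    [38.0, "female", "non_anginal", None, "no"],
]
LABELS = ["not_present", "present", "present", "not_present"]

if __name__ == "__main__":
    model = bagging(ROWS, LABELS, n_models=10, seed=1)
    for r in ROWS:
        print(r, "->", model(r))
```

Each base model sees a slightly different sample, so their errors differ, and the vote tends to smooth out the errors of unstable learners. The fixed `seed` only makes the sketch reproducible.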

Homework #1 – Due Feb. 11

Analyze the zoo dataset from the UCI repository using the Weka Explorer.

- For each of the attributes feathers, predators, tail, and domestic, report on the types and numbers of animals having the attribute true.
- Remove instances whose “type” attribute is greater than or equal to 4. Use the classifier J48graft to derive the corresponding decision tree, and draw the tree.
- Use the rules classifier PART to derive rules from the zoo dataset. List the rules obtained.
- Remove the “type” attribute from the dataset and run the default clustering algorithm, SimpleKMeans. How many clusters do you obtain? Can you relate these clusters to the initial class values?
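The homework is meant to be done in the Explorer. The minimal Python sketch below only shows what the counting step and the “type ≥ 4” removal step compute, which can help when checking Explorer output. Assumptions: each instance is a dict, boolean attributes are coded 1/0, and “type” is an integer code, as in the UCI zoo.data file. The function names are hypothetical, and the demo rows are made up rather than taken from the zoo dataset.

```python
from collections import Counter
from typing import Dict, List

Instance = Dict[str, object]


def count_true_by_type(data: List[Instance], attr: str) -> Dict[object, int]:
    """Number of instances per 'type' value for which `attr` is true (== 1)."""
    return dict(Counter(row["type"] for row in data if row[attr] == 1))


def remove_type_at_least(data: List[Instance], threshold: int = 4) -> List[Instance]:
    """Keep only instances whose 'type' is strictly below `threshold`."""
    return [row for row in data if int(row["type"]) < threshold]


if __name__ == "__main__":
    # Made-up rows for illustration only; they are NOT taken from the zoo dataset.
    demo = [
        {"name": "animal_a", "feathers": 1, "type": 2},
        {"name": "animal_b", "feathers": 0, "type": 1},
        {"name": "animal_c", "feathers": 1, "type": 5},
    ]
    print(count_true_by_type(demo, "feathers"))
    print([row["name"] for row in remove_type_at_least(demo)])
```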
