Weka: Brief Introduction

Posted on 23-Oct-2020


  • Weka: Brief Introduction

    Features covered in this lecture:

    Preprocessing: examining datasets and using filters.

    Classification: selecting and running classifiers.

    Visualization tools: brief exposure.

  • Explorer: Preprocessing the data

    Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary.

    Data can also be read from a URL or from an SQL database (using JDBC).

    Preprocessing tools in WEKA are called “filters”. WEKA contains filters for:

    discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
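    To make two of these filter types concrete, here is a minimal plain-Python sketch of min-max normalization (rescaling a numeric column into [0, 1]) and equal-width discretization (cutting the column's range into equally wide bins). It is an illustration under our own assumptions, not WEKA's filter code or API: the function names are hypothetical, missing values are simply passed through, and the default of 10 bins is our choice.

```python
from typing import List, Optional

# Assumption of this sketch: None stands for a missing value ("?" in ARFF).
Column = List[Optional[float]]


def normalize(column: Column) -> Column:
    """Hypothetical helper: rescale a numeric column linearly into [0, 1]."""
    present = [v for v in column if v is not None]
    lo, hi = min(present), max(present)
    span = hi - lo
    return [None if v is None else (0.0 if span == 0 else (v - lo) / span)
            for v in column]


def discretize_equal_width(column: Column, bins: int = 10) -> List[Optional[int]]:
    """Hypothetical helper: replace each value by the index (0..bins-1) of its bin."""
    present = [v for v in column if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins
    result: List[Optional[int]] = []
    for v in column:
        if v is None:
            result.append(None)  # missing values stay missing
        elif width == 0:
            result.append(0)  # constant column: everything falls in one bin
        else:
            # hi itself would map to index `bins`, so clamp it into the last bin.
            result.append(min(int((v - lo) / width), bins - 1))
    return result


if __name__ == "__main__":
    cholesterol = [233.0, 286.0, 229.0, None]  # column from the heart-disease sample
    print(normalize(cholesterol))
    print(discretize_equal_width(cholesterol, bins=3))  # [0, 2, 0, None]
```

    Both functions derive their parameters (minimum, maximum, bin width) from the data itself without looking at the class, which is what makes them unsupervised.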

  • WEKA only deals with “flat” files

    A flat file holds a single table: one row per instance, one column per attribute. Example in ARFF format, where “?” marks a missing value:

    @relation heart-disease-simplified

    @attribute age numeric
    @attribute sex { female, male}
    @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
    @attribute cholesterol numeric
    @attribute exercise_induced_angina { no, yes}
    @attribute class { present, not_present}

    @data
    63,male,typ_angina,233,no,not_present
    67,male,asympt,286,yes,present
    67,male,asympt,229,yes,present
    38,female,non_anginal,?,no,not_present
    ...
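    The ARFF layout above is simple enough to read by hand. The following is a minimal sketch that parses only the constructs used in this example (@relation, numeric and nominal @attribute lines, comma-separated @data rows, "?" for missing values). It is not WEKA's ARFF loader: quoted names, string and date attributes, and sparse data are deliberately unsupported, and `parse_arff` is a hypothetical name.

```python
from typing import List, Optional, Tuple, Union

Cell = Union[float, str, None]               # number, nominal label, or missing
Attribute = Tuple[str, Optional[List[str]]]  # (name, allowed labels); None = numeric


def parse_arff(text: str) -> Tuple[str, List[Attribute], List[List[Cell]]]:
    """Hypothetical parser for the small ARFF subset shown on the slide."""
    relation = ""
    attributes: List[Attribute] = []
    rows: List[List[Cell]] = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and ARFF comments
            continue
        keyword = line.split(None, 1)[0].lower()
        if in_data:
            tokens = [t.strip() for t in line.split(",")]
            if len(tokens) != len(attributes):
                raise ValueError(f"expected {len(attributes)} values: {line!r}")
            row: List[Cell] = []
            for (name, labels), token in zip(attributes, tokens):
                if token == "?":
                    row.append(None)  # "?" marks a missing value
                elif labels is None:
                    row.append(float(token))
                elif token in labels:
                    row.append(token)
                else:
                    raise ValueError(f"{token!r} is not a declared value of {name}")
            rows.append(row)
        elif keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            name, spec = line.split(None, 2)[1:]
            spec = spec.strip()
            if spec.startswith("{"):
                labels = [v.strip() for v in spec.strip("{}").split(",")]
                attributes.append((name, labels))
            else:  # this sketch treats every non-nominal type as numeric
                attributes.append((name, None))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


SAMPLE = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""

if __name__ == "__main__":
    relation, attributes, rows = parse_arff(SAMPLE)
    print(relation, [name for name, _ in attributes])
    print(rows[3])  # [38.0, 'female', 'non_anginal', None, 'no', 'not_present']
```

    Rejecting labels that were not declared in the header mirrors the point of ARFF's nominal declarations: the set of values is fixed before any data is read.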

  • Explorer: building “classifiers”

    Classifiers in WEKA are models for predicting nominal or numeric quantities. Implemented learning schemes include:

    decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
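    Decision-tree learners choose which attribute to split on with an entropy-based score; J48, used in the homework, is WEKA's implementation of C4.5, whose default criterion is the gain ratio. The sketch below computes information gain and gain ratio for nominal attributes in plain Python, applied to columns of the four heart-disease rows. The function names are our own and this is not J48's code; real C4.5 also handles numeric splits, missing values and pruning.

```python
from collections import Counter
from math import log2
from typing import Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy, in bits, of the value distribution in `labels`."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(attr: Sequence[str], labels: Sequence[str]) -> float:
    """Class-entropy reduction obtained by splitting on a nominal attribute."""
    n = len(labels)
    remainder = 0.0
    for value in set(attr):
        subset = [y for a, y in zip(attr, labels) if a == value]
        remainder += len(subset) / n * entropy(subset)  # weighted child entropy
    return entropy(labels) - remainder


def gain_ratio(attr: Sequence[str], labels: Sequence[str]) -> float:
    """Information gain divided by the entropy of the attribute's own values."""
    split_info = entropy(attr)
    return 0.0 if split_info == 0 else info_gain(attr, labels) / split_info


if __name__ == "__main__":
    # Columns taken from the four heart-disease rows in the ARFF example.
    sex = ["male", "male", "male", "female"]
    angina = ["no", "yes", "yes", "no"]
    cls = ["not_present", "present", "present", "not_present"]
    print(info_gain(angina, cls))  # 1.0: this split separates the classes perfectly
    print(round(info_gain(sex, cls), 3))  # 0.311
```

    The gain ratio's denominator penalizes attributes with many distinct values, which plain information gain tends to favor.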

    “Meta”-classifiers include: bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
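    A meta-classifier wraps another learner. Bagging is the simplest to sketch: train the same base learner on several bootstrap samples (drawn with replacement) and let the resulting models vote. The code below is an illustration under our own assumptions: the base learner is a hand-written one-level decision stump chosen for brevity, the ensemble size (25) and seed are arbitrary, and none of the names correspond to WEKA classes.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Example = Tuple[List[float], str]  # (numeric feature vector, class label)
Model = Callable[[List[float]], str]


def train_stump(data: Sequence[Example]) -> Model:
    """Hypothetical base learner: a one-level decision stump on one numeric feature."""
    best = None  # (training errors, feature, threshold, label if <=, label if >)
    for f in range(len(data[0][0])):
        for t in sorted({x[f] for x, _ in data}):
            left = Counter(y for x, y in data if x[f] <= t)
            right = Counter(y for x, y in data if x[f] > t)
            lo_label = left.most_common(1)[0][0]
            hi_label = right.most_common(1)[0][0] if right else lo_label
            errors = sum(y != (lo_label if x[f] <= t else hi_label) for x, y in data)
            if best is None or errors < best[0]:
                best = (errors, f, t, lo_label, hi_label)
    _, feat, thr, below, above = best
    return lambda x: below if x[feat] <= thr else above


def bagging(data: Sequence[Example], learner: Callable[[Sequence[Example]], Model],
            n_models: int = 25, seed: int = 1) -> Model:
    """Train n_models copies of `learner` on bootstrap samples; predict by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: len(data) draws with replacement.
        sample = [rng.choice(data) for _ in range(len(data))]
        models.append(learner(sample))

    def predict(x: List[float]) -> str:
        return Counter(m(x) for m in models).most_common(1)[0][0]

    return predict


if __name__ == "__main__":
    data = [([float(v)], "low") for v in range(1, 5)] + \
           [([float(v)], "high") for v in range(6, 10)]
    model = bagging(data, train_stump)
    print(model([0.0]), model([10.0]))  # low high
```

    Because each bootstrap sample omits some instances and repeats others, the individual stumps differ slightly; the vote smooths out that variance, which is the reason bagging helps most with unstable learners such as trees.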

  • Homework #1 – Due Feb. 11

    Analyze the zoo dataset from the UCI repository using the Weka Explorer.

    For each of the attributes feathers, predators, tail, and domestic, report the types and number of animals for which the attribute is true.

    Remove the instances whose “type” attribute is greater than or equal to 4. Use the classifier J48graft to derive the corresponding decision tree, and draw the tree.

    Use the rules classifier PART to derive rules on the zoo dataset. List the rules obtained.

    Remove the “type” attribute from the dataset and run the default clustering algorithm, SimpleKMeans. How many clusters do you obtain? Can you relate these clusters to the original class values?
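    As background for the clustering task, the sketch below shows plain k-means (Lloyd's algorithm) on numeric points: pick k instances as initial centroids, assign every point to its nearest centroid, move each centroid to the mean of its members, and repeat until no assignment changes. It is not WEKA's SimpleKMeans code; WEKA's own initialization and distance handling may differ, and applying this sketch to the zoo data would first require encoding its boolean attributes as numbers (an assumption of ours). The names and the seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points: Sequence[Point], k: int, seed: int = 1,
            max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Hypothetical Lloyd's k-means: returns (centroids, cluster index of each point)."""
    rng = random.Random(seed)
    centroids: List[Point] = rng.sample(list(points), k)  # k distinct instances as seeds
    assignment = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                          for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centroids, assignment = k_means(pts, k=2)
    print(assignment)
    print(centroids)
```

    Relating clusters to class values, as the homework asks, amounts to cross-tabulating the `assignment` list against the removed class column.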