http://kt.ijs.si/petra_kralj/dmkd.html

Hands-on Weka 2014/11/11

Petra Kralj Novak

[email protected]

Data Mining Tools

• Weka http://www.cs.waikato.ac.nz/ml/weka/

• Orange http://orange.biolab.si/

• Knime http://www.knime.org/

• Taverna http://www.taverna.org.uk/

• Rapid Miner http://rapid-i.com/content/view/181/196/

• ClowdFlows http://clowdflows.org/

Weka (Waikato Environment for Knowledge Analysis)

• Collection of machine learning algorithms for data mining tasks

• The algorithms

– Can be applied directly to a dataset

– Can be called from Java code (library)

• Weka contains tools for

– Data pre-processing

– Classification

– Regression

– Clustering

– Association rules

– Visualization

• Weka is open source software issued under the GNU General Public License

Exercise 1: ID3 in Weka

1. Build a decision tree with the ID3 algorithm on the lenses dataset and evaluate it on a separate test set

Weka: Install

Download version 3.6 from http://www.cs.waikato.ac.nz/ml/weka/

Weka: Run Explorer

Choose Explorer

Exercise 1: ID3 in Weka

• In the Weka data mining tool, induce a decision tree for the lenses dataset with the ID3 algorithm.

• Data:

– lensesTrain.arff

– lensesTest.arff

• Compare the outcome with the manually obtained results.
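
For the manual comparison, it may help to recall how ID3 chooses the attribute to split on: it picks the one with the highest information gain. The following is a minimal sketch of that calculation, not Weka's implementation. The rows and attribute names (`tear_rate`, `astigmatic`, `lenses`) are hypothetical toy values, not the contents of lensesTrain.arff.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus its weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical toy examples, NOT the actual lenses data.
toy_rows = [
    {"tear_rate": "reduced", "astigmatic": "no", "lenses": "none"},
    {"tear_rate": "reduced", "astigmatic": "yes", "lenses": "none"},
    {"tear_rate": "normal", "astigmatic": "no", "lenses": "soft"},
    {"tear_rate": "normal", "astigmatic": "yes", "lenses": "hard"},
]

# ID3 would place the attribute with the largest gain at the root.
gains = {a: information_gain(toy_rows, a, "lenses") for a in ("tear_rate", "astigmatic")}
best_attribute = max(gains, key=gains.get)
```

On these toy rows `tear_rate` separates the "none" examples perfectly, so it has the larger gain and would become the root; ID3 then repeats the same choice on each branch.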

Load the data

Load the data - 2

lensesTrain.arff

The data are loaded

Target variable

Choose “Classify”

Choose algorithm

trees

Id3

[Screenshot: evaluation set-up in five numbered steps, using lensesTest.arff as the separate test set]

Decision tree

Classification accuracy

Confusion matrix
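
The two quantities are linked: classification accuracy is the sum of the diagonal of the confusion matrix divided by the total number of examples. A minimal sketch follows, assuming rows hold the actual classes and columns the predicted ones. The counts are hypothetical, not the output Weka produces for the lenses data.

```python
def accuracy(confusion):
    """Fraction of correctly classified examples: diagonal sum over total."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical counts: rows are actual classes, columns are predicted classes.
example_matrix = [
    [3, 0, 1],
    [0, 2, 0],
    [1, 0, 3],
]

# 3 + 2 + 3 = 8 correct predictions out of 10 examples.
example_accuracy = accuracy(example_matrix)
```

The off-diagonal cells show which classes get confused with which, something the single accuracy figure hides.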

Exercise 2: CAR dataset

• 1728 examples

• 6 attributes

– 6 nominal

– 0 numeric

• Nominal target variable

– 4 classes: unacc, acc, good, v-good

– Distribution of classes: unacc (70%), acc (22%), good (4%), v-good (4%)

• No missing values
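
The skewed class distribution gives a useful yardstick: a classifier that always predicts the most frequent class is already right about 70% of the time, so any model built on this dataset should be judged against that figure. A small sketch of the calculation, using only the percentages listed above (the variable names are illustrative):

```python
# Class shares as listed for the CAR dataset.
class_share = {"unacc": 0.70, "acc": 0.22, "good": 0.04, "v-good": 0.04}

# A majority-class baseline always predicts the most frequent class.
majority_class = max(class_share, key=class_share.get)
baseline_accuracy = class_share[majority_class]
```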

Preparing the data for WEKA - 1

Data in a spreadsheet (e.g. MS Excel)

- Rows are examples

- Columns are attributes

- The last column is the target variable

Preparing the data for WEKA - 2

Save as “.csv”

- Careful with dots “.”, commas “,” and semicolons “;”!
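
The warning matters because some spreadsheet locales export semicolons as field separators and commas as decimal marks. The sketch below rewrites such an export into comma-separated text with decimal points. It assumes the loader expects commas between fields. The function names and the sample columns are hypothetical, and a cell such as "1,000" would be read as 1.000, so check files that use thousands separators by hand.

```python
import csv
import io


def to_decimal_point(cell):
    """Turn a decimal-comma number such as "3,5" into "3.5"; leave other cells alone."""
    if "," not in cell:
        return cell
    candidate = cell.replace(",", ".", 1)
    try:
        float(candidate)
    except ValueError:
        return cell  # not a number, e.g. free text containing a comma
    return candidate


def normalise_csv(text, delimiter=";"):
    """Rewrite delimiter-separated text as comma-separated CSV."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        writer.writerow([to_decimal_point(cell) for cell in row])
    return out.getvalue()


# Hypothetical semicolon-separated export with a decimal comma in the last column.
sample = "buying;doors;price\nhigh;4;3,5\n"
converted = normalise_csv(sample)
```

Text cells that still contain a comma are quoted by `csv.writer`, so they are not mistaken for two fields.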

Load the data

Target variable

Car.csv

Choose algorithm J48

Building and evaluating the tree
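
The summary of this deck lists cross validation as the evaluation method used with J48. With no separate test set, the examples are divided into k folds; each fold serves once as the test set while the remaining folds are used for training, and the results are pooled. The sketch below shows only the index bookkeeping, assuming k = 10 and no shuffling or stratification; it is an illustration, not how Weka implements it.

```python
def k_fold_indices(n_examples, k):
    """Yield (train, test) index lists for k folds of near-equal size."""
    indices = list(range(n_examples))
    # The first (n_examples % k) folds receive one extra example.
    fold_sizes = [n_examples // k + (1 if i < n_examples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# 1728 examples, as in the CAR dataset; 10 folds is an assumed setting.
folds = list(k_fold_indices(1728, 10))
```

Every example is used for testing exactly once, so the pooled accuracy is based on all 1728 examples.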

Classified as

Actual values

Classification accuracy

Right mouse click

Tree pruning

Set the minimal number of objects per leaf to 15

Parameters of the algorithm (right mouse click)

Reduced number of leaves and nodes

Easier to interpret

Lower classification accuracy

Naïve Bayes classifier
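
As a reminder of what the classifier computes: for each class it multiplies the class prior by the conditional probability of every attribute value given that class, and predicts the class with the highest product. The sketch below does this for nominal attributes with Laplace smoothing. The smoothing choice, the function names and the toy rows are assumptions for illustration; this is not Weka's NaiveBayes implementation and not the real CAR data.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, target):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> Counter of values
    values_seen = defaultdict(set)       # attribute -> set of observed values
    for row in rows:
        for attribute, value in row.items():
            if attribute == target:
                continue
            value_counts[(attribute, row[target])][value] += 1
            values_seen[attribute].add(value)
    return class_counts, value_counts, values_seen


def predict(model, example):
    """Return the class with the highest smoothed posterior score."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for attribute, value in example.items():
            # Laplace-smoothed estimate of P(value | class)
            numerator = value_counts[(attribute, label)][value] + 1
            denominator = count + len(values_seen[attribute])
            score *= numerator / denominator
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy rows, NOT the actual CAR data.
toy_rows = [
    {"safety": "low", "persons": "2", "class": "unacc"},
    {"safety": "low", "persons": "4", "class": "unacc"},
    {"safety": "high", "persons": "2", "class": "unacc"},
    {"safety": "high", "persons": "4", "class": "acc"},
    {"safety": "med", "persons": "4", "class": "acc"},
]

model = train_naive_bayes(toy_rows, "class")
prediction = predict(model, {"safety": "high", "persons": "4"})
```

The "naïve" part is the assumption that attributes are independent given the class, which is what allows the per-attribute probabilities to be multiplied.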

Summary

• Weka

• ID3, separate test set

• Data preparation

• J48 (C4.5), cross-validation, tree pruning

• Naïve Bayes