
Decision tree algorithm: short Weka tutorial

Croce Danilo, Roberto Basili

Machine Learning for Web Mining, academic year 2009-2010


Machine Learning: brief summary

Example

You need to write a program that, given the level hierarchy of a company and given an employee described through some attributes (the number of attributes can be very high), assigns the employee the correct level in the hierarchy.

How many if statements are necessary to select the correct level? How much time is necessary to study the relations between the hierarchy and the attributes?

Solution: learn the function that links each employee to the correct level.


Supervised Learning process: two steps

Learning (Training)

Learn a model using the training data

Testing

Test the model on unseen test data to assess its accuracy


Learning Algorithms

Probabilistic functions (e.g., the Bayesian classifier)

Functions partitioning the vector space:
Non-linear: k-NN, neural networks, ...
Linear: Support Vector Machines, perceptron, ...

Boolean functions (decision trees)


Decision Tree: Domain Example

The class to learn is: approve a loan


Decision Tree

Decision Tree example for the loan problem


Is the decision tree unique?

No. Here is a simpler tree.

We want a small and accurate tree: it is easier to understand and tends to perform better.

Finding the best tree is NP-hard. All current tree-building algorithms are heuristic. A decision tree can be converted to a set of rules.


From a decision tree to a set of rules

Each path from the root to a leaf is a rule

Rules:
Own_house = true → Class = yes
Own_house = false, Has_job = true → Class = yes
Own_house = false, Has_job = false → Class = no
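Such a rule set maps directly onto nested conditionals. A minimal sketch in Java (the attribute names come from the tree above; the class itself is hypothetical):

public class LoanRules {

    // One branch per root-to-leaf path of the tree.
    static String classify(boolean ownHouse, boolean hasJob) {
        if (ownHouse) return "yes"; // Own_house = true -> Class = yes
        if (hasJob) return "yes";   // Own_house = false, Has_job = true -> Class = yes
        return "no";                // Own_house = false, Has_job = false -> Class = no
    }

    public static void main(String[] args) {
        System.out.println(classify(false, true)); // prints "yes"
    }
}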


Choose an attribute to partition data

How do we choose the best attribute?

The objective is to reduce the impurity or uncertainty in the data as much as possible.

A subset of the data is pure if all its instances belong to the same class. The heuristic is to choose the attribute with the maximum Information Gain or Gain Ratio, based on information theory.


Information Gain

Entropy of D

Entropy is a measure of the uncertainty associated with a random variable. Given a set of examples D, it is possible to compute the entropy of the dataset as:

H[D] = -\sum_{j=1}^{|C|} P(c_j) \log_2 P(c_j)

where C is the set of target classes.
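As a concrete illustration of the formula, entropy can be computed from the per-class counts of D; a minimal sketch in Java (a hypothetical helper, not part of Weka):

public class Entropy {

    // H[D] = -sum_j P(c_j) log2 P(c_j), computed from per-class counts.
    static double entropy(int[] classCounts) {
        int total = 0;
        for (int c : classCounts) total += c;
        double h = 0.0;
        for (int c : classCounts) {
            if (c == 0) continue;               // 0 * log 0 is taken as 0
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2); // log2 via change of base
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(entropy(new int[]{9, 6})); // ~0.971, the value worked out in the example later
    }
}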


Entropy

As the data become purer and purer, the entropy value becomes smaller and smaller.


Entropy of an attribute Ai

If we make attribute A_i, with v values, the root of the current tree, this will partition D into v subsets D_1, D_2, ..., D_v. The expected entropy if A_i is used as the current root is:

H_{A_i}[D] = \sum_{j=1}^{v} \frac{|D_j|}{|D|} H[D_j]


Information Gain

The information gained by selecting attribute A_i to branch or partition the data is the difference between the prior entropy and the entropy of the selected branching:

gain(D, A_i) = H[D] - H_{A_i}[D]

We choose the attribute with the highest gain to branch/split the current tree.
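Both quantities are easy to compute from counts; a sketch in Java extending the entropy helper above (hypothetical code, not Weka's API):

public class InfoGain {

    static double entropy(int[] classCounts) {
        int total = 0;
        for (int c : classCounts) total += c;
        double h = 0.0;
        for (int c : classCounts) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    // H_Ai[D]: weighted entropy of the subsets D_1..D_v induced by attribute Ai.
    static double expectedEntropy(int[][] subsetCounts) {
        int total = 0;
        for (int[] s : subsetCounts) for (int c : s) total += c;
        double h = 0.0;
        for (int[] s : subsetCounts) {
            int size = 0;
            for (int c : s) size += c;
            h += (double) size / total * entropy(s); // |Dj|/|D| * H[Dj]
        }
        return h;
    }

    // gain(D, Ai) = H[D] - H_Ai[D]
    static double gain(int[] classCounts, int[][] subsetCounts) {
        return entropy(classCounts) - expectedEntropy(subsetCounts);
    }
}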


Back to the example

The class to learn is: approve a loan


9 examples belong to the "YES" category and 6 to "NO". Exploiting this prior knowledge we have:

H[D] = -\frac{6}{15}\log_2\frac{6}{15} - \frac{9}{15}\log_2\frac{9}{15} = 0.971

while partitioning by the Age feature:

H_{Age}[D] = \frac{5}{15} H[D_1] + \frac{5}{15} H[D_2] + \frac{5}{15} H[D_3] = 0.888

where

H[D_1] = -\frac{3}{3+2}\log_2\frac{3}{3+2} - \frac{2}{3+2}\log_2\frac{2}{3+2} = 0.971

H[D_2] = -\frac{2}{2+3}\log_2\frac{2}{2+3} - \frac{3}{2+3}\log_2\frac{3}{2+3} = 0.971

H[D_3] = -\frac{1}{1+4}\log_2\frac{1}{1+4} - \frac{4}{1+4}\log_2\frac{4}{1+4} = 0.722


while partitioning by the Own_house feature:

H_{Own\_house}[D] = \frac{6}{15} H[D_1] + \frac{9}{15} H[D_2] = \frac{6}{15} \times 0 + \frac{9}{15} \times 0.918 = 0.551

The gains for all four attributes are:

gain(D, Age) = 0.971 - 0.888 = 0.083
gain(D, Own_house) = 0.971 - 0.551 = 0.420
gain(D, Has_job) = 0.971 - 0.647 = 0.324
gain(D, Credit) = 0.971 - 0.608 = 0.363

Own_house yields the highest gain, so it is chosen for the first split.
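These numbers can be reproduced with the InfoGain sketch above. The per-class (YES, NO) counts below are the ones implied by the fractions in the computations, so treat them as an illustration:

public class GainCheck {
    public static void main(String[] args) {
        int[] d = {9, 6};                                 // 9 YES, 6 NO overall
        int[][] byAge = {{2, 3}, {3, 2}, {4, 1}};         // three age groups of 5 examples each
        int[][] byOwnHouse = {{6, 0}, {3, 6}};            // house owners are all YES
        System.out.println(InfoGain.entropy(d));          // ~0.971
        System.out.println(InfoGain.gain(d, byAge));      // ~0.083
        System.out.println(InfoGain.gain(d, byOwnHouse)); // ~0.420
    }
}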


Algorithm for decision tree learning

Basic algorithm (a greedy divide-and-conquer algorithm)

Assume for now that the attributes are categorical.

The tree is constructed in a top-down, recursive manner:
At the start, all the training examples are at the root.
Examples are partitioned recursively based on selected attributes.
Attributes are selected on the basis of an impurity function (e.g., information gain).

Conditions for stopping the partitioning (all three appear in the sketch below):
All examples for a given node belong to the same class.
There are no remaining attributes for further partitioning → the majority class becomes the leaf.
There are no examples left.
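A compact sketch of this greedy recursion for categorical attributes (hypothetical types, not Weka's API; a real learner would also let an empty partition fall back to the parent's majority class):

import java.util.*;

public class TreeLearner {

    static class Node {
        String label;                       // non-null for leaves
        int attribute = -1;                 // attribute tested at internal nodes
        Map<String, Node> children = new HashMap<>();
    }

    // Each example is a row of categorical values; the last column is the class.
    static Node build(List<String[]> examples, Set<Integer> attributes) {
        Node node = new Node();
        // Stopping conditions: no examples left, no attributes left, or a pure node.
        if (examples.isEmpty() || attributes.isEmpty() || classes(examples).size() == 1) {
            node.label = majorityClass(examples);
            return node;
        }
        int best = bestAttribute(examples, attributes); // maximum information gain
        node.attribute = best;
        Set<Integer> rest = new HashSet<>(attributes);
        rest.remove(best);
        for (Map.Entry<String, List<String[]>> part : partition(examples, best).entrySet()) {
            node.children.put(part.getKey(), build(part.getValue(), rest));
        }
        return node;
    }

    static Map<String, List<String[]>> partition(List<String[]> examples, int attr) {
        Map<String, List<String[]>> parts = new HashMap<>();
        for (String[] ex : examples)
            parts.computeIfAbsent(ex[attr], k -> new ArrayList<>()).add(ex);
        return parts;
    }

    static int bestAttribute(List<String[]> examples, Set<Integer> attributes) {
        int best = -1;
        double bestH = Double.POSITIVE_INFINITY;
        for (int a : attributes) {          // minimising H_Ai[D] maximises the gain
            double h = 0.0;
            for (List<String[]> dj : partition(examples, a).values())
                h += (double) dj.size() / examples.size() * entropy(dj);
            if (h < bestH) { bestH = h; best = a; }
        }
        return best;
    }

    static double entropy(List<String[]> examples) {
        double h = 0.0;
        for (long count : classes(examples).values()) {
            double p = (double) count / examples.size();
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    static Map<String, Long> classes(List<String[]> examples) {
        Map<String, Long> counts = new HashMap<>();
        for (String[] ex : examples) counts.merge(ex[ex.length - 1], 1L, Long::sum);
        return counts;
    }

    static String majorityClass(List<String[]> examples) {
        String best = null;
        long bestCount = -1;
        for (Map.Entry<String, Long> e : classes(examples).entrySet())
            if (e.getValue() > bestCount) { bestCount = e.getValue(); best = e.getKey(); }
        return best;
    }
}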


What is WEKA?

Collection of ML algorithms, distributed as an open-source Java package.
Site: http://www.cs.waikato.ac.nz/ml/weka/
Documentation: http://www.cs.waikato.ac.nz/ml/weka/index_documentation.html

Schemes for classification include: decision trees, rule learners, naive Bayes, decision tables, locally weighted regression, SVMs, instance-based learners, logistic regression, voted perceptrons, multi-layer perceptrons.

For classification, Weka allows a train/test split or k-fold cross-validation.

Schemes for clustering: EM and Cobweb.


ARFF File Format

An ARFF file requires declarations of @RELATION, @ATTRIBUTE and @DATA.

The @RELATION declaration associates a name with the dataset:
@RELATION <relation-name>

An @ATTRIBUTE declaration specifies the name and type of an attribute:
@ATTRIBUTE <attribute-name> <datatype>
The datatype can be numeric, nominal, string or date, e.g.:
@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Setosa,Versicolor,Virginica}

The @DATA declaration is a single line denoting the start of the data segment. Missing values are represented by ?:
@DATA
1.4, 0.2, Setosa
1.4, ?, Versicolor
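Assembled from the fragments above, a complete minimal ARFF file looks like this (the relation name is an illustrative placeholder):

@RELATION iris

@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Setosa,Versicolor,Virginica}

@DATA
1.4, 0.2, Setosa
1.4, ?, Versicolor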


ARFF Sparse File Format

Similar to ARFF files, except that attributes with value 0 are not explicitly represented. Non-zero attributes are specified by attribute number and value.

Full:
@DATA
0, X, 0, Y, "class A"
0, 0, W, 0, "class B"

Sparse:
@DATA
{1 X, 3 Y, 4 "class A"}
{2 W, 4 "class B"}

Note that the omitted values in a sparse instance are 0; they are not missing values! If a value is unknown, you must explicitly represent it with a question mark (?).


Running Learning Schemes

java -Xmx512m -cp weka.jar <learner class> [options]

Example learner classes:
Decision tree: weka.classifiers.trees.J48
Naive Bayes: weka.classifiers.bayes.NaiveBayes
k-NN: weka.classifiers.lazy.IBk

Important generic options:
-t <training file>  Specify the training file
-T <test file>  Specify the test file; if none is given, a cross-validation is performed on the training data
-x <number of folds>  Number of folds for cross-validation
-l <input file>  Use a saved model
-d <output file>  Output the model to a file
-split-percentage <train size>  Size of the training set
-c <class index>  Index of the attribute to use as class (NB: the index starts from 1)
-p <attribute index>  Only output the predictions and one attribute (0 for none) for all test instances
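For instance, using only the options above (the file names here are placeholders), you can train J48 on a training set and evaluate it on a held-out test set, or cross-validate it and save the learned model:

java -Xmx512m -cp weka.jar weka.classifiers.trees.J48 -t train.arff -T test.arff
java -Xmx512m -cp weka.jar weka.classifiers.trees.J48 -t train.arff -x 10 -d j48.model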
