TRANSCRIPT
Decision Tree

In the Name of God
Mohammad Ali Keyvanrad
Machine Learning
Thanks to: Tom Mitchell (Carnegie Mellon University), Rich Caruana (Cornell University)
1393-1394 (Spring)

Outline
- Decision tree representation
- ID3 learning algorithm
- Entropy, information gain
- Issues in decision tree learning
Decision Tree for PlayTennis
(slide figure: an example decision tree for the PlayTennis concept)
Decision Trees
- internal node = attribute test
- branch = attribute value
- leaf node = classification
Decision Tree Representation
In general, decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances: each root-to-leaf path is a conjunction (AND) of attribute tests, and the tree as a whole is a disjunction (OR) of these paths. For example, the PlayTennis tree corresponds to:
(Outlook = Sunny AND Humidity = Normal) OR (Outlook = Overcast) OR (Outlook = Rain AND Wind = Weak)
Appropriate Problems for Decision Tree Learning
- Instances are represented by attribute-value pairs
- The target function has discrete output values
- Disjunctive descriptions may be required
- The training data may contain errors
- The training data may contain missing attribute values

Examples: medical diagnosis
Top-Down Induction of Decision Trees
Main loop:
1. Find the best attribute test to install at the root
2. Split the data on the root test
3. Find the best attribute test to install at each new node
4. Split the data on the new tests
5. Repeat until the training examples are perfectly classified

Which attribute is best?
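The main loop above can be sketched as a recursive procedure. This is a minimal illustration rather than the slides' own code: it assumes examples are dicts of discrete attribute values, and the helper names (`id3`, `best_attribute`) are mine.

```python
import math
from collections import Counter

def entropy(examples, target):
    """Entropy of the class-label distribution in `examples`."""
    counts = Counter(e[target] for e in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def best_attribute(examples, attributes, target):
    """The attribute with the highest information gain on `examples`."""
    def gain(a):
        remainder = 0.0
        for v in set(e[a] for e in examples):
            subset = [e for e in examples if e[a] == v]
            remainder += len(subset) / len(examples) * entropy(subset, target)
        return entropy(examples, target) - remainder
    return max(attributes, key=gain)

def id3(examples, attributes, target):
    """Grow a decision tree, returned as nested dicts; leaves are class labels."""
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:          # all examples agree -> leaf
        return labels[0]
    if not attributes:                 # no tests left -> majority-vote leaf
        return Counter(labels).most_common(1)[0][0]
    a = best_attribute(examples, attributes, target)   # best test at this node
    tree = {a: {}}
    for v in set(e[a] for e in examples):              # split data on the test
        subset = [e for e in examples if e[a] == v]
        tree[a][v] = id3(subset, [x for x in attributes if x != a], target)
    return tree
```

On the full PlayTennis data this procedure installs Outlook at the root, matching the tree shown earlier.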
ID3
Entropy
Entropy measures the impurity of a sample S of training examples. For a boolean target, with p+ the proportion of positive examples and p- the proportion of negative examples:

Entropy(S) = -p+ log2(p+) - p- log2(p-)

with the convention 0 log2(0) = 0. Entropy is 0 when S is pure (all one class) and 1 bit when the two classes are evenly mixed. For a target with c classes this generalizes to:

Entropy(S) = - sum over i = 1..c of p_i log2(p_i)
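As a quick numeric check of the formula (a minimal sketch; the helper name `entropy` is mine):

```python
import math

def entropy(pos, neg):
    """Entropy of a boolean sample with `pos` positive and `neg` negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:                     # convention: 0 * log2(0) = 0
            p = count / total
            result -= p * math.log2(p)
    return result

# The PlayTennis sample has 9 positive and 5 negative examples:
print(round(entropy(9, 5), 3))        # -> 0.94
```

A pure sample gives `entropy(14, 0) == 0.0`, and an evenly mixed one gives `entropy(7, 7) == 1.0`.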
Information Gain
Gain(S, A) is the expected reduction in entropy from partitioning S on attribute A:

Gain(S, A) = Entropy(S) - sum over v in Values(A) of (|S_v| / |S|) * Entropy(S_v)

where Values(A) is the set of possible values of A and S_v is the subset of S for which A has value v. ID3 chooses the attribute with the highest gain.
Training Examples (PlayTennis)

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No
Selecting the Next Attribute
Which attribute is the best classifier? On the PlayTennis sample, Gain(S, Humidity) = 0.151 while Gain(S, Wind) = 0.048, so Humidity provides the greater information gain.
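The comparison between candidate attributes can be reproduced directly from the training data. The sketch below assumes the standard 14-example PlayTennis table (Mitchell, Table 3.2); the function names are illustrative:

```python
import math

def entropy(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    return -sum((labels.count(l) / total) * math.log2(labels.count(l) / total)
                for l in set(labels))

def gain(examples, attribute, target):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)."""
    labels = [e[target] for e in examples]
    remainder = 0.0
    for v in set(e[attribute] for e in examples):
        subset = [e[target] for e in examples if e[attribute] == v]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

attrs = ["Outlook", "Temperature", "Humidity", "Wind"]
rows = [  # the 14 PlayTennis training examples, D1..D14
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
examples = [dict(zip(attrs + ["PlayTennis"], r)) for r in rows]

for a in attrs:
    print(a, round(gain(examples, a, "PlayTennis"), 3))
```

Humidity (about 0.15) carries more information than Wind (about 0.05), and Outlook (about 0.25) beats every other attribute, so ID3 installs Outlook at the root.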
Hypothesis Space Search by ID3
The hypothesis space searched by ID3 is the set of all possible decision trees. ID3 performs a simple-to-complex, hill-climbing search through this hypothesis space.
Overfitting
ID3 grows each branch of the tree just deeply enough to perfectly classify the training examples.
Difficulties:
- noise in the data
- too little training data
Consider adding a noisy training example #15: Sunny, Hot, Normal, Strong, PlayTennis = No.
Effect? ID3 constructs a more complex tree to accommodate the noisy example.
Overfitting
A hypothesis h overfits the training data if there is some alternative hypothesis h' such that h has smaller error than h' over the training examples, but h' has smaller error than h over the entire distribution of instances.
Overfitting in Decision Tree Learning
(slide figure: accuracy on the training data keeps rising as the tree grows, while accuracy on independent test data peaks and then falls)
Avoiding Overfitting
Reduced-Error Pruning
- Split the data into a training set and a validation set
- Do until further pruning is harmful (decreases the accuracy of the tree over the validation set):
  1. Evaluate the impact on the validation set of pruning each possible node (plus those below it)
  2. Greedily remove the node whose removal most improves validation-set accuracy
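The pruning loop above can be sketched over nested-dict trees (leaves are class labels). This is an illustrative implementation, not the slides' code; it assumes every attribute value seen at classification time also appears in the tree.

```python
from collections import Counter

def classify(tree, example):
    """Walk the nested-dict tree until a leaf (a class label) is reached."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][example[attr]]
    return tree

def accuracy(tree, examples, target):
    return sum(classify(tree, e) == e[target] for e in examples) / len(examples)

def internal_nodes(tree, path=()):
    """Yield the path (a tuple of (attr, value) steps) to every internal node."""
    if isinstance(tree, dict):
        yield path
        attr = next(iter(tree))
        for value, subtree in tree[attr].items():
            yield from internal_nodes(subtree, path + ((attr, value),))

def replaced(tree, path, leaf):
    """A copy of `tree` with the subtree at `path` replaced by `leaf`."""
    if not path:
        return leaf
    (attr, value), rest = path[0], path[1:]
    branches = dict(tree[attr])
    branches[value] = replaced(branches[value], rest, leaf)
    return {attr: branches}

def reduced_error_prune(tree, train, validation, target):
    """Greedily prune whichever node most improves validation accuracy."""
    while True:
        base = accuracy(tree, validation, target)
        best, best_acc = None, -1.0
        for path in internal_nodes(tree):
            reaching = train
            for attr, value in path:   # training examples reaching this node
                reaching = [e for e in reaching if e[attr] == value]
            if not reaching:
                continue
            # candidate: replace the node by the majority class of its examples
            leaf = Counter(e[target] for e in reaching).most_common(1)[0][0]
            candidate = replaced(tree, path, leaf)
            acc = accuracy(candidate, validation, target)
            if acc > best_acc:
                best, best_acc = candidate, acc
        if best is None or best_acc < base:   # further pruning is harmful: stop
            return tree
        tree = best
```

On a tree that has grown a spurious test to fit a noisy example, the loop collapses that subtree back to a majority leaf as soon as the validation set rewards it.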
Effect of Reduced-Error Pruning
(slide figure: test-set accuracy of the pruned tree compared with the unpruned tree as tree size varies)
Rule Post-Pruning
Method:
1. Convert the tree to an equivalent set of rules: each attribute test along the path from the root to a leaf becomes a rule antecedent (precondition), and the leaf's classification becomes the consequent
2. Prune each rule independently of the others: remove any antecedent whose removal does not worsen the rule's estimated accuracy
3. Sort the final rules into the desired sequence for use

This is perhaps the most frequently used method in practice (e.g., C4.5).

Converting a Tree to Rules
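Converting a tree to rules is a straightforward traversal: one rule per root-to-leaf path. A minimal sketch over a nested-dict tree representation (the representation and names are mine, not the slides'), using the PlayTennis tree:

```python
def tree_to_rules(tree, antecedents=()):
    """Yield (antecedents, classification) pairs, one per root-to-leaf path."""
    if not isinstance(tree, dict):            # leaf: emit one rule
        yield antecedents, tree
        return
    attr = next(iter(tree))
    for value, subtree in tree[attr].items():
        yield from tree_to_rules(subtree, antecedents + ((attr, value),))

tree = {"Outlook": {"Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
                    "Overcast": "Yes",
                    "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}}}}

for antecedents, label in tree_to_rules(tree):
    conditions = " AND ".join(f"{a} = {v}" for a, v in antecedents)
    print(f"IF {conditions} THEN PlayTennis = {label}")
```

Each printed rule, e.g. "IF Outlook = Sunny AND Humidity = High THEN PlayTennis = No", can then be pruned independently by dropping antecedents.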
Rule Post-Pruning
Main advantages of converting the decision tree to rules:
- The pruning decision regarding an attribute test can be made differently for each path. If the tree itself were pruned, the only two choices would be to remove the decision node completely or to retain it in its original form.
- Converting to rules removes the distinction between attribute tests that occur near the root of the tree and those that occur near the leaves.
- Converting to rules improves readability: rules are often easier for people to understand.

Continuous-Valued Attributes
ID3's attribute tests are discrete. A continuous attribute A can still be used by dynamically defining a boolean attribute of the form A < c, choosing the threshold c that maximizes information gain.
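For a continuous attribute, the candidate thresholds are the midpoints between adjacent sorted values where the class label changes; the gain is then evaluated for each candidate. A minimal sketch using the Temperature example from Mitchell's text (the function name is mine):

```python
def candidate_thresholds(values, labels):
    """Candidate split points for a continuous attribute: midpoints between
    adjacent sorted values whose class labels differ."""
    pairs = sorted(zip(values, labels))
    thresholds = []
    for (v1, l1), (v2, l2) in zip(pairs, pairs[1:]):
        if l1 != l2:                      # class changes -> candidate boundary
            thresholds.append((v1 + v2) / 2)
    return thresholds

# Temperature example (Mitchell): the class changes at 48->60 and 80->90
temps  = [40, 48, 60, 72, 80, 90]
labels = ["No", "No", "Yes", "Yes", "Yes", "No"]
print(candidate_thresholds(temps, labels))   # -> [54.0, 85.0]
```

Here the two candidates Temperature > 54 and Temperature > 85 would be compared by information gain like any other boolean attribute.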
Unknown Attribute Values
Some examples may be missing a value for attribute A. Common strategies when computing Gain(S, A) at a node:
- assign the value of A that is most common among the training examples at that node
- assign a probability to each possible value, based on the observed frequencies, and count the example fractionally
(slide examples: gains for Humidity and Wind with missing values)

Attribute with Costs
When attributes have different measurement costs (e.g., medical tests), learning can be biased toward low-cost attributes, for example by replacing Gain(S, A) with a cost-sensitive measure such as Gain^2(S, A) / Cost(A) (Tan and Schlimmer).
(slide examples: Humidity and Wind with costs)