TRANSCRIPT
Decision Tree

In the Name of God
Mohammad Ali Keyvanrad
Machine Learning
Thanks to: Tom Mitchell (Carnegie Mellon University), Rich Caruana (Cornell University)
1393-1394 (Spring)

Outline
- Decision tree representation
- ID3 learning algorithm
- Entropy, information gain
- Issues in decision tree learning
Decision Tree for PlayTennis
(slide figure: an example decision tree for the PlayTennis concept)
Decision Trees
- internal node = attribute test
- branch = attribute value
- leaf node = classification
Decision Tree Representation
In general, decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances: each root-to-leaf path is a conjunction (AND) of attribute tests, and the tree as a whole is a disjunction (OR) of these paths. For example, the PlayTennis tree corresponds to:
(Outlook = Sunny AND Humidity = Normal) OR (Outlook = Overcast) OR (Outlook = Rain AND Wind = Weak)
Appropriate Problems for Decision Tree Learning
- Instances are represented by attribute-value pairs
- The target function has discrete output values
- Disjunctive descriptions may be required
- The training data may contain errors
- The training data may contain missing attribute values

Examples: medical diagnosis
Top-Down Induction of Decision Trees
Main loop:
1. Find the best attribute test to install at the root
2. Split the data on the root test
3. Find the best attribute test to install at each new node
4. Split the data on the new tests
5. Repeat until the training examples are perfectly classified

Which attribute is best?
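The main loop above can be sketched as a recursive procedure. This is a minimal illustration rather than the slides' own code: it assumes examples are dicts of discrete attribute values, and the helper names (`id3`, `best_attribute`) are mine.

```python
import math
from collections import Counter

def entropy(examples, target):
    """Entropy of the class-label distribution in `examples`."""
    counts = Counter(e[target] for e in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def best_attribute(examples, attributes, target):
    """The attribute with the highest information gain on `examples`."""
    def gain(a):
        remainder = 0.0
        for v in set(e[a] for e in examples):
            subset = [e for e in examples if e[a] == v]
            remainder += len(subset) / len(examples) * entropy(subset, target)
        return entropy(examples, target) - remainder
    return max(attributes, key=gain)

def id3(examples, attributes, target):
    """Grow a decision tree, returned as nested dicts; leaves are class labels."""
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:          # all examples agree -> leaf
        return labels[0]
    if not attributes:                 # no tests left -> majority-vote leaf
        return Counter(labels).most_common(1)[0][0]
    a = best_attribute(examples, attributes, target)   # best test at this node
    tree = {a: {}}
    for v in set(e[a] for e in examples):              # split data on the test
        subset = [e for e in examples if e[a] == v]
        tree[a][v] = id3(subset, [x for x in attributes if x != a], target)
    return tree
```

On the full PlayTennis data this procedure installs Outlook at the root, matching the tree shown earlier.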
ID3
Entropy
Entropy measures the impurity of a sample S of training examples. For a boolean target, with p+ the proportion of positive examples and p- the proportion of negative examples:

Entropy(S) = -p+ log2(p+) - p- log2(p-)

with the convention 0 log2(0) = 0. Entropy is 0 when S is pure (all one class) and 1 bit when the two classes are evenly mixed. For a target with c classes this generalizes to:

Entropy(S) = - sum over i = 1..c of p_i log2(p_i)
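As a quick numeric check of the formula (a minimal sketch; the helper name `entropy` is mine):

```python
import math

def entropy(pos, neg):
    """Entropy of a boolean sample with `pos` positive and `neg` negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:                     # convention: 0 * log2(0) = 0
            p = count / total
            result -= p * math.log2(p)
    return result

# The PlayTennis sample has 9 positive and 5 negative examples:
print(round(entropy(9, 5), 3))        # -> 0.94
```

A pure sample gives `entropy(14, 0) == 0.0`, and an evenly mixed one gives `entropy(7, 7) == 1.0`.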
Information Gain
Gain(S, A) is the expected reduction in entropy from partitioning S on attribute A:

Gain(S, A) = Entropy(S) - sum over v in Values(A) of (|S_v| / |S|) * Entropy(S_v)

where Values(A) is the set of possible values of A and S_v is the subset of S for which A has value v. ID3 chooses the attribute with the highest gain.
Training Examples (PlayTennis)

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No
Selecting the Next Attribute
Which attribute is the best classifier? On the PlayTennis sample, Gain(S, Humidity) = 0.151 while Gain(S, Wind) = 0.048, so Humidity provides the greater information gain.
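The comparison between candidate attributes can be reproduced directly from the training data. The sketch below assumes the standard 14-example PlayTennis table (Mitchell, Table 3.2); the function names are illustrative:

```python
import math

def entropy(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    return -sum((labels.count(l) / total) * math.log2(labels.count(l) / total)
                for l in set(labels))

def gain(examples, attribute, target):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)."""
    labels = [e[target] for e in examples]
    remainder = 0.0
    for v in set(e[attribute] for e in examples):
        subset = [e[target] for e in examples if e[attribute] == v]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

attrs = ["Outlook", "Temperature", "Humidity", "Wind"]
rows = [  # the 14 PlayTennis training examples, D1..D14
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
examples = [dict(zip(attrs + ["PlayTennis"], r)) for r in rows]

for a in attrs:
    print(a, round(gain(examples, a, "PlayTennis"), 3))
```

Humidity (about 0.15) carries more information than Wind (about 0.05), and Outlook (about 0.25) beats every other attribute, so ID3 installs Outlook at the root.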
Hypothesis Space Search by ID3
The hypothesis space searched by ID3 is the set of all possible decision trees. ID3 performs a simple-to-complex, hill-climbing search through this hypothesis space.
Overfitting
ID3 grows each branch of the tree just deeply enough to perfectly classify the training examples.
Difficulties:
- noise in the data
- too little training data
Consider adding a noisy training example #15: Sunny, Hot, Normal, Strong, PlayTennis = No.
Effect? ID3 constructs a more complex tree to accommodate the noisy example.
Overfitting
A hypothesis h overfits the training data if there is some alternative hypothesis h' such that h has smaller error than h' over the training examples, but h' has smaller error than h over the entire distribution of instances.
Overfitting in Decision Tree Learning
(slide figure: accuracy on the training data keeps rising as the tree grows, while accuracy on independent test data peaks and then falls)
Avoiding Overfitting
Reduced-Error Pruning
- Split the data into a training set and a validation set
- Do until further pruning is harmful (decreases the accuracy of the tree over the validation set):
  1. Evaluate the impact on the validation set of pruning each possible node (plus those below it)
  2. Greedily remove the node whose removal most improves validation-set accuracy
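The pruning loop above can be sketched over nested-dict trees (leaves are class labels). This is an illustrative implementation, not the slides' code; it assumes every attribute value seen at classification time also appears in the tree.

```python
from collections import Counter

def classify(tree, example):
    """Walk the nested-dict tree until a leaf (a class label) is reached."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][example[attr]]
    return tree

def accuracy(tree, examples, target):
    return sum(classify(tree, e) == e[target] for e in examples) / len(examples)

def internal_nodes(tree, path=()):
    """Yield the path (a tuple of (attr, value) steps) to every internal node."""
    if isinstance(tree, dict):
        yield path
        attr = next(iter(tree))
        for value, subtree in tree[attr].items():
            yield from internal_nodes(subtree, path + ((attr, value),))

def replaced(tree, path, leaf):
    """A copy of `tree` with the subtree at `path` replaced by `leaf`."""
    if not path:
        return leaf
    (attr, value), rest = path[0], path[1:]
    branches = dict(tree[attr])
    branches[value] = replaced(branches[value], rest, leaf)
    return {attr: branches}

def reduced_error_prune(tree, train, validation, target):
    """Greedily prune whichever node most improves validation accuracy."""
    while True:
        base = accuracy(tree, validation, target)
        best, best_acc = None, -1.0
        for path in internal_nodes(tree):
            reaching = train
            for attr, value in path:   # training examples reaching this node
                reaching = [e for e in reaching if e[attr] == value]
            if not reaching:
                continue
            # candidate: replace the node by the majority class of its examples
            leaf = Counter(e[target] for e in reaching).most_common(1)[0][0]
            candidate = replaced(tree, path, leaf)
            acc = accuracy(candidate, validation, target)
            if acc > best_acc:
                best, best_acc = candidate, acc
        if best is None or best_acc < base:   # further pruning is harmful: stop
            return tree
        tree = best
```

On a tree that has grown a spurious test to fit a noisy example, the loop collapses that subtree back to a majority leaf as soon as the validation set rewards it.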
Effect of Reduced-Error Pruning
(slide figure: test-set accuracy of the pruned tree compared with the unpruned tree as tree size varies)
Rule Post-Pruning
Method:
1. Convert the tree to an equivalent set of rules: each attribute test along the path from the root to a leaf becomes a rule antecedent (precondition), and the leaf's classification becomes the consequent
2. Prune each rule independently of the others: remove any antecedent whose removal does not worsen the rule's estimated accuracy
3. Sort the final rules into the desired sequence for use

This is perhaps the most frequently used method in practice (e.g., C4.5).

Converting a Tree to Rules
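Converting a tree to rules is a straightforward traversal: one rule per root-to-leaf path. A minimal sketch over a nested-dict tree representation (the representation and names are mine, not the slides'), using the PlayTennis tree:

```python
def tree_to_rules(tree, antecedents=()):
    """Yield (antecedents, classification) pairs, one per root-to-leaf path."""
    if not isinstance(tree, dict):            # leaf: emit one rule
        yield antecedents, tree
        return
    attr = next(iter(tree))
    for value, subtree in tree[attr].items():
        yield from tree_to_rules(subtree, antecedents + ((attr, value),))

tree = {"Outlook": {"Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
                    "Overcast": "Yes",
                    "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}}}}

for antecedents, label in tree_to_rules(tree):
    conditions = " AND ".join(f"{a} = {v}" for a, v in antecedents)
    print(f"IF {conditions} THEN PlayTennis = {label}")
```

Each printed rule, e.g. "IF Outlook = Sunny AND Humidity = High THEN PlayTennis = No", can then be pruned independently by dropping antecedents.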
Rule Post-Pruning
Main advantages of converting the decision tree to rules:
- The pruning decision regarding an attribute test can be made differently for each path. If the tree itself were pruned, the only two choices would be to remove the decision node completely or to retain it in its original form.
- Converting to rules removes the distinction between attribute tests that occur near the root of the tree and those that occur near the leaves.
- Converting to rules improves readability: rules are often easier for people to understand.

Continuous-Valued Attributes
ID3's attribute tests are discrete. A continuous attribute A can still be used by dynamically defining a boolean attribute of the form A < c, choosing the threshold c that maximizes information gain.
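For a continuous attribute, the candidate thresholds are the midpoints between adjacent sorted values where the class label changes; the gain is then evaluated for each candidate. A minimal sketch using the Temperature example from Mitchell's text (the function name is mine):

```python
def candidate_thresholds(values, labels):
    """Candidate split points for a continuous attribute: midpoints between
    adjacent sorted values whose class labels differ."""
    pairs = sorted(zip(values, labels))
    thresholds = []
    for (v1, l1), (v2, l2) in zip(pairs, pairs[1:]):
        if l1 != l2:                      # class changes -> candidate boundary
            thresholds.append((v1 + v2) / 2)
    return thresholds

# Temperature example (Mitchell): the class changes at 48->60 and 80->90
temps  = [40, 48, 60, 72, 80, 90]
labels = ["No", "No", "Yes", "Yes", "Yes", "No"]
print(candidate_thresholds(temps, labels))   # -> [54.0, 85.0]
```

Here the two candidates Temperature > 54 and Temperature > 85 would be compared by information gain like any other boolean attribute.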
Unknown Attribute Values
Some examples may be missing a value for attribute A. Common strategies when computing Gain(S, A) at a node:
- assign the value of A that is most common among the training examples at that node
- assign a probability to each possible value, based on the observed frequencies, and count the example fractionally
(slide examples: gains for Humidity and Wind with missing values)

Attribute with Costs
When attributes have different measurement costs (e.g., medical tests), learning can be biased toward low-cost attributes, for example by replacing Gain(S, A) with a cost-sensitive measure such as Gain^2(S, A) / Cost(A) (Tan and Schlimmer).
(slide examples: Humidity and Wind with costs)