Decision Trees

DESCRIPTION
Some concepts on decision trees

TRANSCRIPT
Decision Trees
What is a tree in CS?
• A tree is a non-linear data structure
• It has a unique node called the root
• Every non-trivial tree has one or more leaf nodes, arranged in different levels
• Trees are always drawn with the root at the top or on the left
• Nodes at a level are connected to nodes at the higher (parent) level or lower (child) level
• There are no loops in a tree
Decision Trees
• A decision tree (DT) is a hierarchical classification and prediction model
• It is organized as a rooted tree with 2 types of nodes called decision nodes and class nodes
• It is a supervised data mining model used for classification or prediction
An Example Data Set and Decision Tree
#   Outlook   Company   Sailboat   Sail? (Class)
1 sunny big small yes
2 sunny med small yes
3 sunny med big yes
4 sunny no small yes
5 sunny big big yes
6 rainy no small no
7 rainy med small yes
8 rainy big big yes
9 rainy no big no
10 rainy med big no
[Figure: decision tree for the sailing data — root node "outlook" (sunny → yes; rainy → "company"); "company" (no → no; med → "sailboat"; big → yes); "sailboat" (small → yes; big → no)]
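The tree above can be hard-coded as nested tests. A minimal sketch (the function name and structure are mine, not the slides'), walking from the root decision node to a terminal node:

```python
# Hard-coded version of the example decision tree for the sailing data,
# assuming the attribute values shown in the table (outlook, company, sailboat).

def classify_sail(outlook, company, sailboat):
    """Walk the decision tree from the root to a terminal node."""
    if outlook == "sunny":          # decision node: outlook
        return "yes"
    # outlook == "rainy"
    if company == "no":             # decision node: company
        return "no"
    if company == "big":
        return "yes"
    # company == "med"
    return "yes" if sailboat == "small" else "no"   # decision node: sailboat

# The tree reproduces the rows of the training table, e.g.:
print(classify_sail("sunny", "no", "small"))   # row 4  -> yes
print(classify_sail("rainy", "med", "big"))    # row 10 -> no
```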
Classification
• What is classification?
• What are some applications of Decision Tree Classifiers (DTC)?
• What is a BDTC?
• Misclassification errors
Classification
#   Outlook   Company   Sailboat   Sail? (Class)
1   sunny     no        big        ?
2   rainy     big       small      ?

[Figure: the same decision tree as above, used to classify the two unlabeled instances]
Chance and Terminal nodes
• Each internal node of a DT is a decision point, where some condition is tested
• The result of this condition determines which branch of the tree is to be taken next
• Thus they are called decision nodes, chance nodes, or non-terminal nodes
• Chance nodes partition the available data at that point to maximize dependent variable differences
Terminal nodes
• The leaf nodes of a DT are called terminal nodes
• They indicate the class into which a data instance will be classified
• They have just one incoming edge
• They do not have child nodes (outgoing edges)
• No conditions are tested at terminal nodes
• Tree traversal from the root to a leaf produces the production rule for that class
Advantages of DT
• Easy to understand and interpret
• Works for categorical and quantitative data
• A DT can grow to any depth
• Attributes can be chosen in any desired order
• Pruning a DT is very easy
• Works for missing or null values
Advantages contd.
• Can be used to identify outliers
• Production rules can be obtained directly from the built DT
• They are relatively faster than other classification models
• A DT can be used even when domain experts are absent
Disadvantages
• A DT induces sequential decisions
• Class-overlap problem
• Correlated data
• Complex production rules
• A DT can be sub-optimal
Quinlan’s classical example
#   Outlook    Temperature   Humidity   Windy   Play (Class)
1 sunny hot high no N
2 sunny hot high yes N
3 overcast hot high no P
4 rainy moderate high no P
5 rainy cold normal no P
6 rainy cold normal yes N
7 overcast cold normal yes P
8 sunny moderate high no N
9 sunny cold normal no P
10 rainy moderate normal no P
11 sunny moderate normal yes P
12 overcast moderate high yes P
13 overcast hot normal no P
14 rainy moderate high yes N
Simple Tree
[Figure: simple tree — root "Outlook" (sunny → "Humidity"; overcast → P; rainy → "Windy"); "Humidity" (high → N; normal → P); "Windy" (yes → N; no → P)]
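The simple tree can be checked against the data set directly. A minimal sketch (not from the slides) encoding the tree and scoring it on all 14 instances:

```python
# Encode the "simple tree" and verify it against Quinlan's 14-row data set.

def classify_play(outlook, temperature, humidity, windy):
    if outlook == "overcast":
        return "P"
    if outlook == "sunny":                    # humidity decides
        return "N" if humidity == "high" else "P"
    return "N" if windy == "yes" else "P"     # rainy: windy decides

# (outlook, temperature, humidity, windy, class) for all 14 instances
DATA = [
    ("sunny", "hot", "high", "no", "N"),        ("sunny", "hot", "high", "yes", "N"),
    ("overcast", "hot", "high", "no", "P"),     ("rainy", "moderate", "high", "no", "P"),
    ("rainy", "cold", "normal", "no", "P"),     ("rainy", "cold", "normal", "yes", "N"),
    ("overcast", "cold", "normal", "yes", "P"), ("sunny", "moderate", "high", "no", "N"),
    ("sunny", "cold", "normal", "no", "P"),     ("rainy", "moderate", "normal", "no", "P"),
    ("sunny", "moderate", "normal", "yes", "P"),("overcast", "moderate", "high", "yes", "P"),
    ("overcast", "hot", "normal", "no", "P"),   ("rainy", "moderate", "high", "yes", "N"),
]

correct = sum(classify_play(*row[:4]) == row[4] for row in DATA)
print(correct, "of", len(DATA), "classified correctly")   # 14 of 14
```

Note that the temperature attribute is never tested: the simple tree fits the data without it.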
Complicated Tree
[Figure: a much larger tree for the same data, rooted at "Temperature" with further splits on "Outlook", "Windy", and "Humidity" (including a null branch); it fits the same 14 instances with many more nodes than the simple tree]
Production rules
• Rules abstracted by a DT can be converted into production rules
• These are obtained by traversing each branch of the DT from root to each of the leaves
• A DT can be reconstructed if all production rules are known
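The traversal described above is easy to mechanize. A minimal sketch (the nested-dict representation and function names are mine, not the slides'), using the simple tree for Quinlan's data as the example:

```python
# Represent a DT as nested structures and emit one production rule
# per root-to-leaf path.

TREE = ("Outlook", {                       # (attribute, branches) = decision node
    "sunny":    ("Humidity", {"high": "N", "normal": "P"}),
    "overcast": "P",                       # bare string = terminal node
    "rainy":    ("Windy", {"yes": "N", "no": "P"}),
})

def rules(node, path=()):
    if isinstance(node, str):              # terminal node: emit the rule
        cond = " AND ".join(f"{a} = {v}" for a, v in path)
        yield f"IF {cond} THEN class = {node}"
    else:
        attr, branches = node
        for value, child in branches.items():
            yield from rules(child, path + ((attr, value),))

for r in rules(TREE):
    print(r)
# e.g. IF Outlook = sunny AND Humidity = high THEN class = N
```

Because each leaf yields exactly one rule, the full rule set has one entry per terminal node (five, for the simple tree), and the tree can be rebuilt from it.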
General View of DT Induction
ID3 induction algorithm
• ID3 (Iterative Dichotomiser)
• Introduced in 1986 by Quinlan
• Uses a greedy tree-growing method
• Works on categorical attributes
• Uses the entropy measure
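ID3's greedy step can be sketched in a few lines: at each node, pick the attribute whose split maximizes information gain (entropy reduction). A minimal sketch (not from the slides), run on Quinlan's 14-instance data set:

```python
# ID3 attribute selection by information gain on Quinlan's weather data.
from collections import Counter
from math import log2

ATTRS = ("outlook", "temperature", "humidity", "windy")
DATA = [  # (outlook, temperature, humidity, windy) -> class
    (("sunny", "hot", "high", "no"), "N"),
    (("sunny", "hot", "high", "yes"), "N"),
    (("overcast", "hot", "high", "no"), "P"),
    (("rainy", "moderate", "high", "no"), "P"),
    (("rainy", "cold", "normal", "no"), "P"),
    (("rainy", "cold", "normal", "yes"), "N"),
    (("overcast", "cold", "normal", "yes"), "P"),
    (("sunny", "moderate", "high", "no"), "N"),
    (("sunny", "cold", "normal", "no"), "P"),
    (("rainy", "moderate", "normal", "no"), "P"),
    (("sunny", "moderate", "normal", "yes"), "P"),
    (("overcast", "moderate", "high", "yes"), "P"),
    (("overcast", "hot", "normal", "no"), "P"),
    (("rainy", "moderate", "high", "yes"), "N"),
]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(i):
    """Information gain of splitting DATA on attribute i."""
    groups = {}
    for attrs, cls in DATA:
        groups.setdefault(attrs[i], []).append(cls)
    before = entropy([cls for _, cls in DATA])
    after = sum(len(g) / len(DATA) * entropy(g) for g in groups.values())
    return before - after

gains = {a: round(gain(i), 3) for i, a in enumerate(ATTRS)}
best = max(gains, key=gains.get)
print(gains)               # outlook has the largest gain (~0.247)
print("split on:", best)   # split on: outlook
```

This is exactly why the simple tree is rooted at Outlook: its gain (about 0.247 bits) exceeds that of humidity, windy and temperature.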
C4.5 induction algorithm
• Invented by Quinlan in 1993
• An extension of the ID3 algorithm
• Uses a greedy tree-growing method
• Works on general attributes
• Uses the entropy measure
• Uses multi-way splits
CART induction algorithm
• Invented by Breiman et al. in 1984
• Uses binary recursive partitioning
• Works on general attributes
• Uses the Gini measure
• Uses two-way splits
Measures for node splitting
• Gini’s Index measure
• Modified Gini Index
• Normalized, symmetric and asymmetric Gini Index measures
• Shannon’s entropy measure
• Minimum classification error measure
• Chi-square statistic
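The Gini index of a node is 1 − Σ pᵢ², where pᵢ is the proportion of class i at that node; it is 0 for a pure node and largest when classes are evenly mixed. A minimal sketch (not from the slides):

```python
# Gini index computed from raw class counts at a node.

def gini(counts):
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

# Quinlan's data set has 9 P and 5 N instances at the root:
print(round(gini([9, 5]), 4))   # 0.4592
print(gini([14, 0]))            # 0.0 -- a pure node
```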
Entropy
• The average amount of information I needed to classify an object is given by the entropy measure
• For a two-class problem with class probabilities p and (1 − p):

  I = −p log₂ p − (1 − p) log₂ (1 − p)
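A minimal sketch (not from the slides) of the two-class entropy formula:

```python
# Two-class entropy: I = -p*log2(p) - (1-p)*log2(1-p).
from math import log2

def entropy2(p):
    if p in (0.0, 1.0):        # a pure node carries no information
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

# Quinlan's data: 9 of 14 instances are class P
print(round(entropy2(9 / 14), 3))   # 0.94
print(entropy2(0.5))                # 1.0 -- maximal uncertainty
```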
Chi-squared Automatic Interaction Detector(CHAID)
• As the name implies, this is a statistical technique for tree induction that uses Karl Pearson's χ² test for contingency tables.
• It works for categorical variables (with 2 or more categories), and can be used as an alternative to logistic regression.
• There is no pruning step as it stops growing the DT when a certain condition is met.
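The statistic CHAID uses can be computed directly from a contingency table: χ² = Σ (O − E)² / E, with expected counts E from the row and column totals. A minimal sketch (not from the slides), applied to the Outlook-versus-Play table from Quinlan's data:

```python
# Pearson chi-squared statistic for a contingency table of observed counts.

def chi_squared(table):
    """table[i][j] = observed count for row i, column j."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    total = sum(row)
    return sum((table[i][j] - row[i] * col[j] / total) ** 2
               / (row[i] * col[j] / total)
               for i in range(len(table)) for j in range(len(table[0])))

# Outlook vs Play: rows sunny/overcast/rainy, columns P/N
print(round(chi_squared([[2, 3], [4, 0], [3, 2]]), 3))
```

In CHAID this statistic (via its p-value) decides whether a candidate split is worth making; growth stops when no split passes the test, which is why no separate pruning step is needed.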
Pruning DT
• Once the decision tree has been constructed, a sensitivity analysis should be performed to test the suitability of the model to variations in the data instances. Expected values of each alternative are evaluated to determine the optimal model. However, the decision maker's attitude towards high-risk alternatives can negatively influence the outcome of a sensitivity analysis. Most decision tree software packages allow the user to carry out sensitivity analysis.
Pre Vs Post-pruning
• There are two approaches to prune a DT -- pre-pruning and post-pruning. In pre-pruning, the tree growing is halted when a stopping condition is met.
• Post-pruning works with a completely grown tree. In post-pruning, test cases are used to prune the DT to minimize the classification error or to adjust the tree to data changes.
• Tree pruning is usually a post-processing step intended to minimize overfitting and to remove redundancies.
Decision Tables
• A decision table is a hierarchical structure akin to decision trees, except that data are enumerated into a table using a pair of attributes, rather than a single attribute.
• Quantitative variables should be categorized using the discretisation technique discussed in chapter 1.
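A decision table over the sailing data can be built by cross-tabulating a pair of attributes and storing the majority class per cell. A minimal sketch (the attribute pair and helper names are my choices, not the slides'):

```python
# Decision table over the attribute pair (outlook, sailboat),
# one cell per value pair, holding the majority class.
from collections import Counter

ROWS = [  # (outlook, company, sailboat, sail?)
    ("sunny", "big", "small", "yes"), ("sunny", "med", "small", "yes"),
    ("sunny", "med", "big", "yes"),   ("sunny", "no", "small", "yes"),
    ("sunny", "big", "big", "yes"),   ("rainy", "no", "small", "no"),
    ("rainy", "med", "small", "yes"), ("rainy", "big", "big", "yes"),
    ("rainy", "no", "big", "no"),     ("rainy", "med", "big", "no"),
]

cells = {}
for outlook, _, sailboat, sail in ROWS:
    cells.setdefault((outlook, sailboat), Counter())[sail] += 1

table = {key: counts.most_common(1)[0][0] for key, counts in cells.items()}
for key in sorted(table):
    print(key, "->", table[key])
```

Note the information loss relative to the full tree: the (rainy, small) cell mixes both classes because the company attribute is not part of the table.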
Fraud Detection
• Fraud detection is increasingly becoming a necessity due to the large number of uncaught frauds. Fraudulent financial transactions amount to billions of dollars every year throughout the world. Fraud prevention is different from fraud detection: the former is pre-transaction safety, while the latter is applied during or immediately after a transaction.
Software for DT
• DTREG, a powerful statistical analysis program that generates classification and regression trees (www.dtreg.com)
• GATree (www.gatree.com)
• Weka (University of Waikato, NZ)
• TreeAge Pro (www.treeage.com)
• YaDT (www.di.unipi.it/~ruggieri/YaDT/YaDT1.2.1.zip)
THE END