CS 380: ARTIFICIAL INTELLIGENCE DECISION TREE LEARNING

11/20/2013 Santiago Ontañón https://www.cs.drexel.edu/~santi/teaching/2013/CS380/intro.html

Machine Learning Summary:
•  Several types of learning:
   •  Learning from examples:
      •  Supervised Learning
      •  Unsupervised Learning
   •  Reinforcement Learning
   •  Learning from Demonstration (imitation)
   •  Etc.
•  Today:
   •  Learning decision trees

Inductive Learning From Examples
•  f: unknown function that we want to learn
•  f: X → Y, where “X” is the input space and “Y” is the target space

For example:
•  If we want to use machine learning to learn the evaluation function for “Othello”:
   -  X: space of Othello boards
   -  Y: real numbers
•  If we want to use machine learning to learn how to read hand-written characters:
   -  X: 16x16 pixel images
   -  Y: characters

•  Training set:
   •  Set of examples from which to learn, each of the form ei = (xi, f(xi))

For example, if we want to use machine learning to learn the evaluation function for “Othello”:
   e1 = (board1, +15)
   e2 = (board2, -5)
   …

•  Learning algorithm:
   •  Method that, given the training set, generates a hypothesis h that fits the data
   •  Different learning algorithms explore different hypothesis spaces:
      •  Hypothesis space: set of all possible hypotheses that can be formulated
      •  The learning algorithm explores this search space looking for the simplest hypothesis that fits the data
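The definitions above can be sketched in a few lines of Python. This is only an illustration under assumed names: the toy input space (integers), the target values, the helper `fits` and the two candidate hypotheses are all hypothetical and do not come from the lecture.

```python
from typing import Callable, List, Tuple

# Hypothetical toy problem: X = integers, Y = booleans.
# Each training example is a pair e_i = (x_i, f(x_i)).
training_set: List[Tuple[int, bool]] = [(2, True), (3, False), (8, True), (11, False)]

def fits(h: Callable[[int], bool], examples: List[Tuple[int, bool]]) -> bool:
    """Return True if hypothesis h agrees with every training example."""
    return all(h(x) == y for x, y in examples)

# Two candidate hypotheses from an assumed hypothesis space.
def h_even(x: int) -> bool:
    return x % 2 == 0

def h_small(x: int) -> bool:
    return x < 5

print(fits(h_even, training_set))   # h_even agrees with all four examples
print(fits(h_small, training_set))  # h_small misclassifies 3 and 8
```

A learning algorithm would search the hypothesis space for a hypothesis like `h_even` for which `fits` holds, preferring simpler ones.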

Induction of Decision Trees
•  One of the earliest forms of machine learning
•  Algorithm: ID3:
   •  Hypothesis space: decision trees
   •  Example representation: feature vectors
   •  Explores the space of decision trees, trying to find one that fits the data

Decision Tree Example
•  Target function: “is it a good day to play tennis?”

Outlook
   Sunny → Humidity
      High → No
      Normal → Yes
   Overcast → Yes
   Rain → Wind
      Strong → No
      Weak → Yes
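One possible way to hold this tree in code is as nested dictionaries. The sketch below is an assumption about representation, not part of the lecture: the names `TREE` and `classify` and the lowercase attribute names are made up for illustration.

```python
# An internal node is {"attribute": name, "branches": {value: subtree}};
# a leaf is simply the target value as a string.
TREE = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity", "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain": {"attribute": "wind", "branches": {"strong": "no", "weak": "yes"}},
    },
}

def classify(tree, example):
    """Follow the branch matching the example's value until a leaf is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"][example[tree["attribute"]]]
    return tree

print(classify(TREE, {"outlook": "sunny", "humidity": "high", "wind": "weak"}))  # no
```

Each lookup consumes one attribute test, so classifying an example costs at most as many steps as the depth of the tree.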

Training Set

f([sunny, hot, high, weak]) = no
f([sunny, hot, high, strong]) = no
etc.

Learning Decision Trees
•  Generating a hypothesis from examples:

Outlook
   Sunny → Humidity
      High → No
      Normal → Yes
   Overcast → Yes
   Rain → Wind
      Strong → No
      Weak → Yes

ID3 Algorithm

ID3(examples, attributes_left)
    Tree = new Node()
    If all examples have the same target value vt
        Tree.target = vt
        Return Tree
    If attributes_left = empty list
        Tree.target = most common target value in examples
        Return Tree
    Tree.attribute = A = best attribute in attributes_left
    For each possible value v of A
        If “examples where A = v” is empty
            SubTree = leaf node with most common value in examples
        Else
            SubTree = ID3(examples where A = v, attributes_left – A)
        Add SubTree to Tree in a branch labeled “A = v”
    Return Tree
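The pseudocode can be turned into a short runnable sketch. The Python below is one possible rendering, not the lecture's implementation: it assumes examples are dictionaries, uses information gain as the "best attribute" criterion, and runs on a tiny made-up dataset (`data`, `values` and all function names are hypothetical).

```python
import math
from collections import Counter

def entropy(examples, target):
    """Entropy (in bits) of the target values in a list of examples."""
    counts = Counter(e[target] for e in examples)
    n = len(examples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def gain(examples, attr, target):
    """Information gain obtained by splitting the examples on attr."""
    n = len(examples)
    remainder = 0.0
    for v in {e[attr] for e in examples}:
        subset = [e for e in examples if e[attr] == v]
        remainder += len(subset) / n * entropy(subset, target)
    return entropy(examples, target) - remainder

def most_common(examples, target):
    return Counter(e[target] for e in examples).most_common(1)[0][0]

def id3(examples, attributes_left, values, target):
    """values maps each attribute to the list of its possible values."""
    targets = {e[target] for e in examples}
    if len(targets) == 1:                    # all examples share one target value
        return targets.pop()
    if not attributes_left:                  # nothing left to split on
        return most_common(examples, target)
    best = max(attributes_left, key=lambda a: gain(examples, a, target))
    tree = {"attribute": best, "branches": {}}
    remaining = [a for a in attributes_left if a != best]
    for v in values[best]:
        subset = [e for e in examples if e[best] == v]
        if not subset:                       # no examples with this value
            tree["branches"][v] = most_common(examples, target)
        else:
            tree["branches"][v] = id3(subset, remaining, values, target)
    return tree

# Tiny hypothetical dataset, only to exercise the function.
data = [
    {"outlook": "sunny", "wind": "weak", "play": "no"},
    {"outlook": "sunny", "wind": "strong", "play": "no"},
    {"outlook": "overcast", "wind": "weak", "play": "yes"},
    {"outlook": "rain", "wind": "weak", "play": "yes"},
    {"outlook": "rain", "wind": "strong", "play": "no"},
]
values = {"outlook": ["sunny", "overcast", "rain"], "wind": ["strong", "weak"]}
tree = id3(data, ["outlook", "wind"], values, "play")
print(tree)
```

Leaves are returned as plain target values and internal nodes as dictionaries, which keeps the recursion close to the three cases of the pseudocode.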

ID3 Algorithm: example run

•  Initial call: Examples = [ … ], Attributes_left = [day, outlook, temperature, humidity, Wind]
   •  Best attribute: Outlook, with branches Sunny, Overcast, Rainy
•  Branch Outlook = Sunny: recursive call with Examples = [ … ], Attributes_left = [day, temperature, humidity, Wind]
   •  Best attribute: Humidity, with branches High, Normal
   •  Branch Humidity = High: recursive call with Examples = [ … ], Attributes_left = [day, temperature, Wind]
      •  Returns the leaf “No”
   •  Branch Humidity = Normal: returns the leaf “Yes”
   •  The call returns the subtree rooted at Humidity
•  The Humidity subtree is added to the root in the branch labeled “Outlook = Sunny”
ID3 Output

Outlook
   Sunny → Humidity
      High → No
      Normal → Yes
   Overcast → Yes
   Rain → Wind
      Strong → No
      Weak → Yes

This tree can now be used to predict the target value for examples that were not in the original training set: generalization

Which Attribute is Best?
•  In the original training set, we have:
   •  9 examples with “YES”
   •  5 examples with “NO”
•  If we start with “outlook” (9 yes, 5 no):
   •  sunny: 2 yes, 3 no
   •  overcast: 4 yes
   •  rainy: 3 yes, 2 no
•  If we start with “wind” (9 yes, 5 no):
   •  strong: 3 yes, 3 no
   •  weak: 6 yes, 2 no

We want examples to be classified as well as possible. Ideally, all “yes” examples in one branch, and all “no” examples in another branch.

Entropy
•  Given a set of symbols S, drawn from an alphabet B
•  Entropy: expected number of bits needed to encode the next symbol (the amount of information that knowing one more symbol provides us)
   •  If the symbols in S are drawn uniformly at random from B, entropy is maximal
   •  If S is always the same symbol from B, entropy is minimal (we know which symbol will come next, so there is no new information)

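Entropy
•  Example for a binary variable:

[Figure: Entropy(S) as a function of p+; the horizontal axis (p+) runs from 0.0 to 1.0 and the vertical axis (Entropy(S)) from 0.0 to 1.0]

\[ H(X) = -\sum_i p(x_i) \log_2 p(x_i) \]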
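The formula can be checked numerically. The helper below is a small sketch (the function name `entropy` and the sampled values of p+ are illustrative choices); it uses base-2 logarithms so the result is in bits, and it skips zero probabilities, following the usual convention that 0 · log 0 = 0.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits; log2(1/p) equals -log2(p)."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

# Binary variable: entropy as a function of p+, the probability of "yes".
for p_plus in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p_plus, round(entropy([p_plus, 1 - p_plus]), 3))
```

The printed values rise from 0 at p+ = 0 to a maximum of 1 bit at p+ = 0.5 and fall back to 0 at p+ = 1.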

Entropy for Attribute Selection
•  The entropy in a node of the tree measures how well “grouped” the examples in that node are:
•  Outlook (9 yes, 5 no): H = 0.94
   •  sunny (2 yes, 3 no): H = 0.97
   •  overcast (4 yes): H = 0.00
   •  rainy (3 yes, 2 no): H = 0.97
•  Wind (9 yes, 5 no): H = 0.94
   •  strong (3 yes, 3 no): H = 1.00
   •  weak (6 yes, 2 no): H = 0.81

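Entropy for Attribute Selection
•  Information Gain: reduction in entropy due to selecting attribute A
•  Idea: the best attribute is the one that maximizes information gain

\[ \mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} H(S_v) \]

where S_v is the subset of examples in S for which attribute A has value v.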

Information Gain
•  Outlook (9 yes, 5 no): H = 0.94
   •  sunny (2 yes, 3 no): H = 0.97
   •  overcast (4 yes): H = 0.00
   •  rainy (3 yes, 2 no): H = 0.97
•  Wind (9 yes, 5 no): H = 0.94
   •  strong (3 yes, 3 no): H = 1.00
   •  weak (6 yes, 2 no): H = 0.81

Gain(Outlook) = 0.94 – 5/14 * 0.97 – 4/14 * 0.00 – 5/14 * 0.97 = 0.25
Gain(Wind) = 0.94 – 6/14 * 1.00 – 8/14 * 0.81 = 0.05
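These gains can be recomputed directly from the (yes, no) counts on the slide. The sketch below assumes each branch is given as a pair of class counts; the function names are illustrative. Without intermediate rounding, the gains come out at roughly 0.247 for Outlook and 0.048 for Wind, so Outlook is the better first split.

```python
import math

def entropy(counts):
    """Entropy in bits of a node described by its class counts."""
    n = sum(counts)
    return sum(c / n * math.log2(n / c) for c in counts if c > 0)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = sum(parent)
    remainder = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - remainder

# (yes, no) counts per branch, taken from the slide.
outlook = [(2, 3), (4, 0), (3, 2)]   # sunny, overcast, rainy
wind = [(3, 3), (6, 2)]              # strong, weak

print(round(information_gain((9, 5), outlook), 3))
print(round(information_gain((9, 5), wind), 3))
```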

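ID3 Does Best-First Search

[Figure: part of the search space of decision trees, starting from a single-node tree and growing by adding tests on attributes A1, A2, A3, A4, ...; leaves are labeled + or –]

Search in the hypothesis space, starting from a tree with a single node, and using Information Gain as the heuristic function. ID3 never backtracks: once an attribute is chosen for a node, the choice is final. We can conceive of better search strategies (e.g. A*), but the computational cost might be too large (although they might learn much better!)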

ID3
•  Better heuristics than “Information Gain” exist:
   •  E.g. Gain Ratio, GINI index, RLDM distance, etc.
•  Many alternative search strategies (e.g. decision forests)
•  Over-fitting:
   •  What happens if we have an example in the training set that is noise (e.g. that is mislabeled)?
   •  ID3 will try to force it into the decision tree!
   •  Strategies exist to avoid over-fitting (e.g. prevent leaves that have only a very small number of examples)
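Two of the alternative heuristics named above can be sketched from class counts. This is an illustration only: the function names are made up, the counts are the (yes, no) values for “outlook” from the earlier slides, and the definitions used are the standard ones (Gini impurity as one minus the sum of squared class proportions; gain ratio as information gain divided by the split information, i.e. the entropy of the branch sizes).

```python
import math

def gini(counts):
    """Gini impurity of a node given its class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def split_information(children):
    """Entropy of the branch sizes; the denominator of the gain ratio."""
    sizes = [sum(child) for child in children]
    n = sum(sizes)
    return sum(s / n * math.log2(n / s) for s in sizes if s > 0)

# (yes, no) counts per branch for "outlook": sunny, overcast, rainy.
outlook = [(2, 3), (4, 0), (3, 2)]

print(round(gini((9, 5)), 3))                # impurity of the root node
print(round(split_information(outlook), 3))  # penalty term for a 3-way split
```

Split information grows with the number of branches, so dividing by it penalizes attributes with many values (an identifier-like attribute such as “day” would otherwise look attractive to plain information gain).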

ID3
•  Converting a tree to rules:
   •  Each branch of the tree (path from the root to a leaf) is a rule
   •  A decision tree is just a compact way to represent a set of rules

Outlook
   Sunny → Humidity
      High → No
      Normal → Yes
   Overcast → Yes
   Rain → Wind
      Strong → No
      Weak → Yes

Outlook = rain and Wind = strong ⇒ playtennis = no

Thus, ID3 can be used to extract knowledge (e.g. rules) from large databases of examples
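Rule extraction amounts to enumerating the root-to-leaf paths of the tree. The sketch below assumes the tree is stored as nested dictionaries; the representation and the names `TREE` and `tree_to_rules` are hypothetical, chosen only for this illustration.

```python
# Internal node: {"attribute": name, "branches": {value: subtree}}; leaf: a string.
TREE = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity", "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain": {"attribute": "wind", "branches": {"strong": "no", "weak": "yes"}},
    },
}

def tree_to_rules(tree, conditions=()):
    """Yield one (conditions, label) pair per path from the root to a leaf."""
    if not isinstance(tree, dict):
        yield conditions, tree
        return
    for value, subtree in tree["branches"].items():
        yield from tree_to_rules(subtree, conditions + ((tree["attribute"], value),))

for conditions, label in tree_to_rules(TREE):
    body = " and ".join(f"{attr} = {value}" for attr, value in conditions)
    print(f"{body} => playtennis = {label}")
```

For this tree the loop prints five rules, one per leaf, including the rule for rain and strong wind shown on the slide.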

Other Supervised ML Methods
•  Lazy Methods
   •  Instance-based Learning
   •  Case-Based Reasoning
•  Bayesian Learning:
   •  Naïve Bayes
   •  Bayesian Networks
•  Regression methods (when the target function is numerical)
•  Neural Networks
•  Boosting
•  Bagging
•  Support Vector Machines
•  etc.