ID3 Algorithm CS 157B: Spring 2010 Meg Genoar


Page 1: ID3 Algorithm

ID3 Algorithm

CS 157B: Spring 2010

Meg Genoar

Page 2: ID3 Algorithm

Iterative Dichotomiser 3

Ross Quinlan – 1987

C4.5 Precursor

Decision Tree Generation

Page 3: ID3 Algorithm

Ross Quinlan

Computer Scientist – UW 1968

Data Mining & Decision Theory

AI: Data Mining

ID3, C4.5, & C5.0

RuleQuest Research

Page 4: ID3 Algorithm

ID3 & Entropy

Max-Gain Split

Most Useful Attribute

Highest Information

Best Attribute

Measure of Uncertainty

Randomness

Efficient Separation of Decision Tree Elements

Page 5: ID3 Algorithm

Entropy

Entropy(S) = – P_positive Log2(P_positive) – P_negative Log2(P_negative)

P_positive: proportion of positive data

P_negative: proportion of negative data
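The entropy definition above can be sketched in a few lines of Python. This is only an illustration: the function name `entropy` and its count-based signature are assumptions made here, not something from the slides. A term with a zero proportion is treated as contributing 0, which is the usual convention for 0 × Log2(0).

```python
import math


def entropy(positive: int, negative: int) -> float:
    """Entropy in bits of a set with the given positive/negative counts."""
    total = positive + negative
    result = 0.0
    for count in (positive, negative):
        if count:  # skip empty classes: 0 * log2(0) is taken as 0
            p = count / total
            result -= p * math.log2(p)
    return result


# A set of 13 positive and 7 negative examples
print(round(entropy(13, 7), 3))  # prints 0.934
```

An evenly mixed set such as `entropy(5, 5)` gives 1.0, and a pure set such as `entropy(0, 2)` gives 0.0.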

Page 6: ID3 Algorithm

Example…

A collection S consists of 20 data examples:

13 Yes : 7 No

Entropy(S) = – (13/20) Log2(13/20) – (7/20) Log2(7/20)

Entropy(S) = 0.934

Page 7: ID3 Algorithm

Entropy Gain Value

Gain: Place to Split the Tree

High Gain > Low Gain

High Gain: Top of the Tree

Gain(A) = E(Current Set) – ∑ (|Child Set| / |Current Set|) × E(Child Set), summed over all child sets
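As a rough sketch of the gain calculation, the snippet below subtracts the size-weighted entropy of the child sets from the entropy of the current set. The names `entropy` and `information_gain` and the (positive, negative) count tuples are illustrative assumptions. The sample counts are the ones from the Country of Origin split in the movie example in this deck.

```python
import math


def entropy(positive: int, negative: int) -> float:
    """Entropy in bits of a set with the given positive/negative counts."""
    total = positive + negative
    result = 0.0
    for count in (positive, negative):
        if count:  # 0 * log2(0) is taken as 0
            p = count / total
            result -= p * math.log2(p)
    return result


def information_gain(parent, children):
    """parent: (pos, neg) counts; children: one (pos, neg) pair per attribute value."""
    total = sum(parent)
    # Each child's entropy is weighted by its share of the parent's examples.
    remainder = sum((p + n) / total * entropy(p, n) for p, n in children)
    return entropy(*parent) - remainder


# 5 successes / 5 failures, split into USA (3, 1), Europe (2, 2), Rest of World (0, 2)
gain_origin = information_gain((5, 5), [(3, 1), (2, 2), (0, 2)])
print(f"{gain_origin:.4f}")
```

Without rounding the intermediate entropies the result is about 0.2755; the slides round Entropy(USA) to 0.811 first and report 0.276.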

Page 8: ID3 Algorithm

Movie Example

Film | Country of Origin | Big Star | Genre | Success
1 | United States | Yes | Science Fiction | True
2 | United States | No | Comedy | False
3 | United States | Yes | Comedy | True
4 | Europe | No | Comedy | True
5 | Europe | Yes | Science Fiction | False
6 | Europe | Yes | Romance | False
7 | Rest of World | Yes | Comedy | False
8 | Rest of World | No | Science Fiction | False
9 | Europe | Yes | Comedy | True
10 | United States | Yes | Comedy | True

Page 9: ID3 Algorithm

Entropy of Table

Is the Film a Success?

Entropy(5 Yes, 5 No) = – (5/10) Log2(5/10) – (5/10) Log2(5/10)

Entropy(Success) = 1

Page 10: ID3 Algorithm

Split – Country of Origin

Film | Country of Origin | Big Star | Genre | Success
1 | United States | Yes | Science Fiction | True
2 | United States | No | Comedy | False
3 | United States | Yes | Comedy | True
4 | United States | Yes | Comedy | True

Film | Country of Origin | Big Star | Genre | Success
1 | Europe | No | Comedy | True
2 | Europe | Yes | Science Fiction | False
3 | Europe | Yes | Romance | False
4 | Europe | Yes | Comedy | True

Film | Country of Origin | Big Star | Genre | Success
1 | Rest of World | Yes | Comedy | False
2 | Rest of World | No | Science Fiction | False
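The split step can be illustrated with a small grouping helper. This is a sketch only: the tuple layout (origin, big star, genre, success) and the name `split_by` are assumptions made for the example; the rows themselves are the ten films from the movie table.

```python
from collections import defaultdict

# (country of origin, big star, genre, success) for the ten films
FILMS = [
    ("United States", "Yes", "Science Fiction", True),
    ("United States", "No", "Comedy", False),
    ("United States", "Yes", "Comedy", True),
    ("Europe", "No", "Comedy", True),
    ("Europe", "Yes", "Science Fiction", False),
    ("Europe", "Yes", "Romance", False),
    ("Rest of World", "Yes", "Comedy", False),
    ("Rest of World", "No", "Science Fiction", False),
    ("Europe", "Yes", "Comedy", True),
    ("United States", "Yes", "Comedy", True),
]


def split_by(rows, column):
    """Group rows by the value found in the given column index."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    return dict(groups)


by_origin = split_by(FILMS, 0)  # column 0 is the country of origin
for origin, rows in by_origin.items():
    successes = sum(1 for row in rows if row[3])
    print(f"{origin}: {successes} True, {len(rows) - successes} False")
```

The per-group counts (3 and 1, 2 and 2, 0 and 2) are what the entropy of each child table is computed from.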

Page 11: ID3 Algorithm

Gain – Country of Origin

Where is the film from?

Entropy(USA) = – (3/4) Log2(3/4) – (1/4) Log2(1/4)

Entropy(USA) = 0.811

Entropy(Europe) = – (2/4) Log2(2/4) – (2/4) Log2(2/4)

Entropy(Europe) = 1

Entropy(Rest of World) = – (0/2) Log2(0/2) – (2/2) Log2(2/2)

Entropy(Rest of World) = 0

Gain(Origin) = 1 – (4/10 *0.811 + 4/10*1 + 2/10*0) = 0.276

Page 12: ID3 Algorithm

Split – Big Star

Film | Country of Origin | Big Star | Genre | Success
1 | United States | Yes | Science Fiction | True
2 | United States | Yes | Comedy | True
3 | Europe | Yes | Science Fiction | False
4 | Europe | Yes | Romance | False
5 | Rest of World | Yes | Comedy | False
6 | Europe | Yes | Comedy | True
7 | United States | Yes | Comedy | True

Film | Country of Origin | Big Star | Genre | Success
1 | United States | No | Comedy | False
2 | Europe | No | Comedy | True
3 | Rest of World | No | Science Fiction | False

Page 13: ID3 Algorithm

Gain – Big Star

Is there a Big Star in the film?

Entropy(Yes) = – (4/7) Log2(4/7) – (3/7) Log2(3/7)

Entropy(Yes) = 0.985

Entropy(No) = – (1/3) Log2(1/3) – (2/3) Log2(2/3)

Entropy(No) = 0.918

Gain(Star) = 1 – (7/10 *0.985 + 3/10*0.918) = 0.0351

Page 14: ID3 Algorithm

Split – Genre

Film | Country of Origin | Big Star | Genre | Success
1 | United States | Yes | Science Fiction | True
2 | Europe | Yes | Science Fiction | False
3 | Rest of World | No | Science Fiction | False

Film | Country of Origin | Big Star | Genre | Success
1 | United States | No | Comedy | False
2 | United States | Yes | Comedy | True
3 | Europe | No | Comedy | True
4 | Rest of World | Yes | Comedy | False
5 | Europe | Yes | Comedy | True
6 | United States | Yes | Comedy | True

Film | Country of Origin | Big Star | Genre | Success
1 | Europe | Yes | Romance | False

Page 15: ID3 Algorithm

Gain – Genre

What genre is the film?

Entropy(SciFi) = – (1/3) Log2(1/3) – (2/3) Log2(2/3)

Entropy(SciFi) = 0.918

Entropy(Com) = – (4/6) Log2(4/6) – (2/6) Log2(2/6)

Entropy(Com) = 0.918

Entropy(Rom) = – (0/1) Log2(0/1) – (1/1) Log2(1/1)

Entropy(Rom) = 0

Gain(Genre) = 1 – (3/10 *0.918 + 6/10*0.918+ 1/10*0) = 0.1738

Page 16: ID3 Algorithm

Compare Gains…

Gain(Origin) = 0.276

Gain(Star) = 0.0351

Gain(Genre) = 0.1738

Page 17: ID3 Algorithm

First Split: Origin
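The comparison of gains can be reproduced with a short script. It is a sketch under stated assumptions: the tuple layout, the `ATTRIBUTES` column map, and the helper names are invented for illustration. Because it does not round intermediate entropies, the printed values differ from the slide figures in the last digits, but the ranking is the same.

```python
import math
from collections import Counter, defaultdict

# (country of origin, big star, genre, success) for the ten films
FILMS = [
    ("United States", "Yes", "Science Fiction", True),
    ("United States", "No", "Comedy", False),
    ("United States", "Yes", "Comedy", True),
    ("Europe", "No", "Comedy", True),
    ("Europe", "Yes", "Science Fiction", False),
    ("Europe", "Yes", "Romance", False),
    ("Rest of World", "Yes", "Comedy", False),
    ("Rest of World", "No", "Science Fiction", False),
    ("Europe", "Yes", "Comedy", True),
    ("United States", "Yes", "Comedy", True),
]
ATTRIBUTES = {"Origin": 0, "Star": 1, "Genre": 2}  # attribute name -> column index


def entropy(rows):
    """Entropy in bits of the Success column (last element of each row)."""
    counts = Counter(row[-1] for row in rows)
    return sum(c / len(rows) * math.log2(len(rows) / c) for c in counts.values())


def gain(rows, column):
    """Entropy of rows minus the size-weighted entropy of each child table."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(rows) - remainder


gains = {name: gain(FILMS, column) for name, column in ATTRIBUTES.items()}
best = max(gains, key=gains.get)
for name, value in gains.items():
    print(f"Gain({name}) = {value:.4f}")
print("First split:", best)  # prints "First split: Origin"
```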

Page 18: ID3 Algorithm

All Movies
    United States: New Table
    Europe: New Table
    Rest of World: New Table

Page 20: ID3 Algorithm

New Table – United States

Film | Country of Origin | Big Star | Genre | Success
1 | United States | Yes | Science Fiction | True
2 | United States | No | Comedy | False
3 | United States | Yes | Comedy | True
4 | United States | Yes | Comedy | True

Entropy(3 Yes, 1 No) = – (3/4) Log2(3/4) – (1/4) Log2(1/4)

Entropy(Success) = 0.811

Page 21: ID3 Algorithm

Split – Big Star

Film | Country of Origin | Big Star | Genre | Success
1 | United States | Yes | Science Fiction | True
2 | United States | Yes | Comedy | True
3 | United States | Yes | Comedy | True

Film | Country of Origin | Big Star | Genre | Success
1 | United States | No | Comedy | False

Page 22: ID3 Algorithm

Gain – Big Star

Is there a Big Star in the film?

Entropy(Yes) = – (3/3) Log2(3/3) – (0/3) Log2(0/3)

Entropy(Yes) = 0

Entropy(No) = – (0/1) Log2(0/1) – (1/1) Log2(1/1)

Entropy(No) = 0

Gain(Star) = 0.811 – (3/4 *0 + 1/4*0) = 0.811

Page 23: ID3 Algorithm

Split – Genre

Film | Country of Origin | Big Star | Genre | Success
1 | United States | Yes | Science Fiction | True

Film | Country of Origin | Big Star | Genre | Success
1 | United States | No | Comedy | False
2 | United States | Yes | Comedy | True
3 | United States | Yes | Comedy | True

Page 24: ID3 Algorithm

Gain – Genre

What genre is the film?

Entropy(SciFi) = – (1/1) Log2(1/1) – (0/1) Log2(0/1)

Entropy(SciFi) = 0

Entropy(Com) = – (2/3) Log2(2/3) – (1/3) Log2(1/3)

Entropy(Com) = 0.918

Gain(Genre) = 0.811 – (1/4 *0 + 3/4*0.918) = 0.1225

Page 25: ID3 Algorithm

Compare Gains…

Gain(Star) = 0.811

Gain(Genre) = 0.1225

Page 26: ID3 Algorithm

Split: Star
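The second-level split can be checked the same way on the United States table alone. The sketch below assumes a dict-per-film layout and helper names chosen for illustration; only the four United States rows from the slides are used.

```python
import math
from collections import Counter, defaultdict

# The four United States films (Country of Origin is constant, so it is omitted)
US_FILMS = [
    {"Star": "Yes", "Genre": "Science Fiction", "Success": True},
    {"Star": "No", "Genre": "Comedy", "Success": False},
    {"Star": "Yes", "Genre": "Comedy", "Success": True},
    {"Star": "Yes", "Genre": "Comedy", "Success": True},
]


def entropy(rows):
    """Entropy in bits of the Success values in rows."""
    counts = Counter(row["Success"] for row in rows)
    return sum(c / len(rows) * math.log2(len(rows) / c) for c in counts.values())


def gain(rows, attribute):
    """Entropy of rows minus the size-weighted entropy of each child table."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(rows) - remainder


us_gains = {attribute: gain(US_FILMS, attribute) for attribute in ("Star", "Genre")}
us_best = max(us_gains, key=us_gains.get)
print(us_gains, "->", us_best)
```

Splitting on Star leaves both child tables pure, so its gain equals the full entropy of the United States table, which is why it beats Genre here.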

Page 27: ID3 Algorithm

All Movies
    United States: New Table
        Star: New Table
        No Star: New Table
    Europe: New Table
    Rest of World: New Table

Page 28: ID3 Algorithm

All Movies
    United States: New Table
        Star: New Table
            Sci-Fi: Success
            Comedy: Success
        No Star: Failure
    Europe: New Table
    Rest of World: New Table

Page 29: ID3 Algorithm

Tree diagram: All Movies splits into United States, Europe, and Rest of World. Two Star / No Star subtrees appear, each with Sci-Fi and Comedy branches and the leaves Failure, Success, Success; the remaining nodes are labeled Table or New Table.

Page 30: ID3 Algorithm

Tree diagram: same tree as on Page 29.

Comedy from the US, with a big star…
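To tie the steps together, here is a minimal recursive sketch of ID3 on the movie table, followed by a lookup for the example film. The nested-dict tree representation, the majority-vote fallback, and the names `id3` and `classify` are assumptions made for illustration. The tree this sketch builds may not match the slide figure node for node, since it stops splitting as soon as a table is pure.

```python
import math
from collections import Counter, defaultdict

COLUMNS = ("Origin", "Star", "Genre", "Success")
ROWS = [
    ("United States", "Yes", "Science Fiction", True),
    ("United States", "No", "Comedy", False),
    ("United States", "Yes", "Comedy", True),
    ("Europe", "No", "Comedy", True),
    ("Europe", "Yes", "Science Fiction", False),
    ("Europe", "Yes", "Romance", False),
    ("Rest of World", "Yes", "Comedy", False),
    ("Rest of World", "No", "Science Fiction", False),
    ("Europe", "Yes", "Comedy", True),
    ("United States", "Yes", "Comedy", True),
]
FILMS = [dict(zip(COLUMNS, row)) for row in ROWS]


def entropy(rows):
    counts = Counter(r["Success"] for r in rows)
    return sum(c / len(rows) * math.log2(len(rows) / c) for c in counts.values())


def split(rows, attribute):
    groups = defaultdict(list)
    for r in rows:
        groups[r[attribute]].append(r)
    return groups


def gain(rows, attribute):
    children = split(rows, attribute).values()
    return entropy(rows) - sum(len(c) / len(rows) * entropy(c) for c in children)


def id3(rows, attributes):
    """Return a leaf (True/False) or a nested dict {attribute: {value: subtree}}."""
    labels = Counter(r["Success"] for r in rows)
    if len(labels) == 1 or not attributes:
        # Pure table, or nothing left to split on: use the (majority) label.
        return labels.most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(rows, a))
    remaining = [a for a in attributes if a != best]
    return {best: {v: id3(subset, remaining) for v, subset in split(rows, best).items()}}


def classify(tree, film):
    """Walk the tree; raises KeyError for attribute values never seen in training."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][film[attribute]]
    return tree


tree = id3(FILMS, ["Origin", "Star", "Genre"])
result = classify(tree, {"Origin": "United States", "Star": "Yes", "Genre": "Comedy"})
print(result)  # prints True, i.e. predicted Success
```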
