TOPICS IN BUSINESS INTELLIGENCE K-NN & Naive Bayes – GROUP 1 Isabel van der Lijke Nathan Bok Gökhan Korkmaz

Upload: winfred-oconnor

Post on 29-Jan-2016


Page 1

TOPICS IN BUSINESS INTELLIGENCE
K-NN & Naive Bayes – GROUP 1

Isabel van der Lijke
Nathan Bok
Gökhan Korkmaz

Page 2

INTRODUCTION K-NN

k-NN Classifier (Categorical Outcome)
Determining Neighbors
Classification Rule
Example: Riding Mowers
Choosing k
Setting the Cutoff Value
Advantages and shortcomings of k-NN algorithms

Page 3

INTRODUCTION NAIVE BAYES

Basic Classification Procedure
Cutoff Probability Method
Conditional Probability
Naive Bayes
Advantages and shortcomings of the naive Bayes classifier

Page 4

SIMPLE CASE APPLICATION

Depression

Page 5

SIMPLE CASE APPLICATION

Fruits

Example: P(Banana) = 500 / 1000 = 0.5

P(Not banana) = 1 - 0.5 = 0.5

For a new fruit, compute the probability of each class.

              Sweet   Not sweet   Total
Banana          350         150     500
Orange          150         150     300
Other fruit     150          50     200
Total           650         350    1000
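The arithmetic on this slide can be reproduced with a few lines of Python. This is a minimal sketch that uses only the counts from the fruit table; the function names `prior`, `likelihood` and `posterior` are illustrative and do not come from the slides.

```python
# Counts copied from the fruit table: sweet / not sweet per class.
counts = {
    "Banana": {"Sweet": 350, "Not sweet": 150},
    "Orange": {"Sweet": 150, "Not sweet": 150},
    "Other fruit": {"Sweet": 150, "Not sweet": 50},
}

total = sum(sum(row.values()) for row in counts.values())  # 1000 fruits


def prior(fruit):
    """P(fruit): share of all fruits that belong to this class."""
    return sum(counts[fruit].values()) / total


def likelihood(taste, fruit):
    """P(taste | fruit): share of this class that has the given taste."""
    return counts[fruit][taste] / sum(counts[fruit].values())


def posterior(taste):
    """P(fruit | taste) for every class, via Bayes' rule."""
    joint = {f: prior(f) * likelihood(taste, f) for f in counts}
    evidence = sum(joint.values())
    return {f: p / evidence for f, p in joint.items()}


if __name__ == "__main__":
    print(prior("Banana"))  # 0.5, as on the slide
    for fruit, p in posterior("Sweet").items():
        print(f"P({fruit} | Sweet) = {p:.3f}")
```

With a single attribute the posterior is simply the column share (for example 350 / 650 for a sweet banana). The naive Bayes step is to multiply several such likelihoods, treating the attributes as independent within each class.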

Page 6

REAL-LIFE APPLICATION NAIVE BAYES

Medical Data Classification with Naive Bayes Approach
Introduction
Requirements for systems dealing with medical data
An empirical comparison
Tables
Conclusion

Page 7

TABLE 2: COMPARATIVE ANALYSIS BASED ON PREDICTIVE ACCURACY

Page 8

TABLE 3: COMPARATIVE ANALYSIS BASED ON AREA UNDER ROC CURVE (AUC)

Page 9

REAL-LIFE APPLICATION K-NN

Used to help health care professionals diagnose heart disease.

Useful for pattern recognition and classification.

Euclidean distance: d(x, y) = sqrt((x_1 - y_1)^2 + ... + (x_p - y_p)^2)

Data are often normalized because the variables are measured on different scales.
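As a rough illustration of how the Euclidean distance drives a k-NN classification, the sketch below ranks training records by distance to a new record and takes a majority vote among the k nearest. The records are hypothetical and assumed to be normalized already; `euclidean` and `knn_predict` are illustrative names, not functions from the tools used in the slides.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training rows."""
    ranked = sorted(zip(train_X, train_y), key=lambda row: euclidean(row[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical, already-normalized records with two features each.
    X = [(-1.0, -0.8), (-0.9, -1.1), (1.0, 0.9), (1.2, 1.1), (0.8, 1.0)]
    y = [0, 0, 1, 1, 1]
    print(knn_predict(X, y, (0.9, 1.0), k=3))
```

Without normalization, a variable with a large numeric range would dominate the distance, which is why the data are usually rescaled first.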

Page 10

CASE STUDY

“Our customer is a Dutch charity organization that wants to be able to classify its supporters into donators and non-donators. The non-donators are sent a single marketing mail a year, whereas the donators receive multiple ones (up to 4).”

Who are the donators? Who are the non-donators?

Application of K-NN & Naive Bayes to a training and a test dataset.
4000 customers.
Tools: SPSS, Excel, XLMiner

Page 11

CLEAN-UP

No missing values
1-dimensional outliers removed through sorting (regarding annual & average donation)
2-dimensional outliers removed through scatterplot


Page 13

Variables kept
Average donation
Frequency of response
Median time of response
Time as client

Variables removed
Annual donation
Last donation
Time since last response

Page 14

Normalization of scores into z-scores.
Nominal categorization of data.
Classification through percentiles of the z-scores and by manually processing values within the variables.
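The two preprocessing steps named on this slide could look roughly like the sketch below. It assumes that the percentile-based categories are quartiles, which yields four categories like the values 1 to 4 in the later probability tables; the slides do not state the actual cut points, and the donation values are made up.

```python
from statistics import mean, pstdev, quantiles


def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def quartile_categories(scores):
    """Map each score to a category 1..4 by the quartile it falls into.

    ASSUMPTION: quartile cut points; the slides only say "percentiles".
    """
    q1, q2, q3 = quantiles(scores, n=4)

    def category(s):
        if s <= q1:
            return 1
        if s <= q2:
            return 2
        if s <= q3:
            return 3
        return 4

    return [category(s) for s in scores]


if __name__ == "__main__":
    avg_donation = [5.0, 7.5, 10.0, 12.5, 15.0, 20.0, 25.0, 40.0]  # made-up values
    z = z_scores(avg_donation)
    print(quartile_categories(z))
```

The z-scores serve k-NN, which needs comparable numeric scales, while the nominal categories serve naive Bayes, which works with counts per category.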

Page 15

ANALYSIS OF CASE STUDY – K-NN

Page 16

Validation Data Scoring - Summary Report (for k = 13)

Error Report

Class      # Cases   # Errors   % Error
0             1083        180   16.62049861
1              536        260   48.50746269
Overall       1619        440   27.17726992

Classification Confusion Matrix

               Predicted Class
Actual Class       0       1
0                903     180
1                260     276

Page 17

CHOOSING MODEL FOR K-NN

Accuracy: the proportion of correctly classified instances.
Error rate: 1 - Accuracy.
Sensitivity: the proportion of actual positives that the classifier correctly identifies as positives.
Specificity: the proportion of actual negatives that the classifier correctly identifies as negatives.
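These four measures follow directly from a confusion matrix. The sketch below applies them to the k-NN validation matrix reported for k = 13, under the assumption that class 1 (donator) is the positive class; the function name is illustrative.

```python
def classification_metrics(tn, fp, fn, tp):
    """Accuracy, error rate, sensitivity and specificity from the four cells."""
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "sensitivity": tp / (tp + fn),  # share of actual positives found
        "specificity": tn / (tn + fp),  # share of actual negatives found
    }


if __name__ == "__main__":
    # k-NN validation matrix (k = 13); ASSUMPTION: class 1 is "positive".
    metrics = classification_metrics(tn=903, fp=180, fn=260, tp=276)
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

The error rate computed this way matches the overall error in the report (440 / 1619), and 1 minus the sensitivity matches the class 1 error (260 / 536).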


Page 19

                                                             M1        M2
Selecting everyone in validation data                        €711.20   €662.80
Selecting while correcting for sensitivity and specificity   €583.60   €530.80

Page 20

APPLICATION OF MODEL ON TEST DATA

Classification Confusion Matrix

               Predicted Class
Actual Class       0       1
0               2300     344
1                654     750

Error Report

Class      # Cases   # Errors   % Error
0             2644        344   13.01059
1             1404        654   46.5812
Overall       4048        998   24.65415


Page 22

ANALYSIS OF THE CASE STUDY – NAIVE BAYES

M1 = Cfrqres & Cavgdon
M2 = Cfrqres, Cavgdon & Cmedtor

Classification Confusion Matrix

               Predicted Class
Actual Class       0       1
0                856     229
1                225     309

Error Report

Class      # Cases   # Errors   % Error
0             1085        229   21.10599
1              534        225   42.13483
Overall       1619        454   28.042

Classes -->            0                   1
Input Variables   Value   Prob        Value   Prob
CFRQRES             1     0.71977       1     0.297483
                    2     0.171465      2     0.255149
                    3     0.06398       3     0.192224
                    4     0.044786      4     0.255149
CAVGDON             1     0.632758      1     0.297483
                    2     0.272553      2     0.471396
                    3     0.076136      3     0.16476
                    4     0.018554      4     0.066362
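To show how the conditional probabilities above turn into a classification, the following sketch scores one record under model M1 (CFRQRES and CAVGDON). The class priors are not given in the slides; as a labeled assumption they are approximated here by the class shares in the error report (1085 and 534 of 1619 cases). The 0.5 cutoff and the function names are likewise illustrative.

```python
# Conditional probabilities P(value | class) copied from the table above.
cond_prob = {
    "CFRQRES": {
        0: {1: 0.71977, 2: 0.171465, 3: 0.06398, 4: 0.044786},
        1: {1: 0.297483, 2: 0.255149, 3: 0.192224, 4: 0.255149},
    },
    "CAVGDON": {
        0: {1: 0.632758, 2: 0.272553, 3: 0.076136, 4: 0.018554},
        1: {1: 0.297483, 2: 0.471396, 3: 0.16476, 4: 0.066362},
    },
}

# ASSUMPTION: priors taken from the class counts in the error report,
# because the training-set priors are not shown in the slides.
prior = {0: 1085 / 1619, 1: 534 / 1619}


def posterior(record):
    """P(class | record), treating the inputs as independent given the class."""
    scores = {}
    for cls, p in prior.items():
        score = p
        for var, value in record.items():
            score *= cond_prob[var][cls][value]
        scores[cls] = score
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}


def classify(record, cutoff=0.5):
    """Predict class 1 when its posterior reaches the cutoff, else class 0."""
    return 1 if posterior(record)[1] >= cutoff else 0


if __name__ == "__main__":
    record = {"CFRQRES": 4, "CAVGDON": 4}
    print(posterior(record), classify(record))
```

Lowering the cutoff below 0.5 would classify more supporters as donators, trading specificity for sensitivity.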

Page 23

                     Model 1    Model 2
Selecting everyone   €1072      €1006
Selecting by class   €2460.82   €2378.01

Page 24

APPLICATION OF MODEL ON TEST DATA

Classes -->            0                   1
Input Variables   Value   Prob        Value   Prob
CFRQRES             1     0.714502      1     0.306108
                    2     0.173338      2     0.254261
                    3     0.066465      3     0.18608
                    4     0.045695      4     0.253551
CAVGDON             1     0.630287      1     0.3125
                    2     0.280589      2     0.461648
                    3     0.068353      3     0.168324
                    4     0.02077       4     0.057528

Page 25

Classification Confusion Matrix

               Predicted Class
Actual Class       0       1
0               2096     548
1                570     834

Error Report

Class      # Cases   # Errors   % Error
0             2644        548   20.72617
1             1404        570   40.59829
Overall       4048       1118   27.61858

Page 26

LOOKING AT BOTH MODELS
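The content of this slide did not survive, so the following is only one possible way to put the two models side by side. It reuses the test-data confusion matrices reported for k-NN and naive Bayes, again assuming that class 1 is the positive class; the function and variable names are illustrative.

```python
def rates(tn, fp, fn, tp):
    """Error rate, sensitivity and specificity from one confusion matrix."""
    total = tn + fp + fn + tp
    return {
        "error_rate": (fp + fn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Test-data confusion matrices as reported in the slides.
test_matrices = {
    "k-NN": dict(tn=2300, fp=344, fn=654, tp=750),
    "Naive Bayes": dict(tn=2096, fp=548, fn=570, tp=834),
}

results = {name: rates(**cells) for name, cells in test_matrices.items()}

if __name__ == "__main__":
    for name, r in results.items():
        print(
            f"{name}: error {r['error_rate']:.3f}, "
            f"sensitivity {r['sensitivity']:.3f}, "
            f"specificity {r['specificity']:.3f}"
        )
```

On these figures k-NN has the lower overall error and the higher specificity, while naive Bayes identifies a larger share of the actual donators.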


Page 28

QUESTIONS?