topics in business intelligence k-nn & naive bayes – group 1 isabel van der lijke nathan bok...

TOPICS IN BUSINESS INTELLIGENCE: K-NN & Naive Bayes – GROUP 1

Isabel van der Lijke, Nathan Bok, Gökhan Korkmaz

INTRODUCTION K-NN

k-NN Classifier (Categorical Outcome)

Determining Neighbors

Classification Rule

Example: Riding Mowers

Choosing k

Setting the Cutoff Value

Advantages and shortcomings of k-NN algorithms
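The steps in this outline (determining neighbors, applying the classification rule, choosing k, and setting the cutoff value) can be illustrated with a minimal sketch. The function name `knn_predict`, the toy records, and the default cutoff of 0.5 are illustrative assumptions, not material from the slides.

```python
import math

def knn_predict(train_X, train_y, query, k=3, cutoff=0.5):
    """Classify `query` as 1 if the share of class-1 records among its
    k nearest training records is at least `cutoff`, otherwise as 0."""
    # Determine neighbors: rank training records by Euclidean distance.
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    nearest = order[:k]
    # Classification rule: proportion of class-1 labels among the neighbors.
    share = sum(train_y[i] for i in nearest) / k
    return (1 if share >= cutoff else 0), share

# Hypothetical, already-normalized records with two predictors.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
           (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = [0, 0, 0, 1, 1, 1]

label, share = knn_predict(train_X, train_y, (0.8, 0.9), k=3)
print(label, share)
```

Raising `cutoff` makes the classifier more reluctant to assign class 1, which is the trade-off behind "Setting the Cutoff Value".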

INTRODUCTION NAIVE BAYES

Basic Classification Procedure

Cutoff Probability Method

Conditional Probability

Naive Bayes

Advantages and shortcomings of the naive Bayes classifier

SIMPLE CASE APPLICATION

Depression

SIMPLE CASE APPLICATION

Fruits

Example: P(Banana) = 500 / 1000 = 0.5

P(Not banana) = 1 - 0.5 = 0.5

For a new fruit, compute the probability of every class.

              Sweet   Not sweet   Total
Banana          350         150     500
Orange          150         150     300
Other fruit     150          50     200
Total           650         350    1000
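As a worked illustration of "compute all the chances", the following sketch applies Bayes' rule to the counts in the fruit table for a new fruit that is sweet. The function name `posterior_given_sweet` is an assumption made for this example. With a single attribute this is plain Bayes' rule; naive Bayes would multiply in one such likelihood per attribute.

```python
# Counts from the fruit table: (sweet, not sweet) per class.
counts = {
    "Banana": (350, 150),
    "Orange": (150, 150),
    "Other fruit": (150, 50),
}
total = sum(s + n for s, n in counts.values())  # 1000 fruits

def posterior_given_sweet(counts):
    """P(class | sweet) = prior * likelihood / evidence."""
    scores = {}
    for fruit, (sweet, not_sweet) in counts.items():
        prior = (sweet + not_sweet) / total        # e.g. P(Banana) = 0.5
        likelihood = sweet / (sweet + not_sweet)   # P(sweet | class)
        scores[fruit] = prior * likelihood
    evidence = sum(scores.values())                # P(sweet) = 650 / 1000
    return {fruit: s / evidence for fruit, s in scores.items()}

post = posterior_given_sweet(counts)
for fruit, p in post.items():
    print(f"P({fruit} | sweet) = {p:.3f}")
```

A sweet fruit is most likely a banana, with probability 350 / 650, or about 0.54.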

REAL-LIFE APPLICATION NAIVE BAYES

Medical Data Classification with Naive Bayes Approach

Introduction

Requirements for systems dealing with medical data

An empirical comparison

Tables

Conclusion

TABLE 2: COMPARATIVE ANALYSIS BASED ON PREDICTIVE ACCURACY

TABLE 3: COMPARATIVE ANALYSIS BASED ON AREA UNDER ROC CURVE (AUC)

REAL-LIFE APPLICATION K-NN

Used to help health care professionals in diagnosing heart disease.

Useful for pattern recognition and classification.

Euclidean distance: d(x, y) = sqrt((x1 - y1)^2 + (x2 - y2)^2 + ... + (xn - yn)^2)

Data are often normalized because the variables are measured on different scales.
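The effect of normalization on Euclidean distance can be sketched as follows. The patient values and the two variables (age and cholesterol) are hypothetical and chosen only to show a scale difference; they do not come from the heart-disease application.

```python
import math
import statistics

def z_scores(values):
    """Standardize a column to mean 0 and sample standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def euclidean(a, b):
    """Euclidean distance between two equal-length records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical patients: age in years, cholesterol in mg/dL.
age = [40, 50, 60]
chol = [180, 260, 190]

raw = list(zip(age, chol))
norm = list(zip(z_scores(age), z_scores(chol)))

# Distance from patient 0 to patients 1 and 2, before and after normalization.
print(euclidean(raw[0], raw[1]), euclidean(raw[0], raw[2]))
print(euclidean(norm[0], norm[1]), euclidean(norm[0], norm[2]))
```

On the raw values the cholesterol column dominates the distance simply because its numbers are larger; after conversion to z-scores both variables contribute on a comparable scale.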

CASE STUDY

“Our customer is a Dutch charity organization that wants to be able to classify its supporters into donators and non-donators. The non-donators are sent a single marketing mail a year, whereas the donators receive multiple ones (up to 4).”

Who are the donators? Who are the non-donators?

Application of K-NN & Naive Bayes to training and test datasets.

4000 customers.

Tools: SPSS, Excel, XLMiner.

CLEAN-UP

No missing values

1-dimensional outliers removed through sorting (regarding annual & average donation)

2-dimensional outliers removed through scatterplot

Variables Kept

Average donation

Frequency of Response

Median Time of Response

Time as client

Variables removed

Annual donation

Last donation

Time since last response

Normalization of scores into z-scores.

Nominal categorization of data.

Classification through percentiles of z-score & by manually processing values within the variables.
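The normalization and categorization steps can be sketched as below. The donation amounts are hypothetical, and the use of quartile cut points (giving categories 1 to 4) is an assumption: the slides do not state which percentiles were used.

```python
import statistics

def z_scores(values):
    """Standardize a column to mean 0 and sample standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def to_category(z, cut_points):
    """Return 1 plus the number of cut points that z exceeds."""
    return 1 + sum(z > c for c in cut_points)

# Hypothetical average donations in euros.
avg_donation = [5, 10, 10, 15, 20, 25, 40, 75]

z = z_scores(avg_donation)
# Assumed cut points: the quartiles of the z-scores.
cuts = statistics.quantiles(z, n=4)
categories = [to_category(v, cuts) for v in z]
print(categories)
```

The resulting nominal categories are what a naive Bayes classifier needs as input, since it works with conditional probabilities per category rather than with continuous scores.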

ANALYSIS OF CASE STUDY – K-NN

Validation Data Scoring - Summary Report (for k = 13)

Error Report

Class     # Cases   # Errors   % Error
0            1083        180     16.62
1             536        260     48.51
Overall      1619        440     27.18

Classification Confusion Matrix

                Predicted Class
Actual Class        0       1
0                 903     180
1                 260     276

CHOOSING MODEL FOR K-NN

Accuracy: the proportion of correctly classified instances.

Error rate: 1 - Accuracy.

Sensitivity: the proportion of actual positives that the classifier correctly identifies as positives.

Specificity: the proportion of actual negatives that the classifier correctly identifies as negatives.
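These four measures can be computed directly from a confusion matrix. The sketch below uses the k-NN validation matrix reported earlier (k = 13) and treats class 1 as the positive class, which is an assumption; the function name `classification_metrics` is illustrative.

```python
def classification_metrics(tn, fp, fn, tp):
    """Accuracy, error rate, sensitivity and specificity from the four
    cells of a two-class confusion matrix."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "sensitivity": tp / (tp + fn),   # actual positives found
        "specificity": tn / (tn + fp),   # actual negatives found
    }

# k-NN validation confusion matrix; class 1 assumed to be "positive".
m = classification_metrics(tn=903, fp=180, fn=260, tp=276)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

The error rate reproduces the overall 27.18% in the error report, and sensitivity and specificity are the complements of the per-class error percentages for class 1 and class 0.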

                                                                 M1        M2
Selecting everyone in validation data                       €711.20   €662.80
Selecting while correcting for sensitivity and specificity  €583.60   €530.80

APPLICATION OF MODEL ON TEST DATA

Classification Confusion Matrix

                Predicted Class
Actual Class        0       1
0                2300     344
1                 654     750

Error Report

Class     # Cases   # Errors   % Error
0            2644        344     13.01
1            1404        654     46.58
Overall      4048        998     24.65

ANALYSIS OF THE CASE STUDY – NAIVE BAYES

Classification Confusion Matrix

                Predicted Class
Actual Class        0       1
0                 856     229
1                 225     309

Error Report

Class     # Cases   # Errors   % Error
0            1085        229     21.11
1             534        225     42.13
Overall      1619        454     28.04

M1 = CFRQRES & CAVGDON

M2 = CFRQRES, CAVGDON & CMEDTOR

Conditional probabilities

Classes -->

Input variable   Value   Prob        Value   Prob
CFRQRES          1       0.719770    1       0.297483
                 2       0.171465    2       0.255149
                 3       0.063980    3       0.192224
                 4       0.044786    4       0.255149
CAVGDON          1       0.632758    1       0.297483
                 2       0.272553    2       0.471396
                 3       0.076136    3       0.164764
                 4       0.018554    4       0.066362
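To show how these conditional probabilities turn into a classification, here is a minimal naive Bayes sketch. Two things are assumptions and not stated on the slides: that the first pair of columns belongs to class 0 and the second to class 1, and the equal class priors. The function name `naive_bayes_posterior` and the example record are likewise hypothetical.

```python
# Conditional probabilities P(value | class) read from the output above.
# Assumption: first column pair = class 0, second column pair = class 1.
cond = {
    "CFRQRES": {
        0: {1: 0.719770, 2: 0.171465, 3: 0.063980, 4: 0.044786},
        1: {1: 0.297483, 2: 0.255149, 3: 0.192224, 4: 0.255149},
    },
    "CAVGDON": {
        0: {1: 0.632758, 2: 0.272553, 3: 0.076136, 4: 0.018554},
        1: {1: 0.297483, 2: 0.471396, 3: 0.164764, 4: 0.066362},
    },
}

def naive_bayes_posterior(record, priors):
    """Multiply each class prior by the conditional probability of every
    attribute value, then normalize so the class scores sum to 1."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for var, value in record.items():
            score *= cond[var][cls][value]
        scores[cls] = score
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}

# Hypothetical priors; the training class proportions are not reported.
priors = {0: 0.5, 1: 0.5}
post = naive_bayes_posterior({"CFRQRES": 4, "CAVGDON": 2}, priors)
predicted = 1 if post[1] >= 0.5 else 0   # cutoff probability of 0.5
print(post, predicted)
```

Changing the cutoff from 0.5 shifts the balance between the two kinds of misclassification, which is the idea behind the cutoff probability method.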

                      Model 1     Model 2
Selecting everyone    €1072       €1006
Selecting by class    €2460.82    €2378.01

APPLICATION OF MODEL ON TEST DATA

Conditional probabilities

Classes -->

Input variable   Value   Prob        Value   Prob
CFRQRES          1       0.714502    1       0.306108
                 2       0.173338    2       0.254261
                 3       0.066465    3       0.186084
                 4       0.045695    4       0.253551
CAVGDON          1       0.630287    1       0.312500
                 2       0.280589    2       0.461648
                 3       0.068353    3       0.168324
                 4       0.020770    4       0.057528

Classification Confusion Matrix

                Predicted Class
Actual Class        0       1
0                2096     548
1                 570     834

Error Report

Class     # Cases   # Errors   % Error
0            2644        548     20.73
1            1404        570     40.60
Overall      4048       1118     27.62

LOOKING AT BOTH MODELS
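One way to put the two models side by side is to compute the same rates from the two test-data confusion matrices reported in the application sections. This is a sketch, not the comparison shown on the original slide; class 1 is treated as the positive class, which is an assumption, and the function name `rates` is illustrative.

```python
def rates(tn, fp, fn, tp):
    """Error rate, sensitivity and specificity from a 2x2 confusion matrix."""
    total = tn + fp + fn + tp
    return {
        "error_rate": (fp + fn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Test-data confusion matrices; class 1 assumed to be "positive".
models = {
    "k-NN": rates(tn=2300, fp=344, fn=654, tp=750),
    "Naive Bayes": rates(tn=2096, fp=548, fn=570, tp=834),
}

for name, r in models.items():
    print(name, {key: round(value, 4) for key, value in r.items()})
```

On these figures k-NN has the lower overall error rate (about 24.7% against 27.6%), while naive Bayes correctly identifies a larger share of class 1 cases (about 59.4% against 53.4%).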

QUESTIONS?
