
P. Adamopoulos New York University

Lecture 6: Decision Analytic Thinking

Stern School of Business

New York University

Spring 2014

Data Mining for Business Analytics


Evaluation

How do we measure generalization performance?


Evaluating Classifiers: Plain Accuracy

$$\text{Accuracy} = \frac{\text{Number of correct decisions made}}{\text{Total number of decisions made}} = 1 - \text{error rate}$$

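• Too simplistic: it counts every error the same way, whether it is a false positive or a false negative, and it can be misleading when one class is much more common than the other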
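As a minimal sketch in plain Python (no libraries assumed), accuracy and error rate can be computed directly from paired lists of actual and predicted labels. The data below is hypothetical and chosen only to show why plain accuracy can mislead when one class is rare; the function names are illustrative.

```python
def accuracy(actual, predicted):
    """Fraction of decisions that match the actual class."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def error_rate(actual, predicted):
    """Complement of accuracy: fraction of decisions that are wrong."""
    return 1.0 - accuracy(actual, predicted)


# Hypothetical data: 5 positives among 100 cases.
actual = [1] * 5 + [0] * 95
# A trivial "classifier" that always predicts the negative class.
always_negative = [0] * 100

print(accuracy(actual, always_negative))  # 0.95
print(error_rate(actual, always_negative))
```

The trivial model reaches 95% accuracy while never identifying a single positive case, which is one concrete reason accuracy alone is too simplistic.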


Evaluating Classifiers: The Confusion Matrix

• A confusion matrix for a problem involving n classes is an n × n matrix, with the columns labeled with actual classes and the rows labeled with predicted classes (the example that follows uses the transposed layout, with actual classes as rows)
• It separates out the decisions made by the classifier, making explicit how one class is being confused for another
• The errors of the classifier are the false positives and false negatives


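Building a Confusion Matrix

Ten examples (1 = Default, 0 = No Default):

Default truth:     0 1 0 0 0 1 0 0 1 1
Model prediction:  0 1 1 1 0 1 0 0 1 0

Resulting confusion matrix (rows: actual class; columns: predicted class):

| Actual \ Predicted | Default | No Default | Total |
|---|---|---|---|
| Default | 3 | 1 | 4 |
| No Default | 2 | 4 | 6 |
| Total | 5 | 5 | 10 |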
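The counting step behind such a table can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the helper name `confusion_matrix` is our own, it tallies each (actual, predicted) pair into a nested dictionary indexed first by actual class (matching the example's rows), and it works for any number of classes. The ten pairs are the Default example's data, with 1 = Default and 0 = No Default.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Return counts[actual_label][predicted_label] for every pair of labels."""
    pair_counts = Counter(zip(actual, predicted))
    # Counter returns 0 for pairs that never occur.
    return {a: {p: pair_counts[(a, p)] for p in labels} for a in labels}


# 1 = Default, 0 = No Default (the ten examples from the slide)
truth      = [0, 1, 0, 0, 0, 1, 0, 0, 1, 1]
prediction = [0, 1, 1, 1, 0, 1, 0, 0, 1, 0]

labels = [1, 0]  # list Default first to match the table
cm = confusion_matrix(truth, prediction, labels)
for a in labels:
    row = [cm[a][p] for p in labels]
    print(a, row, sum(row))
# 1 [3, 1] 4
# 0 [2, 4] 6
```

Keying by actual class first mirrors the example table; swapping the loop order would give the book's layout with predicted classes as rows.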


Other Evaluation Metrics

• Precision = $\dfrac{TP}{TP + FP}$
• Recall = $\dfrac{TP}{TP + FN}$
• F-measure = $2 \times \dfrac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$
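A minimal Python sketch of these three metrics, applied to the counts of the Default example (treating Default as the positive class: TP = 3, FN = 1, FP = 2, TN = 4). Returning 0.0 when a denominator is zero is an assumed convention for this sketch, not something the lecture specifies.

```python
def precision(tp, fp):
    """Share of predicted positives that are actually positive."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Share of actual positives that the classifier flags."""
    return tp / (tp + fn) if tp + fn else 0.0


def f_measure(prec, rec):
    """Harmonic mean of precision and recall."""
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0


# Counts from the Default example (TN = 4 is not needed here).
tp, fn, fp = 3, 1, 2
prec = precision(tp, fp)
rec = recall(tp, fn)
print(prec, rec, round(f_measure(prec, rec), 3))  # 0.6 0.75 0.667
```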

Expected Value Framework


A Key Analytical Framework: Expected Value

• The expected value computation provides a framework that is useful in organizing thinking about data-analytic problems
• It decomposes data-analytic thinking into:
  • the structure of the problem,
  • the elements of the analysis that can be extracted from the data, and
  • the elements of the analysis that need to be acquired from other sources
• The general form of an expected value calculation:

$$EV = p(o_1) \times v(o_1) + p(o_2) \times v(o_2) + p(o_3) \times v(o_3) + \cdots$$

where $o_i$ is a possible decision outcome, $p(o_i)$ is its probability, and $v(o_i)$ is its value.
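The general form translates directly into a weighted sum. This Python sketch assumes the outcomes are mutually exclusive and exhaustive, so it checks that their probabilities sum to 1; that check and the three-outcome example are illustrative assumptions, not part of the lecture.

```python
import math


def expected_value(outcomes):
    """Sum of p(o) * v(o) over (probability, value) pairs.

    Assumes the outcomes are mutually exclusive and exhaustive,
    so their probabilities should sum to 1.
    """
    outcomes = list(outcomes)
    total_p = sum(p for p, _ in outcomes)
    if not math.isclose(total_p, 1.0, abs_tol=1e-9):
        raise ValueError(f"probabilities sum to {total_p}, not 1")
    return sum(p * v for p, v in outcomes)


# Hypothetical decision with three outcomes: (probability, value)
ev = expected_value([(0.5, 10.0), (0.3, -5.0), (0.2, 0.0)])
print(ev)
```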


Expected Value Framework in Use Phase

Online marketing:

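• Expected benefit of targeting = $p_R(\mathbf{x}) \times v_R + [1 - p_R(\mathbf{x})] \times v_{NR}$, where $p_R(\mathbf{x})$ is the estimated probability that consumer $\mathbf{x}$ responds, $v_R$ is the value of a response, and $v_{NR}$ is the value of no response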

• Product Price: $200

• Product Cost: $100

• Targeting Cost: $1

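Here $v_R = \$200 - \$100 - \$1 = \$99$ and $v_{NR} = -\$1$, so targeting consumer $\mathbf{x}$ pays off when

$$p_R(\mathbf{x}) \times \$99 - [1 - p_R(\mathbf{x})] \times \$1 > 0 \iff 100\,p_R(\mathbf{x}) > 1 \iff p_R(\mathbf{x}) > 0.01$$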
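The targeting rule can be sketched in Python using the slide's price, product cost, and targeting cost. The constant and function names (`V_R`, `V_NR`, `should_target`, and so on) are hypothetical; the break-even probability comes from solving $p \times v_R + (1 - p) \times v_{NR} > 0$ for $p$, which is valid because $v_R > v_{NR}$.

```python
# Figures from the slide; variable names are illustrative.
PRICE = 200.0
PRODUCT_COST = 100.0
TARGETING_COST = 1.0

# Value if the targeted consumer responds (buys), and if not.
V_R = PRICE - PRODUCT_COST - TARGETING_COST   # 99.0
V_NR = -TARGETING_COST                        # -1.0


def expected_benefit_of_targeting(p_r):
    """p_R(x) * v_R + (1 - p_R(x)) * v_NR for one consumer x."""
    return p_r * V_R + (1.0 - p_r) * V_NR


def should_target(p_r):
    """Target only when the expected benefit is strictly positive."""
    return expected_benefit_of_targeting(p_r) > 0


# Solving p * V_R + (1 - p) * V_NR > 0 for p (valid since V_R > V_NR).
break_even = -V_NR / (V_R - V_NR)

print(break_even)  # 0.01
for p in (0.005, 0.02, 0.10):
    print(p, should_target(p))
```

In use, `p_r` would be a model's estimated response probability for each consumer; only consumers above the 1% break-even are worth the $1 targeting cost.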


Using Expected Value to Frame Classifier Evaluation


A cost-benefit matrix


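A cost-benefit matrix for the marketing example

| Predicted \ Actual | p (responds) | n (does not respond) |
|---|---|---|
| Y (target) | b(Y, p) = $99 | b(Y, n) = −$1 |
| N (do not target) | b(N, p) = $0 | b(N, n) = $0 |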


Conditional Probability

A rule of basic probability is:

$$p(x, y) = p(y) \times p(x \mid y)$$
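A quick numerical check of this rule, sketched in Python with exact fractions from the standard `fractions` module. It reuses the Default confusion matrix counts, taking x as the predicted class and y as the actual class; the dictionary layout and helper names are illustrative choices.

```python
from fractions import Fraction

# Confusion-matrix counts from the Default example, keyed by (predicted, actual).
counts = {
    ("Default", "Default"): 3,
    ("No Default", "Default"): 1,
    ("Default", "No Default"): 2,
    ("No Default", "No Default"): 4,
}
total = sum(counts.values())
classes = ("Default", "No Default")


def p_joint(x, y):
    """p(x, y): share of all cases predicted x whose actual class is y."""
    return Fraction(counts[(x, y)], total)


def p_actual(y):
    """p(y): share of all cases whose actual class is y."""
    return Fraction(sum(counts[(x, y)] for x in classes), total)


def p_pred_given_actual(x, y):
    """p(x | y): among cases with actual class y, share predicted x."""
    n_y = sum(counts[(x2, y)] for x2 in classes)
    return Fraction(counts[(x, y)], n_y)


# The rule holds exactly for every (predicted, actual) combination.
for x in classes:
    for y in classes:
        assert p_joint(x, y) == p_actual(y) * p_pred_given_actual(x, y)
print(p_joint("Default", "Default"))  # 3/10
```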


Using Expected Value to Frame Classifier Evaluation

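Here Y and N are the predicted classes (target, do not target), p and n are the actual classes (responds, does not respond), and b(·, ·) is the corresponding entry of the cost-benefit matrix. The second line applies p(x, y) = p(y) × p(x | y); the third factors out the class priors p(p) and p(n).

$$
\begin{aligned}
\text{Expected profit} &= p(Y,p) \times b(Y,p) + p(N,p) \times b(N,p) + p(N,n) \times b(N,n) + p(Y,n) \times b(Y,n) \\
&= p(Y \mid p) \times p(p) \times b(Y,p) + p(N \mid p) \times p(p) \times b(N,p) + p(N \mid n) \times p(n) \times b(N,n) + p(Y \mid n) \times p(n) \times b(Y,n) \\
&= p(p) \times [p(Y \mid p) \times b(Y,p) + p(N \mid p) \times b(N,p)] + p(n) \times [p(N \mid n) \times b(N,n) + p(Y \mid n) \times b(Y,n)]
\end{aligned}
$$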


Using Expected Value to Frame Classifier Evaluation

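Test set: T = 110 instances, of which P = 61 are actual positives and N = 49 are actual negatives.

$$p(p) = 61/110 = 0.55 \qquad p(n) = 49/110 = 0.45$$

$$p(Y \mid p) = 56/61 = 0.92 \qquad p(Y \mid n) = 7/49 = 0.14$$

$$p(N \mid p) = 5/61 = 0.08 \qquad p(N \mid n) = 42/49 = 0.86$$

$$
\begin{aligned}
\text{Expected profit} &= p(p) \times [p(Y \mid p) \times b(Y,p) + p(N \mid p) \times b(N,p)] + p(n) \times [p(N \mid n) \times b(N,n) + p(Y \mid n) \times b(Y,n)] \\
&= 0.55 \times [0.92 \times b(Y,p) + 0.08 \times b(N,p)] + 0.45 \times [0.86 \times b(N,n) + 0.14 \times b(Y,n)] \\
&= 0.55 \times [0.92 \times 99 + 0.08 \times 0] + 0.45 \times [0.86 \times 0 + 0.14 \times (-1)] \\
&= 50.094 - 0.063 \approx \$50.03
\end{aligned}
$$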
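A minimal Python sketch that evaluates the expected-profit formula from the raw counts behind the conditional rates (56, 5, 7 and 42) and the benefit values used in the calculation above. Exact fractions from the standard `fractions` module avoid intermediate rounding; the dictionary layout and the function name `expected_profit` are illustrative choices, not from the lecture.

```python
from fractions import Fraction

# Counts keyed by (predicted, actual): Y/N = predicted, p/n = actual.
counts = {("Y", "p"): 56, ("N", "p"): 5, ("Y", "n"): 7, ("N", "n"): 42}
# Cost-benefit matrix for the marketing example (dollars per instance).
benefit = {("Y", "p"): 99, ("N", "p"): 0, ("Y", "n"): -1, ("N", "n"): 0}


def expected_profit(counts, benefit):
    """p(p)[p(Y|p) b(Y,p) + p(N|p) b(N,p)] + p(n)[p(N|n) b(N,n) + p(Y|n) b(Y,n)]."""
    total = sum(counts.values())
    profit = Fraction(0)
    for actual in ("p", "n"):
        n_actual = sum(counts[(pred, actual)] for pred in ("Y", "N"))
        prior = Fraction(n_actual, total)  # p(actual)
        for pred in ("Y", "N"):
            cond = Fraction(counts[(pred, actual)], n_actual)  # p(pred | actual)
            profit += prior * cond * benefit[(pred, actual)]
    return profit


exact = expected_profit(counts, benefit)
print(exact, round(float(exact), 2))  # 5537/110 50.34

# Same formula with the two-decimal probabilities from the slide.
rounded = 0.55 * (0.92 * 99 + 0.08 * 0) + 0.45 * (0.86 * 0 + 0.14 * -1)
print(round(rounded, 2))  # 50.03
```

With the two-decimal probabilities the result is about $50.03 per instance, while the exact counts give 5537/110, about $50.34; rounding the probabilities before multiplying shifts the dominant p(p) × p(Y | p) term, so it is safer to carry full precision until the last step.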


Thanks!


Questions?