data mining for business analytics - new york...

P. Adamopoulos New York University

Lecture 8: Prediction via Evidence Combination

Stern School of Business

New York University

Spring 2014

Data Mining for Business Analytics


Example: Targeting Online Consumers with Ads

• Advertising campaign for upscale hotel chain

• We have run a campaign in the past, selecting online consumers randomly

• We want to run a campaign getting more bookings per dollar spent on ad impressions


Example: Targeting Online Consumers with Ads

• Target variable: binary

• Whether the consumer booked a room within one week after having seen the advertisement

• Prediction: class probability estimation

• The probability that a consumer will book a room after seeing an ad

• Targeting: target the highest-probability consumers, as many as our budget allows

• Features: the set of content pieces that we have observed the consumer to have viewed
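
The targeting step can be sketched as follows. This is a minimal illustration, assuming we already have an estimated booking probability for each consumer; the consumer IDs, the scores, and the `budget` parameter (the number of impressions we can afford) are hypothetical.

```python
def select_targets(scores, budget):
    """Return the `budget` consumers with the highest estimated booking probability."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [consumer for consumer, _ in ranked[:budget]]

# Hypothetical estimates of p(booking | evidence) for four consumers.
scores = {"u1": 0.02, "u2": 0.11, "u3": 0.07, "u4": 0.01}
targets = select_targets(scores, budget=2)
print(targets)
```

Only the ranking matters here, so a model whose probability estimates are miscalibrated but correctly ordered would still pick the same consumers.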


Combining Evidence Probabilistically

• What is the chance that in our training data we have seen a consumer with exactly the same visiting patterns as a consumer we will see in the future?

• We will consider the different pieces of evidence separately, and then combine the evidence


Joint Probability and Independence

• Joint probability using conditional probability

$p(AB) = p(A) \times p(B \mid A)$

• Joint probability of independent events

$p(AB) = p(A) \times p(B)$
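
A small numeric check of the two product rules, using textbook examples that are not part of the lecture: drawing two aces without replacement (dependent events) and rolling two sixes with fair dice (independent events).

```python
from fractions import Fraction

# Dependent events: drawing two aces from a 52-card deck without replacement.
p_a = Fraction(4, 52)          # first card is an ace
p_b_given_a = Fraction(3, 51)  # second card is an ace, given the first was
p_two_aces = p_a * p_b_given_a

# Independent events: rolling a six on each of two fair dice.
p_six = Fraction(1, 6)
p_two_sixes = p_six * p_six

print(p_two_aces, p_two_sixes)
```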


Bayes’ Rule

$p(AB) = p(A) \times p(B \mid A) = p(B) \times p(A \mid B)$

This means:

$p(B \mid A) = \dfrac{p(A \mid B) \times p(B)}{p(A)}$
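
To make the rule concrete, here is a short sketch with made-up numbers in the spirit of the hotel campaign. The events and all three input probabilities are assumptions chosen for illustration, not figures from the lecture.

```python
def bayes_rule(p_a_given_b, p_b, p_a):
    """Return p(B | A) = p(A | B) * p(B) / p(A)."""
    return p_a_given_b * p_b / p_a

# Hypothetical numbers: B = "consumer books a room",
# A = "consumer viewed a luxury-travel page".
p_b = 0.01          # 1% of consumers book
p_a_given_b = 0.30  # 30% of bookers viewed the page
p_a = 0.05          # 5% of all consumers viewed the page

p_b_given_a = bayes_rule(p_a_given_b, p_b, p_a)
print(round(p_b_given_a, 4))
```

With these inputs, having viewed the page raises the estimated booking probability from the 1% prior to about 6%.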


Bayes Rule for Classification

$p(C = c \mid \mathbf{E}) = \dfrac{p(\mathbf{E} \mid C = c) \times p(C = c)}{p(\mathbf{E})}$

• 𝑝(𝐶 = 𝑐|𝑬) is the posterior probability

• The probability that the target variable C takes on the class of interest c after taking the evidence E into account

• 𝑝(𝐶 = 𝑐) is the prior probability of the class

• The probability we would assign to the class before seeing any evidence

• 𝑝(𝑬|𝐶 = 𝑐) is the likelihood of seeing the evidence 𝑬 when the class is 𝐶 = 𝑐

• 𝑝(𝑬) is the likelihood of the evidence
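
The denominator can be expanded over the classes with the law of total probability, a standard identity that the slide does not state. For two classes $c_0$ and $c_1$:

```latex
p(\mathbf{E}) = \sum_{c} p(\mathbf{E} \mid C = c)\, p(C = c)
             = p(\mathbf{E} \mid c_0)\, p(c_0) + p(\mathbf{E} \mid c_1)\, p(c_1)
```

Because $p(\mathbf{E})$ is the same for every class, it acts only as a normalizing constant when comparing classes.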


Bayes Rule for Classification

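$p(\mathbf{E} \mid c) = p(e_1 \wedge e_2 \wedge \cdots \wedge e_k \mid c)$

• Bayesian methods for data science deal with this issue (a specific conjunction of evidence is rarely, if ever, seen in the training data) by making assumptions of probabilistic independence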


Conditional Independence and Naïve Bayes

$p(\mathbf{E} \mid c) = p(e_1 \wedge e_2 \wedge \cdots \wedge e_k \mid c) = p(e_1 \mid c) \times p(e_2 \mid c) \times \cdots \times p(e_k \mid c)$

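$p(c_0 \mid \mathbf{E}) = \dfrac{p(e_1 \mid c_0) \times p(e_2 \mid c_0) \times \cdots \times p(e_k \mid c_0) \times p(c_0)}{p(e_1 \mid c_0) \times \cdots \times p(e_k \mid c_0) \times p(c_0) + p(e_1 \mid c_1) \times \cdots \times p(e_k \mid c_1) \times p(c_1)}$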
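
The two-class computation can be sketched in a few lines of Python. This is an illustrative sketch, not the lecture's implementation: the training examples are invented, only evidence that is present is multiplied in, and the add-one smoothing of each likelihood is an assumption added here to avoid zero estimates.

```python
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (set_of_items_viewed, class_label) pairs."""
    class_counts = Counter()
    item_counts = defaultdict(Counter)  # item_counts[c][e] = class-c examples containing e
    for items, label in examples:
        class_counts[label] += 1
        for e in items:
            item_counts[label][e] += 1
    return class_counts, item_counts

def posterior(items, class_counts, item_counts):
    """Return {c: p(c | E)} under the conditional-independence assumption."""
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # prior p(c)
        for e in items:
            # Add-one smoothing (an assumption, not from the slides).
            score *= (item_counts[c][e] + 1) / (n_c + 2)
        scores[c] = score
    norm = sum(scores.values())  # plays the role of p(E)
    return {c: s / norm for c, s in scores.items()}

# Hypothetical training data: content viewed, and whether the consumer booked.
examples = [
    ({"travel", "luxury"}, "book"),
    ({"luxury"}, "book"),
    ({"sports"}, "no_book"),
    ({"sports", "travel"}, "no_book"),
    ({"news"}, "no_book"),
    ({"news", "sports"}, "no_book"),
]
class_counts, item_counts = train(examples)
probs = posterior({"luxury", "travel"}, class_counts, item_counts)
print(probs)
```

Dividing by the sum of the per-class scores is what the denominator of the formula does: it guarantees the posteriors sum to one without ever estimating the probability of the full conjunction directly.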


Advantages and Disadvantages of Naïve Bayes

• Very simple classifier

• Efficient in terms of both storage space and computation time

• Performs well in many real-world applications

• Inaccurate class probability estimates

• Incremental learner
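
The "incremental learner" property can be illustrated with a sketch: the model is just a set of counts, so a new labelled example is folded in by incrementing them, without revisiting earlier data. The class name `IncrementalCounts` and the example data are hypothetical.

```python
from collections import Counter, defaultdict

class IncrementalCounts:
    """Counts that suffice for Naive Bayes: per-class totals and per-class item counts."""

    def __init__(self):
        self.class_counts = Counter()
        self.item_counts = defaultdict(Counter)

    def update(self, items, label):
        """Fold in one new labelled example; no earlier example is revisited."""
        self.class_counts[label] += 1
        for e in items:
            self.item_counts[label][e] += 1

    def likelihood(self, item, label):
        """Unsmoothed estimate of p(item | label) from the counts so far."""
        return self.item_counts[label][item] / self.class_counts[label]

model = IncrementalCounts()
model.update({"luxury", "travel"}, "book")
model.update({"luxury"}, "book")
before = model.likelihood("travel", "book")  # 1 of 2 bookers viewed "travel"
model.update({"travel"}, "book")             # a new example arrives later
after = model.likelihood("travel", "book")   # now 2 of 3
print(before, after)
```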

Evidence Lift


A Model of Evidence “Lift”

Assuming full feature independence:

$p(c \mid \mathbf{E}) = \dfrac{p(e_1 \mid c) \times p(e_2 \mid c) \times \cdots \times p(e_k \mid c) \times p(c)}{p(e_1) \times p(e_2) \times \cdots \times p(e_k)}$

Then

$p(C = c \mid \mathbf{E}) = p(C = c) \times \operatorname{lift}_c(e_1) \times \operatorname{lift}_c(e_2) \times \cdots$

where $\operatorname{lift}_c(x)$ is defined as:

$\operatorname{lift}_c(x) = \dfrac{p(x \mid c)}{p(x)}$
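
A brief sketch of the lift model in Python. The prior and the four probabilities passed to `lift` are made up for illustration; the function names are likewise hypothetical.

```python
def lift(p_x_given_c, p_x):
    """lift_c(x) = p(x | c) / p(x)."""
    return p_x_given_c / p_x

def lift_score(prior, lifts):
    """Multiply the class prior by the lift of each piece of evidence."""
    score = prior
    for value in lifts:
        score *= value
    return score

# Hypothetical inputs: a 1% prior and two pieces of evidence.
p_c = 0.01
lifts = [
    lift(0.30, 0.10),  # evidence 3x as common within the class as overall
    lift(0.06, 0.04),  # evidence 1.5x as common within the class as overall
]
score = lift_score(p_c, lifts)
print(round(score, 4))
```

A lift above 1 raises the estimate and a lift below 1 lowers it. Since full independence rarely holds exactly, the product is not guaranteed to stay below 1, so it is probably safest to read the result as a ranking score.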


Example: Evidence Lifts from Facebook “Likes”

What people “Like” on Facebook is quite predictive of:

• How they score on intelligence tests

• How they score on psychometric tests (e.g., how extroverted or

conscientious they are)

• Whether they drink alcohol or smoke

• Their religion and political views

• …


Example: Evidence Lifts from Facebook “Likes”


Thanks!


Questions?
