P. Adamopoulos New York University
Lecture 8: Prediction via Evidence Combination
Stern School of Business
New York University
Spring 2014
Data Mining for Business Analytics
Example: Targeting Online Consumers with Ads
• Advertising campaign for an upscale hotel chain
• We have run a campaign in the past, selecting online consumers at random
• We now want to run a campaign that gets more bookings per dollar spent on ad impressions
Example: Targeting Online Consumers with Ads
• Target variable: binary
• Whether the consumer booked a room within one week after seeing the advertisement
• Prediction: class probability estimation
• The probability that a consumer will book a room after seeing an ad
• Targeting: target the consumers with the highest estimated probabilities, as many as our budget allows
• Features: the set of content pieces that we have observed the consumer view
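The targeting rule above can be sketched in Python. This is a minimal illustration under assumed conditions: each targeted consumer receives one impression at a flat, hypothetical cost, and the consumer IDs and probability estimates are made-up placeholders rather than output from a real model.

```python
def select_targets(probabilities, budget, cost_per_impression):
    """Return the IDs of the highest-probability consumers the budget covers.

    probabilities: dict mapping a (hypothetical) consumer ID to its estimated
    probability of booking a room after seeing the ad.
    Assumes one impression per consumer at a flat cost.
    """
    # How many consumers we can afford to show the ad to.
    n_affordable = int(budget // cost_per_impression)
    # Rank consumers by estimated booking probability, highest first.
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    return ranked[:n_affordable]


# Hypothetical probability estimates for four consumers.
scores = {"u1": 0.02, "u2": 0.15, "u3": 0.07, "u4": 0.11}
print(select_targets(scores, budget=30.0, cost_per_impression=10.0))
# prints ['u2', 'u4', 'u3']
```

Ranking by estimated probability and cutting at the budget is exactly why class probability estimation, rather than a hard yes/no classification, is the natural prediction task here.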
Combining Evidence Probabilistically
• What is the chance that our training data contains a consumer with exactly the same visiting pattern as a consumer we will see in the future? Very small: with $k$ content pieces, each either viewed or not, there are $2^k$ possible patterns
• We will therefore consider the different pieces of evidence separately, and then combine the evidence
Joint Probability and Independence
• Joint probability using conditional probability
$$p(AB) = p(A) \times p(B \mid A)$$
• Joint probability of independent events
$$p(AB) = p(A) \times p(B)$$
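Both rules can be checked by brute-force enumeration. The sketch below uses two fair dice as an illustrative sample space (an assumption, not part of the lecture), with exact fractions to avoid floating-point noise. The chain rule holds for any pair of events, while the product rule holds only for the independent pair.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely outcomes of rolling two fair dice.
OUTCOMES = list(product(range(1, 7), repeat=2))


def prob(event):
    """p(event), where an event is a predicate over outcomes."""
    return Fraction(sum(1 for o in OUTCOMES if event(o)), len(OUTCOMES))


def both(e1, e2):
    """The joint event 'e1 and e2'."""
    return lambda o: e1(o) and e2(o)


def cond_prob(event, given):
    """p(event | given) = p(event and given) / p(given)."""
    return prob(both(event, given)) / prob(given)


def a(o):  # first die shows 6
    return o[0] == 6


def b(o):  # total is at least 10 (depends on a)
    return o[0] + o[1] >= 10


def d(o):  # second die is even (independent of a)
    return o[1] % 2 == 0


# Chain rule, valid for any events: p(AB) = p(A) x p(B|A)
print(prob(both(a, b)) == prob(a) * cond_prob(b, a))  # True
# Product rule, valid here because a and d are independent
print(prob(both(a, d)) == prob(a) * prob(d))  # True
# The product rule fails for the dependent pair a, b
print(prob(both(a, b)) == prob(a) * prob(b))  # False
```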
Bayes’ Rule
$$p(AB) = p(A) \times p(B \mid A) = p(B) \times p(A \mid B)$$
This means:
$$p(B \mid A) = \frac{p(A \mid B) \times p(B)}{p(A)}$$
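As a small numeric check of Bayes' rule, the sketch below uses a two-dice illustration (an assumption, not from the lecture): let A be "the total is 12" and B be "the first die shows 6". Knowing $p(A \mid B)$, $p(B)$, and $p(A)$, the rule recovers $p(B \mid A)$, which direct enumeration confirms.

```python
from fractions import Fraction
from itertools import product


def bayes(p_a_given_b, p_b, p_a):
    """Invert a conditional probability: p(B|A) = p(A|B) * p(B) / p(A)."""
    return p_a_given_b * p_b / p_a


p_a_given_b = Fraction(1, 6)  # given first die is 6, total is 12 only if second die is 6
p_b = Fraction(1, 6)          # first die shows 6
p_a = Fraction(1, 36)         # total is 12: only the outcome (6, 6)

print(bayes(p_a_given_b, p_b, p_a))  # 1: a total of 12 forces the first die to be 6

# Cross-check by enumerating the 36 outcomes directly.
outcomes = list(product(range(1, 7), repeat=2))
a_outcomes = [o for o in outcomes if sum(o) == 12]
direct = Fraction(sum(1 for o in a_outcomes if o[0] == 6), len(a_outcomes))
print(direct == bayes(p_a_given_b, p_b, p_a))  # True
```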
Bayes Rule for Classification
$$p(C = c \mid \mathbf{E}) = \frac{p(\mathbf{E} \mid C = c) \times p(C = c)}{p(\mathbf{E})}$$
• $p(C = c \mid \mathbf{E})$ is the posterior probability
• The probability that the target variable $C$ takes on the class of interest $c$ after taking the evidence $\mathbf{E}$ into account
• $p(C = c)$ is the prior probability of the class
• The probability we would assign to the class before seeing any evidence
• $p(\mathbf{E} \mid C = c)$ is the likelihood of seeing the evidence $\mathbf{E}$ when the class is $C = c$
• $p(\mathbf{E})$ is the probability of the evidence: how likely we are to see $\mathbf{E}$ regardless of the class
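In classification the denominator can be computed from the other quantities: by the law of total probability, $p(\mathbf{E})$ is the sum of $p(\mathbf{E} \mid C = c) \times p(C = c)$ over all classes. The sketch below applies this to the hotel example; the prior and likelihood values are hypothetical placeholders, not estimates from real campaign data.

```python
from fractions import Fraction


def posterior(likelihoods, priors):
    """Return p(C=c | E) for every class c via Bayes' rule.

    likelihoods: dict class -> p(E | C=c)
    priors:      dict class -> p(C=c)
    p(E) comes from the law of total probability: sum over c of p(E|c) * p(c).
    """
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    p_evidence = sum(joint.values())
    return {c: joint[c] / p_evidence for c in joint}


# Hypothetical numbers: 1% of consumers book; the observed evidence E is seen
# for 30% of bookers but only 2% of non-bookers.
priors = {"book": Fraction(1, 100), "no_book": Fraction(99, 100)}
likelihoods = {"book": Fraction(3, 10), "no_book": Fraction(2, 100)}

post = posterior(likelihoods, priors)
print(post["book"])  # 5/38, roughly 0.13
```

Even strongly booking-associated evidence only lifts the posterior to about 13% here, because the prior is so low.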
Bayes Rule for Classification
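$$p(\mathbf{E} \mid c) = p(e_1 \wedge e_2 \wedge \cdots \wedge e_k \mid c)$$
• Estimating this joint probability directly would require training examples with exactly the same combination of evidence, which we rarely have
• Bayesian methods for data science deal with this issue by making assumptions of probabilistic independence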
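The scale of the problem is easy to quantify. Assuming $k$ binary pieces of evidence (each content piece viewed or not) and two classes, estimating $p(\mathbf{E} \mid c)$ directly needs one probability per combination of evidence per class, whereas the independence assumption needs only one probability per piece per class. The function names below are illustrative.

```python
def full_joint_parameters(k, n_classes=2):
    """Free parameters for p(e1, ..., ek | c) with k binary features.

    Each class needs a probability for each of the 2**k evidence
    combinations; they sum to 1, so one per class is redundant.
    """
    return n_classes * (2 ** k - 1)


def independent_parameters(k, n_classes=2):
    """Parameters when p(E|c) factorizes into p(e1|c) x ... x p(ek|c)."""
    return n_classes * k


for k in (10, 30):
    print(k, full_joint_parameters(k), independent_parameters(k))
# 10 2046 20
# 30 2147483646 60
```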
Conditional Independence and Naïve Bayes
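$$p(\mathbf{E} \mid c) = p(e_1 \wedge e_2 \wedge \cdots \wedge e_k \mid c) = p(e_1 \mid c) \times p(e_2 \mid c) \times \cdots \times p(e_k \mid c)$$
$$p(c_0 \mid \mathbf{E}) = \frac{p(e_1 \mid c_0) \times p(e_2 \mid c_0) \times \cdots \times p(e_k \mid c_0) \times p(c_0)}{p(e_1 \mid c_0) \times \cdots \times p(e_k \mid c_0) \times p(c_0) + p(e_1 \mid c_1) \times \cdots \times p(e_k \mid c_1) \times p(c_1)}$$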
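A minimal naive Bayes sketch for the hotel example, assuming each consumer is represented by the set of content pieces they viewed and the evidence is the set of pieces observed for the new consumer. Two choices here are additions, not part of the formula above: Laplace (add-one) smoothing, so that a piece never seen with a class does not force a zero probability, and summing logarithms instead of multiplying probabilities, to avoid numerical underflow when $k$ is large. The training data are hypothetical.

```python
import math
from collections import Counter


def train(examples):
    """Count classes and, per class, how many consumers viewed each piece.

    examples: list of (set_of_viewed_pieces, class_label) pairs.
    """
    class_counts = Counter(label for _, label in examples)
    piece_counts = {c: Counter() for c in class_counts}
    for pieces, label in examples:
        piece_counts[label].update(pieces)
    return class_counts, piece_counts


def predict_proba(evidence, class_counts, piece_counts, alpha=1.0):
    """Naive Bayes posterior p(c | E) for each class, from the counts."""
    n = sum(class_counts.values())
    log_scores = {}
    for c, n_c in class_counts.items():
        score = math.log(n_c / n)  # log p(c)
        for e in evidence:
            # Smoothed estimate of p(e | c); "viewed or not" has 2 outcomes.
            p_e_given_c = (piece_counts[c][e] + alpha) / (n_c + 2 * alpha)
            score += math.log(p_e_given_c)
        log_scores[c] = score
    # Normalize: divide each class score by the sum over classes (the
    # denominator of the formula), shifting by the max for stability.
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


# Hypothetical training data: content pieces viewed, and whether they booked.
data = [
    ({"spa", "golf"}, "book"),
    ({"spa"}, "book"),
    ({"news"}, "no_book"),
    ({"news", "golf"}, "no_book"),
    ({"sports"}, "no_book"),
    ({"spa", "news"}, "no_book"),
]
model = train(data)
print(predict_proba({"spa"}, *model))  # "book" gets 9/17, about 0.53
```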
Advantages and Disadvantages of Naïve Bayes
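• Very simple classifier
• Efficient in terms of both storage space and computation time
• Performs well in many real-world applications
• Class probability estimates are often inaccurate, because the independence assumption rarely holds and correlated evidence gets counted more than once; the ranking of examples can still be good
• Incremental learner: the model can be updated as each new training example arrives, without retraining from scratch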
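Naive Bayes is an incremental learner because the model consists only of counts: a new training example updates a few counters and nothing has to be recomputed from scratch. The class below is a hypothetical minimal sketch of that bookkeeping, using unsmoothed estimates for brevity.

```python
from collections import Counter


class IncrementalNaiveBayesCounts:
    """Counts that fully determine a (unsmoothed) naive Bayes model."""

    def __init__(self):
        self.class_counts = Counter()
        self.piece_counts = {}

    def update(self, pieces, label):
        """Absorb one new training example in O(len(pieces)) time."""
        self.class_counts[label] += 1
        self.piece_counts.setdefault(label, Counter()).update(pieces)

    def p_piece_given_class(self, piece, label):
        """Maximum-likelihood estimate of p(piece | class)."""
        return self.piece_counts[label][piece] / self.class_counts[label]


model = IncrementalNaiveBayesCounts()
model.update({"spa"}, "book")
model.update({"spa", "golf"}, "book")
print(model.p_piece_given_class("golf", "book"))  # 0.5
model.update({"golf"}, "book")  # a new booking arrives later
print(model.p_piece_given_class("golf", "book"))  # 0.6666666666666666
```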
Evidence Lift
A Model of Evidence “Lift”
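Assuming full feature independence (the pieces of evidence are independent of each other both unconditionally and conditionally on the class):
$$p(c \mid \mathbf{E}) = \frac{p(e_1 \mid c) \times p(e_2 \mid c) \times \cdots \times p(e_k \mid c) \times p(c)}{p(e_1) \times p(e_2) \times \cdots \times p(e_k)}$$
Then
$$p(C = c \mid \mathbf{E}) = p(C = c) \times \text{lift}_c(e_1) \times \text{lift}_c(e_2) \times \cdots \times \text{lift}_c(e_k)$$
where $\text{lift}_c(x)$ is defined as:
$$\text{lift}_c(x) = \frac{p(x \mid c)}{p(x)}$$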
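The lift form can be illustrated with a few hypothetical numbers (not from the lecture): a 1% prior booking rate and two pieces of evidence with known $p(x \mid c)$ and $p(x)$. Fractions keep the arithmetic exact. In practice the independence assumptions rarely hold exactly, so the product can exceed 1; it is then better read as a score for ranking than as a probability.

```python
import math
from fractions import Fraction


def lift(p_x_given_c, p_x):
    """lift_c(x) = p(x | c) / p(x): how much more likely x is within class c."""
    return p_x_given_c / p_x


def combine_lifts(prior, lifts):
    """p(C=c | E) = p(C=c) x lift_c(e1) x ... x lift_c(ek), under full independence."""
    return prior * math.prod(lifts)


prior = Fraction(1, 100)  # hypothetical p(C = book)
evidence = {
    # piece: (p(x | book), p(x)), hypothetical values
    "e1": (Fraction(3, 10), Fraction(1, 10)),
    "e2": (Fraction(2, 10), Fraction(1, 10)),
}
lifts = [lift(p_xc, p_x) for p_xc, p_x in evidence.values()]
print(lifts)                        # [Fraction(3, 1), Fraction(2, 1)]
print(combine_lifts(prior, lifts))  # 3/50, i.e. 0.06
```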
Example: Evidence Lifts from Facebook “Likes”
What people “Like” on Facebook is quite predictive of:
• How they score on intelligence tests
• How they score on psychometric tests (e.g., how extroverted or conscientious they are)
• Whether they drink alcohol or smoke
• Their religion and political views
• …
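Evidence lifts like these can be estimated from simple counts, since $\text{lift}_c(x) = p(x \mid c) / p(x)$ and both probabilities can be estimated as proportions of users. The sketch below uses invented page names and counts purely for illustration; it is not data from the Facebook example.

```python
from fractions import Fraction


def like_lifts(n_users, n_class, like_counts):
    """Estimate lift_c(x) for each Like x and rank Likes from highest lift.

    n_users:     total number of users
    n_class:     number of users in the class of interest c
    like_counts: dict Like -> (users with the Like, class-c users with the Like)
    """
    lifts = {}
    for like, (n_like, n_like_and_class) in like_counts.items():
        p_x_given_c = Fraction(n_like_and_class, n_class)
        p_x = Fraction(n_like, n_users)
        lifts[like] = p_x_given_c / p_x
    return sorted(lifts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical: 1,000 users, 200 of whom belong to class c.
counts = {
    "page_A": (100, 60),  # over-represented in class c
    "page_B": (400, 80),  # same rate as overall: no evidence either way
    "page_C": (50, 2),    # under-represented in class c
}
for like, value in like_lifts(1000, 200, counts):
    print(like, float(value))
# page_A 3.0
# page_B 1.0
# page_C 0.2
```

A lift above 1 means the Like is more common in class $c$ than overall and raises the estimate when multiplied in; a lift below 1 lowers it; a lift of exactly 1 leaves it unchanged.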
Example: Evidence Lifts from Facebook “Likes”
Thanks!
Questions?