TRANSCRIPT
TRACK 1: ADVERTISING EFFECTIVENESS
CASE STUDY: BASEBALL STADIUMS
A FRESH LOOK AT THE EFFECT OF PROMOTIONS ON BASEBALL ATTENDANCE USING HIERARCHICAL BAYESIAN ANALYSIS
Viswanath Srikanth
Tyler Deutsch
October 7, 2014
3:55-4:15pm
Back Bay Room
Seaport World Trade Center
Boston, MA
2 | CONFIDENTIAL
Who We Are
Tyler Deutsch
Background:
Tyler Deutsch is a Consultant with Slalom Consulting and has over five
years of consulting experience helping Fortune 500 clients in the financial
services, retail, and healthcare industries apply analytics.
Tyler is completing a Master of Science degree in Predictive Analytics at
Northwestern University. He is a member of the American Statistical
Association and is based in Chicago, Illinois.
INTRODUCTIONS
Viswanath Srikanth
Background:
Sri (Viswanath Srikanth) is a Program Manager - Strategy and Analytics at
Cisco and a graduate student of Predictive Analytics at Northwestern
University. Sri was previously chair of the W3C Customer Experience Data
Layer community group, standardizing user behavior data collection for the
industry. He is based out of Chapel Hill, North Carolina.
Other Contributors Include: Alex Krawchick and Greg Wolford
Overview – Why Are We Here
• Major League Baseball: “America’s Favorite Pastime,” with annual revenues of $8 billion in 2013 (Forbes)
• Ticket sales are a major source of revenue
• Baseball has a high supply of games, so many seats go unfilled
• Baseball competes with other entertainment for consumer dollars
• How can attendance be increased?
BACKGROUND
Baseball Business Analytics Managers Want to Know…
1. Do promotions lead to an increase in attendance?
2. If so, by how much?
[Figure: bar chart of attendance with no promotion vs. with promotion; the size of the lift is marked “?%”]
Let’s build a model!
Dataset and List of Variables
DATA OVERVIEW AND EDA
Promotion Variables
Variable     Definition
Bobblehead   Binary indicator if a bobblehead promotion was used that game
Headgear     Binary indicator if a headgear promotion was used that game
Shirts       Binary indicator if a shirts promotion was used that game
Fireworks    Binary indicator if a fireworks promotion was used that game
Other Variables
Variable                Definition
Temperature             Raw temperature (in degrees Fahrenheit)
Time of Game            Day or night game
Weather                 Weather as clear, cloudy, dome, or rainy
Day of Week             Day of week of the game
Month                   Month the game was played
Interleague*            Binary indicator if the game was interleague (e.g. 1 if American League vs. National League team)
Intra-Division*         Binary indicator if the game was intra-division (e.g. 1 if Boston Red Sox vs. New York Yankees)
Playoffs-2011*          Binary indicator if the team made the playoffs in the previous year
Average Ticket Price*   The home team’s average ticket price for the 2012 season
Data collected by Erica Costello of Northwestern University
Sources: mlb.com, baseball-reference.com
*Additional data was collected by Sri, Tyler, Alex, and Greg
Dependent Variable
Variable     Definition
Attendance   Number of attendees per game

Dataset
• April to October 2012
• 2,421 home games across 30 teams (~81 home games per team)
Exploratory Pathways with Frequentist and Bayesian Models
Frequentist Exploration
• Linear Model: Pooled Data
• Linear Model: Individual Team Data
• Fixed & Random Effects Model: Cross-Sectional Data
• Mixed Effects Model: Cross-Sectional Data

Bayesian Exploration
• Bayesian Linear Model: Pooled Data
• Bayesian Linear Model: Individual Team Data
• Hierarchical Bayesian Model: Cross-Sectional Data
• The data is cross-sectional, so a model is needed that incorporates variance across teams
• Frequentist: a Mixed Effects model that incorporates both Random and Fixed Effects is likely to be useful
• Bayesian: Hierarchical Bayesian is likely to deliver the most valuable results
MODEL DEVELOPMENT
Hierarchical Bayesian: Great for Cross-Sectional and Panel Data
• The hierarchical approach estimates two levels of models
• The first estimation is for “within respondent” variation
• The second estimation is for “across respondent” variation
• The hierarchical approach is particularly useful when there are limited observations for an individual respondent (or team) but more data across multiple respondents (teams)
• Markov Chain Monte Carlo (MCMC) methods have allowed Hierarchical Bayesian estimates to be computed in a reasonable length of time
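
The two levels can be written out as a pair of equations. This is a sketch of the standard hierarchical linear model in the form documented for the bayesm function rhierLinearModel; the symbols (β_i, τ_i, Δ, z_i, V_β) are notation introduced here for illustration and do not appear on the slide.

```latex
% Level 1: within-team regression for team i (one row per home game)
y_i = X_i \beta_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \tau_i I)

% Level 2: across-team distribution of the team-level coefficients
\beta_i = \Delta^{\top} z_i + \nu_i, \qquad \nu_i \sim N(0, V_\beta)
```

Here y_i is the attendance vector for team i, X_i holds that team's promotion and control variables, and z_i holds team-level covariates (only an intercept when none are supplied). Teams with few informative games borrow strength from the across-team distribution in Level 2.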
[Figure: attendance variation per team]
Baseball attendance data is a form of cross-sectional data
1. Draw parameter β given the data {y_t, x_t} and the most recent draw of the variance σ²
2. Draw σ² given the data {y_t, x_t} and the most recent draw of β
3. Repeat
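
The draw-and-repeat steps above can be sketched in R for a single (non-hierarchical) linear regression. This is a minimal illustration, not the presenters' code: the function name gibbs_lm, the flat prior on β, the prior p(σ²) ∝ 1/σ², and the simulated data are all assumptions made here. A hierarchical sampler adds further draws for the across-team parameters.

```r
# Gibbs sampler sketch for y = X beta + e, e ~ N(0, sigma2).
# Assumed priors (not from the slides): flat on beta, p(sigma2) proportional to 1/sigma2.
gibbs_lm <- function(y, X, n_iter = 2000, seed = 1) {
  set.seed(seed)
  n <- nrow(X); k <- ncol(X)
  XtX_inv  <- solve(crossprod(X))            # (X'X)^{-1}
  beta_hat <- XtX_inv %*% crossprod(X, y)    # least-squares estimate
  L <- t(chol(XtX_inv))                      # lower factor: L %*% t(L) = (X'X)^{-1}
  beta_draws   <- matrix(NA_real_, n_iter, k)
  sigma2_draws <- numeric(n_iter)
  sigma2 <- 1                                # arbitrary starting value
  for (r in seq_len(n_iter)) {
    # Step 1: draw beta | sigma2, data ~ N(beta_hat, sigma2 * (X'X)^{-1})
    beta <- beta_hat + sqrt(sigma2) * (L %*% rnorm(k))
    # Step 2: draw sigma2 | beta, data ~ Inverse-Gamma(n/2, SSR/2)
    ssr <- sum((y - X %*% beta)^2)
    sigma2 <- 1 / rgamma(1, shape = n / 2, rate = ssr / 2)
    # Step 3: store the draws and repeat
    beta_draws[r, ] <- beta
    sigma2_draws[r] <- sigma2
  }
  list(beta = beta_draws, sigma2 = sigma2_draws)
}

# Simulated data (hypothetical numbers, for illustration only)
set.seed(42)
X <- cbind(1, rbinom(200, 1, 0.3))           # intercept + a binary "promotion" flag
y <- X %*% c(20, 3) + rnorm(200, sd = 2)
fit <- gibbs_lm(y, X)
post_mean <- colMeans(fit$beta[-(1:500), ])  # drop the first 500 draws as burn-in
print(post_mean)
```

Each pass conditions on the most recent value of the other parameter, which is what lets the chain settle toward the joint posterior.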
Realizing the Hierarchical Bayesian Model Using the rhierLinearModel Function in the R bayesm Package
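library(bayesm)  # provides rhierLinearModel

# Read the game-level data (one row per home game)
raw.df <- read.table("bobbleheads_v003.csv",
                     header = TRUE, sep = ",")
# factor() keeps this working when Team is read in as character (R >= 4.0)
teams <- levels(factor(raw.df$Team))
nreg <- length(teams); nreg
# ... (code elided in the original slide)

# Build one regression (y, X) per team
regdata <- NULL
for (i in 1:nreg) {
  filter <- raw.df$Team == teams[i]
  y <- raw.df$Attend[filter]
  X <- cbind(1,                          # intercept
             raw.df$TAverage[filter],
             raw.df$THot[filter],
             raw.df$Night[filter]
             # ... (further columns elided in the original slide)
             )
  regdata[[i]] <- list(y = y, X = X)
}
Data <- list(regdata = regdata)
# s and sdelta are not defined in this snippet; presumably set in the elided code
Mcmc <- list(R = 2000000, keep = 10, s = s, sdelta = sdelta)
run.1 <- rhierLinearModel(Data = Data, Mcmc = Mcmc)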
• Models attendance against the variables for all teams at the same time
• 2 million iterations were run so that the chain settles past “burn-in”
• Default priors were specified, so the estimated patterns are driven by the observations
R Code Snippet for HB Linear
[Figure: MCMC “burn-in” trace plot]
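
To make the iteration counts concrete, the sketch below works out how many draws are stored when R = 2,000,000 and keep = 10, and which stored draws remain after discarding a burn-in period. The 10% burn-in fraction is a hypothetical choice made here for illustration; the slides do not state how many draws were discarded.

```r
# Draws stored by the sampler: R iterations, keeping every `keep`-th draw
R <- 2000000
keep <- 10
n_kept <- R / keep

# Hypothetical choice: treat the first 10% of stored draws as burn-in
burn <- n_kept * 0.10
post_idx <- (burn + 1):n_kept   # indices of the draws used for inference
print(c(stored = n_kept, discarded = burn, used = length(post_idx)))
```

With bayesm output, an index vector like post_idx would typically be applied to the last dimension of the stored draw arrays.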
Average Attendance Impacts Effectiveness of Promotions
RESULTS
[Figure: bar chart of average attendance against stadium capacity (y-axis 0 to 50,000) for the Boston Red Sox, New York Yankees, Chicago Cubs, Pittsburgh Pirates, and Seattle Mariners, with promotion effect labels of 1%, 5%, 3%, 21%, and 17% (listed in the order extracted)]
Bayesian Model Predicts Headgear and Bobbleheads as Leading Promotions
• The Seattle Mariners are a small-market team with middling success
• No major revenue from TV; the stadium is large but half-empty
• Promotions are likely to be most valuable to teams like the Seattle Mariners
[Figure: density distribution for the parameters with the credible interval marked, Seattle Mariners as an example]
Predicted Impact of Promotions on Attendance for the Seattle Mariners
Promotion Mean 95% Credible Interval
Bobbleheads 2,946 2,379-3,543
Headgear 3,596 2,279-4,932
Shirts 2,085 1,656-2,573
Fireworks 1,531 1,041-2,069
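
A mean and 95% credible interval like those in the table are typically read straight off the posterior draws. The sketch below shows one way to do that in R; summarize_draws is a hypothetical helper (not part of bayesm), and the draws are simulated stand-ins because the presenters' actual draws are not in the slides.

```r
# Summarize posterior draws as a mean and an equal-tailed 95% credible interval.
# `summarize_draws` is a hypothetical helper, not part of bayesm.
summarize_draws <- function(draws, level = 0.95) {
  alpha <- 1 - level
  ci <- quantile(draws, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  c(mean = mean(draws), lower = ci[1], upper = ci[2])
}

# Stand-in draws (simulated; real draws would come from the fitted model)
set.seed(7)
fake_draws <- rnorm(10000, mean = 3000, sd = 300)
summary_row <- summarize_draws(fake_draws)
print(round(summary_row))
```

With a fitted rhierLinearModel object, the draws for team i and coefficient j would come from the betadraw array, for example run.1$betadraw[i, j, ], after dropping burn-in.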
Comparing Hierarchical Bayesian Vs. Frequentist
Results for Seattle Mariners
Promotion     HB Mean   HB 95% Credible Interval   Mixed Effects (Frequentist) Mean Estimate
Bobbleheads   2,946     2,379-3,543                1,662
Headgear      3,596     2,279-4,932                2,702
Shirts        2,085     1,656-2,573                1,910
Fireworks     1,531     1,041-2,069                Dropped owing to computational time
(HB = Hierarchical Bayesian)
• The promotions were significant under both approaches
• The Mixed Effects estimates for Headgear and Shirts fall inside the Hierarchical Bayesian 95% credible intervals, but the Mixed Effects estimate for Bobbleheads (1,662) falls below its interval (2,379-3,543)
• In this case, we used the default priors for Hierarchical Bayesian. If the known experience of baseball owners were supplied as priors, HB’s predictive power would likely improve further
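
One way to compare the two approaches is to check whether each Mixed Effects estimate falls inside the corresponding Hierarchical Bayesian 95% credible interval. The short R sketch below does this with the Seattle Mariners numbers reported on this slide; the data frame and its column names are introduced here for illustration.

```r
# Seattle Mariners figures as reported on the slide (Fireworks omitted:
# no Mixed Effects estimate was reported for it)
cmp <- data.frame(
  promotion = c("Bobbleheads", "Headgear", "Shirts"),
  hb_mean   = c(2946, 3596, 2085),
  hb_lower  = c(2379, 2279, 1656),
  hb_upper  = c(3543, 4932, 2573),
  mixed     = c(1662, 2702, 1910)
)

# Gap between the two point estimates, and whether the Mixed Effects
# estimate lies inside the HB credible interval
cmp$gap    <- cmp$hb_mean - cmp$mixed
cmp$inside <- cmp$mixed >= cmp$hb_lower & cmp$mixed <= cmp$hb_upper
print(cmp[, c("promotion", "gap", "inside")])
```

On these numbers, Headgear and Shirts fall inside their intervals while Bobbleheads falls outside.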
Key Takeaways
CONCLUSION
Promotions are Effective
• Most effective for mid-market teams
• Bobbleheads and Headgear have the highest effect on attendance
• A cost/benefit profitability analysis should be the next step
Cross-Sectional or Panel Data? Consider Hierarchical Bayesian
Questions?
THANK YOU
Contact:
Viswanath Srikanth
Tyler Deutsch
Frequentist vs. Bayesian – What’s the Difference?
Bayesian
1. Prior information is important and should be used.
2. Interpretation: an interval that has a 90% probability of containing the true parameter.
3. Calculation-intensive; demands strong computing power.

VS.

Frequentist
1. No information prior to the model. “Purist” approach.
2. Interpretation: a collection of intervals, 90% of which contain the true parameter.
3. Relatively slow computer performance led to frequentist methods dominating the scene until the 1990s.
[Image credit: xkcd.com]
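
The frequentist reading of a 90% interval can be illustrated with a small simulation: draw many samples from a known distribution, build a 90% confidence interval from each, and count how often the interval contains the true value. This is a sketch with made-up settings (true mean of 50, samples of size 30), not an analysis from the talk.

```r
# Frequentist reading of a 90% interval: across repeated samples, about 90%
# of the intervals built this way contain the true parameter.
set.seed(123)
true_mean <- 50          # hypothetical true parameter
n_reps <- 5000
covered <- logical(n_reps)
for (r in seq_len(n_reps)) {
  x  <- rnorm(30, mean = true_mean, sd = 10)
  ci <- t.test(x, conf.level = 0.90)$conf.int
  covered[r] <- ci[1] <= true_mean && true_mean <= ci[2]
}
coverage <- mean(covered)   # share of intervals that contain the true mean
print(coverage)
```

The coverage statement is about the procedure over many repetitions, whereas a Bayesian credible interval makes a probability statement about the parameter given the one dataset in hand.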