TRACK1: ADVERTISING EFFECTIVENESS

CASE STUDY: BASEBALL STADIUMS

A FRESH LOOK AT THE EFFECT OF PROMOTIONS ON BASEBALL ATTENDANCE USING HIERARCHICAL BAYESIAN ANALYSIS

Viswanath Srikanth

Tyler Deutsch

October 7, 2014

3:55-4:15pm

Back Bay Room

Seaport World Trade Center

Boston, MA

Upload: viswanath-sri-srikanth

Post on 18-Aug-2015



2 | CONFIDENTIAL

Who We Are

Tyler Deutsch

Background:

Tyler Deutsch is a Consultant with Slalom Consulting and has over five years of consulting experience helping Fortune 500 clients in the financial services, retail, and healthcare industries apply analytics. Tyler is completing a Master of Science degree in Predictive Analytics at Northwestern University. He is a member of the American Statistical Association and is based in Chicago, Illinois.

INTRODUCTIONS

Viswanath Srikanth

Background:

Sri (Viswanath Srikanth) is a Program Manager - Strategy and Analytics at Cisco and a graduate student of Predictive Analytics at Northwestern University. Sri was previously chair of the W3C Customer Experience Data Layer community group, standardizing user behavior data collection for the industry. He is based out of Chapel Hill, North Carolina.

Other Contributors Include: Alex Krawchick and Greg Wolford


Overview – Why Are We Here

• Major League Baseball, “America’s Favorite Pastime,” had annual revenues of $8 billion in 2013 (Forbes)

• Ticket sales are a major source of revenue

• Baseball has a high supply of games, and many seats go unfilled

• Competes for entertainment dollars

• How can attendance be increased?

BACKGROUND


Baseball Business Analytics Managers Want to Know…

[Figure: bar chart comparing attendance with no promotion and with a promotion; the size of the difference is marked “?%”]

1. Do promotions lead to an increase in attendance?

2. If so, by how much?

Let’s build a model!


Dataset and List of Variables

DATA OVERVIEW AND EDA

Promotion Variables

Bobblehead: Binary indicator if a bobblehead promotion was used that game

Headgear: Binary indicator if a headgear promotion was used that game

Shirts: Binary indicator if a shirts promotion was used that game

Fireworks: Binary indicator if a fireworks promotion was used that game

Other Variables

Temperature: Raw temperature (in degrees Fahrenheit)

Time of Game: Day or night game

Weather: Weather as clear, cloudy, dome, or rainy

Day of Week: Day of week of the game

Month: Month the game was played

Interleague*: Binary indicator if the game was interleague (e.g. 1 if American League vs. National League team)

Intra-Division*: Binary indicator if the game was intra-division (e.g. 1 if Boston Red Sox vs. New York Yankees)

Playoffs-2011*: Binary indicator if the team made the playoffs in the previous year

Average Ticket Price*: The home team’s average ticket price for the 2012 season

Data collected by Erica Costello of Northwestern University

Sources: mlb.com, baseball-reference.com

*Additional data was collected by Sri, Tyler, Alex, and Greg

Dependent Variable

Attendance: Number of attendees per game

Dataset

From April to October 2012

2,421 home games across 30 teams (~81 home games per team)

Exploratory Data Analysis

[Figure: exploratory plot; axis label: Attendance]

Exploratory Data Analysis Continued

[Figure: exploratory plot; axis label: Attendance]

Exploratory Data Analysis Continued

[Figure: exploratory plot; axis label: Attendance]


Exploratory Pathways with Frequentist and Bayesian Models

Frequentist Exploration

• Linear Model: Pooled Data

• Linear Model: Individual Team Data

• Fixed & Random Effects Model: Cross-Sectional Data

• Mixed Effects Model: Cross-Sectional Data

Bayesian Exploration

• Bayesian Linear Model: Pooled Data

• Bayesian Linear Model: Individual Team Data

• Hierarchical Bayesian Model: Cross-Sectional Data

The data are cross-sectional, so a model is needed that incorporates variance across teams.

Frequentist: a Mixed Effects model that incorporates both Random and Fixed Effects is likely to be useful.

Bayesian: a Hierarchical Bayesian model is likely to deliver the most valuable results.

MODEL DEVELOPMENT


Hierarchical Bayesian – Great for Cross Sectional and Panel Data

• The hierarchical approach estimates models at two levels

• The first level captures “within respondent” variation

• The second level captures “across respondent” variation

• The hierarchical approach is particularly useful when there are limited observations for an individual respondent (or team) but more data across multiple respondents (teams)

• Markov Chain Monte Carlo (MCMC) sampling allows Hierarchical Bayesian estimates to be computed in a reasonable length of time

[Figure: attendance variation per team]

Baseball attendance data is a form of cross-sectional data

1. Draw parameter β given the data {y_t, x_t} and the most recent draw of the variance σ²

2. Draw σ² given the data {y_t, x_t} and the most recent draw of β

3. Repeat
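The alternating draws listed above can be sketched in base R for a single (non-hierarchical) linear regression. This is a minimal illustration, not the bayesm implementation: the function name `gibbs_lm`, the prior settings (`prior_sd`, `a0`, `c0`), and the simulated data (attendance in thousands, one promotion indicator) are all assumptions made for the example.

```r
# Gibbs sampler sketch for y = X %*% beta + e, with e ~ N(0, sigma2).
# Assumed priors (illustrative only): beta ~ N(0, prior_sd^2 * I),
# sigma2 ~ Inverse-Gamma(a0, c0).
gibbs_lm <- function(y, X, n_draws = 5000, a0 = 0.01, c0 = 0.01, prior_sd = 100) {
  n <- length(y)
  k <- ncol(X)
  XtX <- crossprod(X)
  Xty <- crossprod(X, y)
  prior_prec <- diag(1 / prior_sd^2, k)
  beta_draws <- matrix(NA_real_, n_draws, k)
  sigma2_draws <- numeric(n_draws)
  sigma2 <- var(y)  # starting value
  for (r in seq_len(n_draws)) {
    # Step 1: draw beta given the data and the most recent draw of sigma2
    V <- solve(XtX / sigma2 + prior_prec)
    m <- V %*% (Xty / sigma2)
    beta <- as.vector(m + t(chol(V)) %*% rnorm(k))
    # Step 2: draw sigma2 given the data and the most recent draw of beta
    resid <- y - as.vector(X %*% beta)
    sigma2 <- 1 / rgamma(1, shape = a0 + n / 2, rate = c0 + sum(resid^2) / 2)
    # Step 3: store the draws and repeat
    beta_draws[r, ] <- beta
    sigma2_draws[r] <- sigma2
  }
  list(beta = beta_draws, sigma2 = sigma2_draws)
}

# Simulated data (hypothetical): attendance in thousands, true promotion lift of 3
set.seed(1)
n <- 200
promo <- rbinom(n, 1, 0.3)
X <- cbind(1, promo)
y <- 30 + 3 * promo + rnorm(n, sd = 2)

fit <- gibbs_lm(y, X)
post <- fit$beta[-(1:1000), ]   # drop the first 1,000 draws as burn-in
colMeans(post)                  # posterior means: intercept, promotion effect
```

The posterior means should land close to the values used to simulate the data. A hierarchical model adds a further layer that ties the per-team coefficients together, which this sketch leaves out.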


Realizing the Hierarchical Bayesian Model Using the rhierLinearModel Function in the R bayesm Package

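library(bayesm)

raw.df <- read.table("bobbleheads_v003.csv", header = TRUE, sep = ",")

# factor() keeps this working when Team is read in as character
teams <- levels(factor(raw.df$Team))
nreg <- length(teams)
nreg

# ……. (elided in the original slide)

# Build one list(y, X) per team, as rhierLinearModel expects
regdata <- vector("list", nreg)
for (i in 1:nreg) {
  filter <- raw.df$Team == teams[i]
  y <- raw.df$Attend[filter]
  X <- cbind(1,                         # intercept
             raw.df$TAverage[filter],
             raw.df$THot[filter],
             raw.df$Night[filter]
             # ……… (further predictors elided in the original slide)
  )
  regdata[[i]] <- list(y = y, X = X)
}

Data <- list(regdata = regdata)
# s and sdelta are not defined in the portion of the code shown on the slide
Mcmc <- list(R = 2000000, keep = 10, s = s, sdelta = sdelta)
run.1 <- rhierLinearModel(Data = Data, Mcmc = Mcmc)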

• Attendance is modeled against the variables for all teams at the same time

• 2 million iterations were used so that the chain settles past the “burn-in” period

• Default priors were specified, implying the patterns were entirely observation driven
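To make the iteration settings concrete: with R = 2,000,000 and keep = 10, every 10th draw is retained, giving 200,000 stored draws. The sketch below applies the same thinning and a burn-in cut to a toy chain in base R. The chain itself (an autoregressive series started far from its long-run mean of 0) and the burn-in length are assumptions chosen for illustration; they are not taken from the study.

```r
# Retained draws implied by the slide's settings
R_total <- 2000000
keep_slide <- 10
n_retained <- R_total / keep_slide   # 200000

# Toy chain (hypothetical): starts at 50, long-run mean is 0
set.seed(42)
n_iter <- 20000
keep <- 10
chain <- numeric(n_iter)
chain[1] <- 50
for (t in 2:n_iter) {
  chain[t] <- 0.9 * chain[t - 1] + rnorm(1)
}

# Thinning: keep every 10th iteration
kept <- chain[seq(keep, n_iter, by = keep)]

# Burn-in: discard the first 100 retained draws before summarizing
burn <- 100
post <- kept[-seq_len(burn)]
mean(post)
```

Plotting `kept` against its index gives the kind of trace plot usually used to judge where burn-in ends.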

R Code Snippet for HB Linear

[Figure: MCMC “Burn-in”]


Average Attendance Impacts Effectiveness of Promotions

RESULTS

[Figure: bar chart of average attendance (axis from 0 to 50,000) with series Average Attendance, Promotion Effect, and Stadium Capacity, for the Boston Red Sox, New York Yankees, Chicago Cubs, Pittsburgh Pirates, and Seattle Mariners. Promotion effect labels, in the order extracted: 1%, 5%, 3%, 21%, 17%]


Bayesian Model Predicts Headgear and Bobbleheads as Leading Promotions

• The Seattle Mariners are a small-market team with middling success

• No major revenue from TV; the stadium is large but half-empty

• Promotions are likely to be most valuable to teams like the Seattle Mariners

[Figure: density distributions for the parameters, with the credible interval marked, using the Seattle Mariners as an example]

Predicted Impact of Promotions on Attendance for the Seattle Mariners

Promotion | Mean | 95% Credible Interval

Bobbleheads | 2,946 | 2,379-3,543

Headgear | 3,596 | 2,279-4,932

Shirts | 2,085 | 1,656-2,573

Fireworks | 1,531 | 1,041-2,069
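A posterior mean and 95% credible interval like those in the table are typically read off the retained MCMC draws for one coefficient. The base R sketch below shows the calculation on simulated stand-in draws; the numbers are hypothetical and are not the study's draws. In bayesm, the per-team coefficient draws are documented as the `betadraw` array of the fitted object, but the indexing shown in the comment is an assumption about how one would pull out a single team's coefficient.

```r
# Stand-in for retained draws of one promotion coefficient (hypothetical values).
# With a bayesm fit, something like run.1$betadraw[team_index, coef_index, ]
# would supply the draws instead (indices are assumptions).
set.seed(7)
draws <- rnorm(20000, mean = 1000, sd = 200)

post_mean <- mean(draws)                           # posterior mean
ci95 <- quantile(draws, probs = c(0.025, 0.975))   # equal-tailed 95% credible interval

post_mean
ci95
```

The interval is simply the 2.5th and 97.5th percentiles of the draws, which is why it can be asymmetric around the mean when the posterior is skewed.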


Comparing Hierarchical Bayesian Vs. Frequentist

Results for Seattle Mariners

Promotion | Hierarchical Bayesian: Mean | Hierarchical Bayesian: 95% Credible Interval | Mixed Effects (Frequentist): Mean Estimate

Bobbleheads | 2,946 | 2,379-3,543 | 1,662

Headgear | 3,596 | 2,279-4,932 | 2,702

Shirts | 2,085 | 1,656-2,573 | 1,910

Fireworks | 1,531 | 1,041-2,069 | Dropped owing to computational time

• The promotions were significant under both approaches

• Both approaches present similar results for Headgear and Shirts, but Mixed Effects shows a smaller lift for Bobbleheads

• In this case, we used the default priors for Hierarchical Bayesian; if the known experience of baseball owners were supplied as priors, HB’s predictive power would likely improve further


Key Takeaways

CONCLUSION

Promotions are Effective

• Most effective for mid-market teams

• Bobbleheads and Headgear have the highest effect on attendance

• A cost/benefit profitability analysis should be the next step

Cross Sectional or Panel Data? Consider Hierarchical Bayesian


Questions?

THANK YOU

Contact:

Viswanath Srikanth – [email protected]

Tyler Deutsch - [email protected]


Appendix


Frequentist vs. Bayesian – What’s the Difference?

Bayesian

1. Prior information is important and should be used.

2. Interpretation: An interval that has a 90% probability of containing the true parameter.

3. Calculation intensive, demands strong computing power.

VS.

Frequentist

1. No information prior to the model. “Purist” approach.

2. Interpretation: A collection of intervals with 90% of them containing the true parameter.

3. Relatively slow computer performance led to frequentist methods dominating the scene until the 1990s.

[Image: xkcd.com]
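The frequentist reading of a 90% interval (“a collection of intervals with 90% of them containing the true parameter”) can be checked by simulation. The base R sketch below repeatedly draws samples from a known distribution, builds a 90% confidence interval from each, and counts how often the interval covers the true mean. The true mean, sample size, and number of repetitions are arbitrary choices for illustration.

```r
# Coverage check for 90% confidence intervals (all settings are illustrative)
set.seed(123)
true_mean <- 10
n_rep <- 2000
covered <- logical(n_rep)

for (i in seq_len(n_rep)) {
  x <- rnorm(25, mean = true_mean, sd = 3)        # one simulated sample
  ci <- t.test(x, conf.level = 0.90)$conf.int     # 90% confidence interval
  covered[i] <- ci[1] <= true_mean && true_mean <= ci[2]
}

coverage <- mean(covered)   # share of intervals containing the true mean
coverage
```

The share should come out near 0.90. Any single interval either contains the true mean or does not; the 90% describes the procedure over repeated samples, in contrast to the Bayesian statement about one interval.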