Optimal Large-Scale Internet Media Selection
Courtney Paulson, Lan Luo, and Gareth M. James ∗
July 10, 2015
Abstract
Internet advertising is vital in today’s business world. It is uncommon for a major
Internet advertising campaign not to include an online display component. Neverthe-
less, research on optimal Internet media selection has been sparse. Firms face consider-
able challenges in their budget allocation decisions: the large number of websites they
may potentially choose; the vast variation in traffic and costs across websites; and the
inevitable correlations in viewership among these sites. Generally, attempting to select
the optimal subset of websites among all possible combinations is an NP-hard problem.
Therefore, existing approaches can only handle Internet media selection in settings on
the order of ten websites. We propose an optimization method that is computationally
feasible to allocate advertising budgets among thousands of websites. While perform-
ing similarly to extant approaches in settings scalable to prior methods, our approach
successfully tackles the challenging task of large-scale optimal Internet media selection.
Our method is also flexible to accommodate practical Internet advertising considera-
tions such as targeted consumer demographics, mandatory media coverage to matched
content websites, and target frequency of ad exposure.
1. Introduction
With the increased role of Internet use in the United States economy, Internet advertising is
becoming vital for company survival. In 2012, U.S. digital advertising spending (including
display, search, and video advertising) totaled 37 billion dollars (eMarketer, 2012). Of that 37
billion dollars, Internet display advertising accounted for 40%. Internet display ad spending
is also expected to grow to 45.6% of the total in 2016, outpacing paid search ad spending
(eMarketer, 2012). Such an increasing trend in Internet display advertising is related to a
∗Marshall School of Business, University of Southern California.
wide range of benefits offered by this advertising format, including building awareness and
recognition, forming attitudes, and generating direct responses such as website visits and
downstream purchases (Danaher et al., 2010; Hoban and Bucklin, 2015; Manchanda et al.,
2006).
Nevertheless, firms face considerable challenges in optimal Internet media selection of
online display ads. Because each website represents a unique advertising opportunity, the
number of websites firms may potentially choose to advertise among is extremely high. These
websites also vary vastly by their traffic and advertising costs. Furthermore, when optimizing
advertising budgets across a large number of websites, it is crucial for firms to account for the
inevitable correlations in the viewership among these sites. For example, the 2011 comScore
Media Metrix data show there is over 95% correlation in the viewership of Businessweek.com
and Reuters.com. In such cases, heavy advertising on both websites is inefficient, because
it causes firms to pay to advertise twice to mostly the same viewers.
These challenges are so formidable that, although Internet advertising is increasingly
recommended to reach consumers (e.g. Unit, 2005; Chapman, 2009), companies often have
to rely on advertising exchanges such as DoubleClick to manage their Internet ad campaigns
(Lothia et al., 2003). These exchanges are recent innovations in advertising that allow firms
to outsource their Internet ad campaigns, giving firms the opportunity to expand online
advertising without having to combat the challenges themselves (Muthukrishnan, 2009).
Generally, a company will specify campaign characteristics (such as which types of consumers
to target) and pay a certain amount of money to the exchange to conduct a campaign with
those characteristics.
One advantage of ad exchanges is their ability to employ behavioral ad targeting, that
is, targeting ads to consumers based on their Internet browsing histories (Chen et al., 2009).
This is usually accomplished by installing cookies or web bugs on users’ computers to track
their online activity. However, this has led to numerous privacy concerns and, in some
cases, legal action against behavioral targeters (Hemphill, 2000; Goldfarb and Tucker, 2011).
Another major concern with outsourcing Internet display ad campaigns to ad exchanges is
that companies must turn over the control of the campaign to the exchange, which creates a
classical principal-agent problem. While the focal firm can request target demographics, the
exchange alone ultimately determines how funds are allocated (Muthukrishnan, 2009). In
such cases, the ad exchange serves as a broker that maximizes its own profit by distributing
ad impressions across multiple campaigns from multiple firms, rather than allocating funds
in each individual firm's best interest. Consequently, when running an online ad
campaign through an ad exchange, the focal firm's budget allocation may be suboptimal
compared with the alternative of managing its own campaign.
In this paper, we propose a method to overcome the above challenges and concerns.
We emphasize a scenario in which firms wish to retain control of their online advertising
campaigns, rather than entirely outsourcing such campaigns to advertising exchanges. In
particular, we consider a setting in which a company wishes to maximize reach, i.e. the
fraction of customers who are exposed to a given ad at least one time. In such cases,
firms still face the same Internet advertising challenges of overwhelming scope and variety.
Historically, to be in full control of their own online advertising campaigns, firms often had
to employ heuristics to choose a select number of websites over which to advertise. These
heuristics include advertising only at big-name websites like Amazon or Yahoo or allocating
evenly over the most visited websites under consideration (Cho and Cheon, 2004). While
such heuristics have been adopted in practice, they can lead to substantially suboptimal budget
allocation. For example, the five highest traffic websites are likely not the optimal sites for
firms to advertise over. Consider again the case of Businessweek and Reuters. Both
websites are high in traffic, but they share highly similar users. A firm that advertises
heavily on both will waste money without gaining many new ad viewers, even if it
primarily wishes to target frequent viewers of such websites. In addition, a
very popular, high-traffic website may also be very expensive to advertise on and may have
a large percentage of repeat visitors. Hence, it may not be the most cost-effective option for
firms to spend a considerable portion of their advertising budgets on such websites. In many
cases, choosing a less visited but also less expensive website could be a better choice.
Despite the considerable importance of optimal Internet media selection for online display
ads, very few researchers have proposed methods to alleviate the above challenges faced by
firms. Danaher’s Sarmanov-based model (Danaher et al., 2010) was among the first and
most successful attempts to optimally allocate budget across multiple online media vehicles.
This Sarmanov-based method has been proven to work well for budget allocation in settings
on the order of 10 websites. While Danaher's work represents the state-of-the-art
method for allocating Internet advertising budget, under this method the consideration of
each additional website increases the optimization difficulty exponentially such that the
Sarmanov criterion becomes very difficult to optimize over more than approximately 10
websites (Danaher, 2007; Danaher et al., 2010). For example, even if firms know they wish
to advertise across only 10 out of 50 potential websites, they must test each possible 10-
website combination, resulting in over 10 billion individual problem calculations. Since each
website represents a separate advertising opportunity, such methods are hindered by the
huge volume of Internet websites on which firms could potentially choose to advertise.
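The combinatorial burden mentioned above is easy to verify directly; for instance, counting the 10-website subsets of 50 candidate sites takes one line in Python:

```python
from math import comb

# Number of distinct 10-website subsets of 50 candidate websites.
n_combinations = comb(50, 10)  # 10,272,278,170: over 10 billion separate problems
```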
The primary goal of this research is to develop a method that allows firms to efficiently
select and allocate budget among a large set of websites (e.g., thousands). One reason for
the difficulty in considering a large number of websites is that the problem of choosing a
subset of websites is generally NP-hard. In a setting involving p potential websites, each of
the 2^p possible website subsets must be considered separately, leading to a computationally
infeasible problem.
In a linear regression setting, a similar problem is encountered when performing variable
selection involving large numbers of independent variables. A common solution, adopted by
the statistical literature, involves optimizing a constrained convex loss function, a relaxed
version of the NP-hard variable selection problem. A selection of recent papers includes the
Lasso (Tibshirani, 1996), SCAD (Fan and Li, 2001), the elastic net (Zou and Hastie, 2005),
the adaptive Lasso (Zou, 2006), CAP (Zhao et al., 2009), the Dantzig selector (Candes and
Tao, 2007), the relaxed Lasso (Meinshausen, 2007), and VISA (Radchenko and James, 2008).
Built upon this stream of research, we develop an analogous constrained criterion ap-
proach in our setting, i.e., a relaxed version of the NP-hard website selection problem. Our
method is related to the well-known Lasso formulation (Tibshirani, 1996), but diverges in
that our optimization criterion does not involve a quadratic loss function. Our empirical
investigation illustrates that, for a small number of websites, the proposed method performs
similarly to Danaher et al. (2010). Furthermore, our method can be used effectively in major
online advertising campaigns where a large number of websites is under consideration. Even
with 5000 candidate websites, optimizing for a particular budget takes under twenty seconds
on a personal laptop computer.
We further demonstrate that this method is flexible enough to accommodate common
practical Internet advertising considerations such as targeted consumer demographics, mandatory
media coverage to matched content websites, and target frequency of ad exposure.
Consequently, firms could use our method to fully control their own Internet advertising
campaigns instead of being forced to rely on advertising exchanges, but without having to
give up specific targeting of particular demographic groups and/or websites. Additionally,
our algorithmic efficiency allows firms to quickly compare expected reach across numerous
budgets and various Internet advertising opportunities, giving firms a broad range of adver-
tising campaign and cost options.
The remainder of the paper is structured as follows: in Section 2, we describe our con-
strained optimization approach as a high-dimensional efficient alternative to existing methods
for large-scale Internet advertising optimization. In Section 3, we discuss simulation studies
that compare our optimization to Danaher et al.’s existing method and demonstrate that the
proposed method can handle budget allocation across thousands of websites. Also in Section
3, we provide two case studies (McDonald’s McRib Advertising Campaign and Norwegian
Cruise Lines Wave Season Advertising Campaign) using 2011 comScore Media Metrix data.
We conclude in Section 4 with a summary of our findings, contributions, and avenues for
future work.
2. Methodology
2.1 Model Formulation
Consider a firm that has a budget B for a campaign that is to be run over a particular time
span (e.g., one month or one quarter). A common goal for such a campaign would be to
allocate the firm’s budget across a set of p possible websites to maximize the probability that
an Internet user views the ad at least once during the campaign. This probability is known
as the reach of a campaign. Let wj represent the budget allocated to advertising at the jth
website, where j = 1, . . . , p. Further, let Xij represent the number of times an ad appears
to customer i during her visits to website j during the course of the ad campaign, where
i = 1, . . . , n. Hence, $Y_i = \sum_{j=1}^{p} X_{ij}$ corresponds to the total number of ad appearances to
customer i over all websites. Let us also denote an n by p matrix as Z, with $z_{ij}$ corresponding
to the number of visits of customer i to website j during the time span of the ad campaign.
In practice, such data (e.g., the comScore Media Metrix data) are available from commercial
browsing-tracking companies such as comScore.
Within this context, our problem can be formulated as a fairly common marketing sce-
nario: given that we are constrained by a budget B, how do we allocate that budget to
maximize reach during our Internet display ad campaign? Mathematically this is equivalent
to the following optimization problem:
$$\min_{\mathbf{w}} \;\; \frac{1}{n}\sum_{i=1}^{n} P(Y_i = 0 \mid \mathbf{z}_i, \mathbf{w}) \quad \text{subject to} \quad \sum_{j=1}^{p} w_j \le B, \;\text{ and }\; w_j \ge 0, \; j = 1, \dots, p, \tag{1}$$

where $\mathbf{w} = (w_1, \dots, w_p)$ denotes the budget allocation to the p websites, and $\mathbf{z}_i = (z_{i1}, \dots, z_{ip})$
represents the number of times consumer i visits the p websites over the course of the Internet
ad campaign.
It is challenging to solve Equation (1) because p may be in the thousands, making this
an extremely high-dimensional optimization problem. Additionally, the optimal solution
to Equation (1) should be able to accommodate corner solutions (i.e., the solution
should allow wj = 0 to arise as an optimal solution for certain websites). We discuss how
we address both challenges below.
We first express $P(Y_i = 0 \mid \mathbf{z}_i, \mathbf{w})$ as a function of $\mathbf{z}_i$ and $\mathbf{w}$, where $Y_i = \sum_{j=1}^{p} X_{ij}$. A
natural approach is to model $X_{ij}$ as a Poisson random variable with expectation $\gamma_{ij}$, i.e.
$X_{ij} \mid z_{ij}, w_j \sim \text{Pois}(\gamma_{ij})$ or, equivalently,

$$P(X_{ij} = x \mid z_{ij}, w_j) = \frac{e^{-\gamma_{ij}} \gamma_{ij}^{x}}{x!}. \tag{2}$$
In Equation (2), we model γij as the expected number of ad appearances to consumer i at
website j, given the consumer’s number of visits to the site (zij) and the amount of money the
focal firm spends on advertising at the site (wj). This expected number of ad appearances
is given by the probability of an ad appearing on a random visit to website j (denoted as sj)
multiplied by the number of visits (zij), i.e. γij = sjzij . For example, if a firm buys 20% of
ad impressions at a particular website, and a consumer visits that website ten times during
the course of the ad campaign, γij = 0.2×10 = 2. In this example, on average we expect the
consumer to see the ad twice during the ten visits. The probability the ad appears is simply
the number of ad impressions bought at the website over the total number of expected visits
by all customers to the site, so sj is called the share of ad impressions (Danaher et al., 2010).
Note that because of this, sj is interchangeable with wj: buying all ad impressions for website j
means sj = 1 (or, equivalently, wj is maximized such that the ad appears on all visits to the
site), while buying no impressions means sj = 0 (or, equivalently, wj = 0). In the paragraph
below, we provide the formula that outlines the exact correspondence between sj and wj.
Let τj represent the expected total number of visits at the jth website during the course
of the ad campaign. Following Danaher et al. (2010), we operationalize τj as τj = φjN ,
with φj being the expected number of per person visits to site j during the ad campaign,
and N being the total Internet population. Let cj represent the cost to purchase 1000
impressions at website j (an industry standard, popularly referred to as the CPM). Then
the total number of impressions purchased will be given by 1000wj/cj . Hence, we obtain
the corresponding relationship between sj (share of ad impressions) and wj (budget spent)
as follows: $s_j = \frac{1000\, w_j}{\tau_j c_j}$. For example, if the CPM of a particular website is $2, the expected
total number of visits to the website during the entire ad campaign is 10 million, and the
firm spends $500 advertising on the website, the firm has bought 2.5% of the ad impressions
at that website.
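This dollars-to-share conversion is a one-line computation. As an illustrative sketch in Python (the function name is ours; the figures match the example above):

```python
def share_of_impressions(w_j, tau_j, cpm_j):
    """Convert budget w_j (dollars) spent at website j into the share of ad
    impressions s_j, given expected total visits tau_j and CPM cpm_j."""
    # Impressions bought = 1000 * w_j / cpm_j; dividing by total visits gives the share.
    return 1000.0 * w_j / (tau_j * cpm_j)

# The example from the text: CPM of $2, 10 million expected visits, $500 spent.
s = share_of_impressions(500, 10_000_000, 2)  # 0.025, i.e., 2.5% of impressions
```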
Given $\gamma_{ij} = s_j z_{ij}$ and substituting $s_j$ with $\frac{1000\, w_j}{\tau_j c_j}$, we can express $\gamma_{ij}$ as a function of $z_{ij}$
and $w_j$ below:

$$\gamma_{ij} = \theta_{ij} \times w_j, \quad \text{where} \quad \theta_{ij} = \frac{1000\, z_{ij}}{\tau_j c_j}. \tag{3}$$
In Equation (3), θij is a known quantity given values of zij , τj , and cj . With this setup,
correlations in viewership among the p websites are directly captured in the zij terms, which
carry into θij and then into γij. In Appendix A, we provide a simple illustration that
demonstrates how correlations in the Z matrix are incorporated in our method.
Thus we can model $Y_i = \sum_{j=1}^{p} X_{ij}$ as a Poisson random variable with expected value $\gamma_i = \sum_{j=1}^{p} \gamma_{ij}$, i.e.

$$P(Y_i = y \mid \mathbf{z}_i, \mathbf{w}) = \frac{e^{-\gamma_i} \gamma_i^{y}}{y!}. \tag{4}$$
Combining Equation (4) with our original Equation (1) gives the criterion we wish to opti-
mize:
$$\min_{\mathbf{w}} \;\; \frac{1}{n}\sum_{i=1}^{n} e^{-\gamma_i} \quad \text{subject to} \quad \sum_{j} w_j \le B \;\text{ and }\; w_j \ge 0, \; j = 1, \dots, p. \tag{5}$$
The optimization in Equation (5) has the following appealing properties. First, because
the objective function is a well-behaved convex and smooth function, it is relatively easy
to solve the optimization, even for large values of p. This transforms the original problem
from NP-hard to one that is relatively easy to optimize. The algorithm will also not stall
at suboptimal local minima. Second, the form of Equation (5) encourages sparsity in the
solution. Under each given budget, as the number of websites under consideration increases,
our optimization criterion will automatically set a budget of zero for more websites (hence
the corner solutions we desired; see further discussion in Hastie et al., 2009, p. 71).
Lastly, given the convex and smooth nature of the objective function, prior budget solutions
can be used as effective starting points for neighboring budgets. Therefore, we are able to
efficiently optimize over a range of budgets rather than merely solving one particular budget
at a time.
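To make the criterion concrete, the sketch below (Python with NumPy; all inputs and names are our own hypothetical choices) evaluates the objective of Equation (5), and hence the reach, for a candidate allocation:

```python
import numpy as np

def expected_nonreach(Z, w, tau, cpm):
    """Average probability of zero ad exposures (the objective in Equation (5)).

    Z   : (n, p) visit counts z_ij
    w   : (p,) candidate budget allocation
    tau : (p,) expected total visits per site
    cpm : (p,) cost per 1000 impressions per site
    """
    theta = 1000.0 * Z / (tau * cpm)   # theta_ij from Equation (3)
    gamma = theta @ w                  # expected exposures gamma_i per person
    return np.mean(np.exp(-gamma))

# Hypothetical two-site example: each of two panelists visits one site 10 times.
Z = np.array([[10.0, 0.0], [0.0, 10.0]])
tau = np.array([1e4, 1e4])
cpm = np.array([2.0, 2.0])
w = np.array([1.0, 1.0])
reach = 1.0 - expected_nonreach(Z, w, tau, cpm)  # reach = 1 - mean(exp(-gamma_i))
```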
2.2 The Optimization Algorithm
In order to solve Equation (5), we reformulate the optimization using a Lagrangean¹:

$$\min_{\mathbf{w}} \;\; \frac{1}{n}\sum_{i=1}^{n} e^{-\gamma_i} + \frac{\lambda}{n}\Big(\sum_{j} w_j - B\Big) \quad \text{subject to} \quad w_j \ge 0, \; j = 1, \dots, p, \tag{6}$$

where λ > 0 is the Lagrangean multiplier. (Note that λ must be greater than zero in our setting,
given the constraint that the budget must always be nonnegative.)
It is evident that, for each given budget, there is a corresponding Lagrangean multiplier
λ. For a given number of websites, as budget increases, λ decreases, and the algorithm
allocates more budget to more websites. As budget decreases, λ increases, and we get a
sparser solution.
Since we optimize over the w terms, Equation (6) can be simplified to Equation (7), with
B dropping out of the first order conditions:

$$\min_{\mathbf{w}} \;\; \frac{1}{n}\sum_{i=1}^{n} e^{-\gamma_i} + \frac{\lambda}{n}\sum_{j} w_j \quad \text{subject to} \quad w_j \ge 0, \; j = 1, \dots, p. \tag{7}$$
Although there is no direct closed form solution to Equation (7), problems similar to
that of Equation (7) have been studied extensively in the recent literature, particularly in
statistics, e.g. (Efron et al., 2004; Friedman et al., 2010; Goeman, 2010; Hesterberg et al.,
¹In the statistical literature, this is commonly referred to as a penalized optimization equation. In statistics, the $\frac{\lambda}{n}\sum_{j} w_j$ penalty would frequently be written as an ℓ1 penalty rather than a summation penalty. However, for our setup, the two are identical, since we have the condition wj ≥ 0 for all j.
2008; Rosset and Zhu, 2007; Schmidt et al., 2007). As a result there exist very efficient
algorithms for solving such problems. In this paper, we utilize one of the most efficient
and easiest-to-implement algorithms, known as coordinate descent, to solve Equation (7) over
a grid of values for λ, which in turn provides optimal allocations for a range of possible
campaign budgets. The idea behind coordinate descent is to reduce our optimization to a
sequence of simple one-dimensional optimizations, as described below (see Appendix B for more details of
the algorithm):
Algorithm 1 Coordinate Descent Algorithm for Budget Optimization
1. Specify a maximum budget, Bmax.
2. Initialize algorithm with w̃ = 0, j = 1, and λ corresponding to B = 0.
3. For j in 1 to p,
(a) Marginally optimize Equation (7) over a single website budget wj, keeping
w1, w2, . . . , wj−1, wj+1, . . . , wp fixed.
(b) Iterate until convergence.
4. Increase budget by incrementally decreasing λ over a grid of values, with each λ cor-
responding to a budget, and repeat Step 3 until reaching Bmax.
What makes this approach so efficient is that each update step is fast to compute and
typically not many iterations are required to reach convergence in Step 3 of the algorithm
above. Note that convergence is guaranteed by Luo and Tseng (1992) for the form of
Equation (7) as in Step 3 above. Thus our optimization becomes very efficient to solve for
a range of budgets at once.
However, because there is no closed form solution to Equation (7), we use a quadratic
approximation to the objective function in Step 3 of Algorithm 1. Specifically, since we are
using a coordinate descent approach around the current estimate w̃, we employ a second order Taylor
approximation of $e^{-\gamma_i}$ around w̃ as follows:

$$e^{-\gamma_i} \approx e^{-\tilde{\gamma}_i}\left[1 - \sum_{j=1}^{p} \theta_{ij}(w_j - \tilde{w}_j) + \frac{1}{2}\sum_{j=1}^{p}\sum_{k=1}^{p} \theta_{ij}\theta_{ik}(w_j - \tilde{w}_j)(w_k - \tilde{w}_k)\right] \quad \text{s.t. } w_j, w_k \ge 0, \; j, k = 1, \dots, p, \tag{8}$$
where $\tilde{\gamma}_i = \sum_{j=1}^{p} \theta_{ij}\tilde{w}_j$, and $\tilde{w}_j$ can be taken as our most recent estimate of $w_j$ based on the
last iteration of the algorithm.
Substituting (8) into (7) and computing the first order condition with respect to wj, all
terms involving w1, w2, . . . , wj−1, wj+1, . . . , wp drop out of our criterion. Hence, up to an
additive constant (i.e. the first term of the Taylor expansion), we can approximate Equation
(7) for a particular coordinate wj as:
$$\min_{w_j} \;\; \frac{1}{n}\sum_{i=1}^{n} e^{-\tilde{\gamma}_i}\left(-\theta_{ij}(w_j - \tilde{w}_j) + \frac{1}{2}\theta_{ij}^{2}(w_j - \tilde{w}_j)^{2}\right) + \frac{\lambda}{n} w_j \quad \text{subject to} \quad w_j \ge 0. \tag{9}$$
With our simplified criterion, we show in Appendix B that the first order condition to
Equation (9) can be written as Equation (10), with the otherwise condition enforcing wj ≥ 0:
$$w_j = \begin{cases} \tilde{w}_j + \dfrac{\sum_{i=1}^{n} e^{-\tilde{\gamma}_i}\theta_{ij} - \lambda}{\sum_{i=1}^{n} e^{-\tilde{\gamma}_i}\theta_{ij}^{2}} & \text{for } H_j > \lambda \\[1ex] 0 & \text{otherwise}, \end{cases} \tag{10}$$

where $H_j = \sum_{i=1}^{n} e^{-\tilde{\gamma}_i}\theta_{ij}(\tilde{w}_j\theta_{ij} + 1)$ (note that $H_j$ is always positive here). Equation (10)
incorporates the wj ≥ 0 condition by testing if the wj coefficient has been forced below zero
by the update. If it has, we set that coefficient to 0, the minimum value allowed (since
budget cannot be negative). This equation can be computed quite efficiently.
Therefore, the optimization in Equation (7) can be solved by iteratively computing Equa-
tion (10) for j from 1 to p and repeating until convergence.2 Appendix B also demonstrates
the computational efficiency of our algorithm. When increasing the number of websites under
consideration to 5000, it takes less than twenty seconds to optimize for a particular budget
on a personal laptop computer with a 2.30 GHz processor.
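For illustration, the coordinate-wise update of Equation (10) can be sketched in Python as follows. This is our own illustrative implementation for a fixed λ, not the authors' code; all names are ours, and NumPy is assumed:

```python
import numpy as np

def coordinate_descent(theta, lam, max_iter=100, tol=1e-8):
    """Solve Equation (7) for a fixed lambda > 0 via the updates in Equation (10).

    theta : (n, p) matrix of theta_ij values
    lam   : Lagrangean multiplier (larger lam yields a sparser allocation)
    Returns the (p,) budget allocation w.
    """
    n, p = theta.shape
    w = np.zeros(p)
    gamma = theta @ w                          # gamma_i = sum_j theta_ij * w_j
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            e = np.exp(-gamma)
            a = e @ theta[:, j]                # sum_i exp(-gamma_i) * theta_ij
            b = e @ theta[:, j] ** 2           # sum_i exp(-gamma_i) * theta_ij^2
            H = b * w[j] + a                   # H_j = sum_i exp(-g_i) theta_ij (w_j theta_ij + 1)
            if H > lam:
                w_new = w[j] + (a - lam) / b   # interior update from Equation (10)
            else:
                w_new = 0.0                    # corner solution: no budget for site j
            gamma += theta[:, j] * (w_new - w[j])  # keep gamma_i in sync with w
            max_change = max(max_change, abs(w_new - w[j]))
            w[j] = w_new
        if max_change < tol:
            break
    return w

# Hypothetical example: four panelists, two websites with correlated viewership.
theta_demo = np.array([[1.0, 0.5], [0.8, 0.4], [1.2, 0.6], [0.9, 0.45]])
w_demo = coordinate_descent(theta_demo, lam=0.5)
```

Sweeping λ from large to small, warm-starting each solve from the previous allocation, then traces out solutions for a range of budgets as in Algorithm 1.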
2.3 Model Extensions
In what follows we discuss three extensions to the proposed method. We will provide an
illustration of each extension in Section 3.
²Because we employ a Taylor approximation in our algorithm, we also conducted an empirical evaluation to verify the convergence of the approximation. We ran our algorithm with numerous initialization points to determine if the optimization had converged to a global optimum. In all cases, we obtained identical solutions regardless of initialization point, and convergence was achieved in very few iterations.
2.3.1 Extension 1: Targeted Consumer Demographics
In this subsection we describe how the method discussed above can be modified to accom-
modate targeted consumer demographics. Suppose that each individual belongs to one of m
possible demographic groups. For example, if we wished to target people based on household
income and whether or not they had children, we could have m = 4 possible demographic
groups (low household income with or without children, and high household income with or
without children). It will often be the case that the “actual” proportions of individuals with
these demographics in our data, P1,a, . . . , Pm,a, will differ from the targeted demographic
makeup, P1,d, . . . , Pm,d, of the firm. For instance, it may be that the fraction of individuals
with low household income and with children in our data Z is PLC,a = 0.3, while the focal
firm’s target consumer base consists of a much greater percentage of such consumers, e.g.,
PLC,d = 0.6. Within this context, we would like to upweight individuals with low household
income and children in our data sample.
This is easily accomplished with a simple adaptation of Equation (7):

$$\min_{\mathbf{w}} \;\; \frac{1}{n}\sum_{i=1}^{n} p_i e^{-\gamma_i} + \frac{\lambda}{n}\sum_{j} w_j \quad \text{subject to} \quad w_j \ge 0, \; j = 1, \dots, p, \tag{11}$$

where $p_i = P_{D_i,d}/P_{D_i,a}$ and $D_i$ represents the demographic group into which individual i falls.
Since PDi,a is computed from observed data and PDi,d is based on the focal firm’s target
customer base, pi is a fixed and known quantity. Therefore, optimizing Equation (11) is
accomplished in exactly the same fashion as for Equation (7).
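As a small illustration (group labels and proportions are hypothetical, following the low-income-with-children example above), the weights pi reduce to a per-group ratio of desired to actual proportions:

```python
# Actual proportions observed in the data vs. the firm's desired target makeup.
# Group labels are hypothetical; "LC" is low household income with children.
actual  = {"LC": 0.3, "LN": 0.3, "HC": 0.2, "HN": 0.2}
desired = {"LC": 0.6, "LN": 0.1, "HC": 0.2, "HN": 0.1}

# p_i = P_{D_i,d} / P_{D_i,a} for each individual's group D_i.
weights = {g: desired[g] / actual[g] for g in actual}
# Individuals in the "LC" group are upweighted by a factor of 2 in Equation (11).
```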
2.3.2 Extension 2: Mandatory Media Coverage to Matched Content Websites
Aside from targeted consumer demographics, a firm might wish to impose mandatory media
coverage to certain subsets of websites. For example, when planning the online advertising
campaign for its annual “wave season,” Norwegian Cruise Lines may want to allocate a
certain minimum budget to advertising on aggregate travel sites such as Orbitz or Expedia
in addition to other websites. In this subsection we discuss how the proposed method can
be modified to accommodate such requirements. Specifically, we can modify Equation (7)
to require wj to be above a certain threshold, say wj ≥ minj , to ensure that a minimum
budget is allocated to each aggregate travel website j.
Using the same approach as for optimizing Equation (7) we can show that the new
optimization is accomplished by setting the “otherwise” condition in Equation (10) to a
minimum non-zero amount. Specifically, we would replace Equation (10) with the following:
$$w_j = \begin{cases} \tilde{w}_j + \dfrac{\sum_{i=1}^{n} e^{-\tilde{\gamma}_i}\theta_{ij} - \lambda}{\sum_{i=1}^{n} e^{-\tilde{\gamma}_i}\theta_{ij}^{2}} & \text{for } H_j - \lambda > \mathrm{min}_j \\[1ex] \mathrm{min}_j & \text{otherwise}. \end{cases} \tag{12}$$
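In an implementation, the essential change from Equation (10) is to floor each coordinate update at the mandated minimum rather than at zero. A minimal sketch (values hypothetical; `candidate` stands for the updated budgets produced by the Equation (10) formula):

```python
import numpy as np

def floored_update(candidate, w_min):
    """Truncate each coordinate update at the mandated per-site minimum budget
    min_j instead of at zero, in the spirit of Equation (12)."""
    return np.maximum(candidate, w_min)

# Hypothetical updates for three sites, with a $100 minimum on the first two.
w = floored_update(np.array([250.0, 40.0, -10.0]), np.array([100.0, 100.0, 0.0]))
```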
2.3.3 Extension 3: Target Frequency of Ad Exposure
Another practical consideration in an online advertising campaign is the target frequency of
ad exposures (e.g., Krugman, 1972; Naples, 1979; Danaher et al., 2010). For example, sales
conversions and profits from online display ads might be highest when the consumer is served
an ad within a certain range of frequencies (e.g., one to three times) during the duration of
the ad campaign. The proposed method can also be readily modified to accommodate such
considerations. Within our context, this corresponds to P (ka ≤ Yi ≤ kb|zi,w) where ka < kb
respectively represent lower and upper bounds on ad exposures. Given prior experience,
the firm might determine the lower bound (i.e., ka) and the upper bound (i.e., kb) for the
target range of ad exposures. This is known as effective frequency or frequency capping (the
latter typically sets the lower bound at 1 and imposes an upper bound on the number of ad
exposures).
Within our context, we can modify Equation (5) as follows to accommodate such considerations:

$$\max_{\mathbf{w}} \;\; \frac{1}{n}\sum_{i=1}^{n} \sum_{y=k_a}^{k_b} P(Y_i = y \mid \mathbf{z}_i, \mathbf{w}) \quad \text{subject to} \quad \sum_{j} w_j \le B, \;\text{ and }\; w_j \ge 0, \tag{13}$$

where, as before, $P(Y_i = y \mid \mathbf{z}_i, \mathbf{w}) = \frac{e^{-\gamma_i}\gamma_i^{y}}{y!}$. Using the example of $1 \le Y_i \le 3$, our problem
involves maximizing

$$\frac{1}{n}\sum_{i=1}^{n} e^{-\gamma_i}\left(\gamma_i + \frac{1}{2}\gamma_i^{2} + \frac{1}{6}\gamma_i^{3}\right). \tag{14}$$
Again we take a second-order Taylor expansion, resulting in equations with a similar form
to Equation (9) and Equation (10).
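As a quick sanity check, the closed form in Equation (14) for an individual is simply the Poisson probabilities of one, two, or three exposures summed; a short Python verification (the γ value is hypothetical):

```python
from math import exp, factorial

def prob_one_to_three(gamma_i):
    """P(1 <= Y_i <= 3) for a Poisson exposure count, as in Equation (14)."""
    return exp(-gamma_i) * (gamma_i + gamma_i**2 / 2 + gamma_i**3 / 6)

# Check against the Poisson pmf summed over y = 1, 2, 3.
g = 2.0
direct = sum(exp(-g) * g**y / factorial(y) for y in range(1, 4))
```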
3. Empirical Investigation
In Section 3.1, we compare the proposed method with the method by Danaher et al. (2010).
In Section 3.2, we demonstrate how our method can be used for optimal budget allocation
when the number of websites under consideration is very large (e.g., 5000 websites), which
is computationally prohibitive for extant methods. In Sections 3.3 and 3.4, we discuss two
case studies where we use the proposed method and its extensions for McDonald’s McRib
and Norwegian Cruise Lines’ Wave Season online advertising campaigns.
Our empirical illustrations are based on the 2011 comScore Media Metrix data, which
comes from the Wharton Research Data Service (www.wrds.upenn.edu). comScore uses
proprietary software to record daily webpage usage information from a panel of 100,000
Internet users (recorded anonymously by individual computer). Therefore, the comScore
data can be used to construct a matrix of all websites visited and the number of times each
computer visited each website during a particular time period. A number of prior studies
in marketing have utilized comScore Media Metrix data (e.g., Danaher, 2007; Liaukonyte
et al., 2015; Montgomery et al., 2004; Park and Fader, 2004).3
3.1 Comparison between Proposed Method and Danaher et al. (2010)
3.1.1 Comparison using Data Simulated from Danaher et al.'s Sarmanov Function
To date, the state-of-the-art method for optimal budget allocation of Internet display ads is
by Danaher et al. (2010). A basic premise of this method is that the number of visits
individuals make to websites (denoted as an n by p matrix Z in our context) can be characterized
by a multivariate negative binomial distribution (referred to as MNBD hereafter). Within
this setup, Danaher et al. (2010) propose an optimization method to maximize reach for
each given budget.
³We followed Danaher et al. (2010) to calculate the effective Internet population size for our data (denoted as N in Section 2). We first consider the size of the U.S. population at the time of our data set, which is 310.5 million (Schlesinger, 2010). We then multiply it by the proportion of users who actually visited at least one website in our data set (for example, 48.63% in our comScore January 2011 data). We then define N as 155.25 million (48.63% × 310.5 million). It is worth noting that, because the specific value of N simply serves as a baseline effective Internet population estimate in our reach estimates, the relative performance of the various methods remains qualitatively intact if N is defined as a smaller/greater proportion of the U.S. population.
To examine how our method performs under the basic premise of Danaher et al.’s ap-
proach, we first simulate a Z matrix from an MNBD distribution with a set of known
parameters. Based on the simulated Z matrix, we know the true optimal reach under each
budget. Next, we apply both methods on the simulated Z matrix and compare the discrep-
ancies between the true optimal reach and the reach obtained based on the budget allocations
suggested by the two methods.
Because the Z matrix in this case originates from the MNBD distribution (which is the
basic premise of Danaher et al.'s method), we expect that Danaher et al.'s (2010) method
would perform better than the proposed method under such comparisons. Nevertheless, we
aim to evaluate the extent to which the proposed method could achieve a reach that is similar
to the true optimal or the reach obtained under Danaher et al.’s (2010) method. Because
Danaher et al.’s (2010) method is only computationally efficient for budget allocation across
a relatively small number of websites, we demonstrate such comparisons for the case of seven
websites below.
We first generate the Internet usage matrix, Z, with 5000 rows (users) and 7 columns
(websites), based on an MNBD with αj and rj, j = 1, ..., 7, the usual parameters associated
with a MNBD, and ωj,j′, a set of correlation parameters denoting the correlation coefficient
in viewership between websites j and j′. To make our simulation as realistic as possible,
we establish αj , rj , and ωj,j′ as the values from the seven most visited websites from the
December 2011 comScore data. We also use the CPMs provided by comScore’s 2010 Media
Metrix (Lipsman, 2010) in this simulation. See Appendix C for more details on our data
generation method.
We then employ the following procedure to compare the two methods. We first obtain
the true optimal reach under each budget based on the true αj , rj , and ωj,j′ parameters and
the optimal criterion in Danaher et al.’s (2010) method. Next, we apply both the proposed
and Danaher et al.’s (2010) methods on the simulated Z matrix to obtain the corresponding
reach estimates. Note that Danaher’s methodology optimizes over share of impressions, sj,
instead of monetary spending, wj. Nevertheless, we can readily convert sj to wj using the
formula sj = 1000wj/(τjcj), as given in Section 2.4
4 Since Danaher et al.’s reach function is highly nonconvex, it can find local optima during optimization. Consequently, we run this optimization with several initialization points and choose the results with the highest reach in our result comparisons. Since a firm cannot buy more than 100% of ad impressions at a website (i.e., 0 ≤ sj ≤ 1), we force our algorithm’s optimization to stop allocating budget to a website once wj = τjcj/1000 is reached (corresponding to sj = 1).
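As a quick numerical illustration of this conversion (with hypothetical values, where tau is a site's total monthly impressions and c its CPM in dollars):

```python
def spend_to_share(w, tau, c):
    # s_j = 1000 * w_j / (tau_j * c_j): a CPM of c_j dollars buys 1000/c_j
    # impressions per dollar, out of tau_j total impressions at site j
    return [1000.0 * wj / (tj * cj) for wj, tj, cj in zip(w, tau, c)]

# Hypothetical example: $50 at a site with 100,000 impressions and a $2 CPM
# buys 25,000 impressions, i.e. a 25% share
print(spend_to_share([50.0], [100_000], [2.0]))  # [0.25]
```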
[Figure 1 contains two panels plotting reach against budget (in millions, from 0.5 to 2.0): the left panel reports Reach (Danaher) and the right panel Reach (Proposed Method), each with curves for Optimal, Proposed, and Danaher.]
Figure 1: Performance Comparison between Proposed and Danaher’s (Simulated Data)
Given that the proposed and Danaher et al.’s (2010) method each have their own defi-
nitions of the reach function, we report the reach comparisons using both reach definitions.
(See Danaher et al. 2010 for the formal definition of their reach function.) Figure 1 shows
the reach curves for the average reach estimate at each budget across the 100 simulation
runs using both definitions of reach. Note that the true optimal reach is in solid black, the
Danaher estimate is in dashed red, and our estimate is in dotted blue.
When using Danaher et al.’s reach function (left panel), both methods yield reach fairly
close to the true optimal reach. As expected, Danaher’s method performs slightly better
under this comparison, because not only are we using Danaher’s definition of reach, but
we also generate the data from the MNBD assumed by Danaher’s method. When using
our reach definition (right panel), again, both methods perform reasonably well. Our reach
estimate even slightly outperforms the optimal reach toward the higher budgets, because
the optimal reach is based on Danaher’s reach definition whereas this panel uses ours.
Overall, the comparisons above demonstrate that, even when the Internet usage matrix
Z is simulated from the MNBD as assumed in Danaher’s method, our method performs
reasonably well. Comparing the two methods’ computation speed, we find that the proposed
method is over ten times faster in this setting. Given the highly non-convex nature of the
optimization criterion in Danaher et al.’s (2010) method, this speed gap widens dramatically
for larger-scale problems.
3.1.2 Comparison using comScore Data
In this subsection, we compare the two methods using the December 2011 comScore Media
Metrix data. Specifically, we use Internet usage data from the top seven most visited websites
that support Internet display advertisements. The full month of data contained 51,093 users
who visited at least one of the seven websites in December 2011. We fit both the
proposed and Danaher’s method to 100 randomly chosen subsets of these users, each of size
5,109 (approximately ten percent of the population). Again, we use the CPMs as given in
comScore Inc.’s Media Metrix data from May 2010 (Lipsman, 2010).
Figure 2 shows the reach curves for the average reach at each budget across the 100
sample runs, using both reach functions. Within this context, we define the true optimal
reach (black solid) as that obtained from our method applied to the entire data set of 51,093
users. Danaher’s (red dashed) and our (blue dotted) estimates are both computed from the
10% subsets of the data. This also approximates real-world conditions in which a company
has access to only part of the total browsing history of all Internet users. All reach curves in
Figure 2 are then calculated on the ninety percent holdout data to ensure fair comparisons
across methods.
When using Danaher’s definition of reach (left panel), the three methods yield relatively
similar reach. Similar to the right panel of Figure 1, the Danaher reach estimate outperforms
the “optimal” reach in the left panel because the optimal reach in this case was computed
[Figure 2 contains two panels plotting reach against budget (in millions, from 0.5 to 2.0): the left panel reports Reach (Danaher) and the right panel Reach (Proposed Method), each with curves for Optimal, Proposed, and Danaher.]
Figure 2: Performance Comparison between Proposed and Danaher’s (Real Data)
using the proposed method. When using our definition of reach (right panel), the full data
set performs best as expected, followed by results from applying our method to the 10%
subset, and finally the Danaher estimates.
To conclude, both comparisons in Section 3.1 illustrate that the proposed and Danaher’s
methods perform similarly when applied to problem settings scalable to the latter. However,
due to its non-convex optimization criterion, the Danaher approach is considerably slower to
compute and as a result encounters significant computational difficulties in settings involving
a large number of websites. In Section 3.2, we demonstrate that, while computationally
prohibitive for extant methods, the proposed method can be used to optimally allocate
advertising budget across a very large number of websites.
3.2 Simulated Large-Scale Problem: 5000 Websites
In practice, most Internet media selection problems involve far more than a handful of web-
sites. In this subsection we illustrate how the proposed method can optimize over thousands
of websites. To demonstrate this, we simulate an Internet usage matrix of 50,000 people over
5000 websites.5 The visits to each website are randomly generated from a standard normal
distribution (after both rounding and taking the absolute value, since website views are
non-negative integers), which are then multiplied by a random integer from zero to ten, with
higher weight on a value of zero. This ensures that our simulated data set has similar characteristics
to the observed comScore data, since we observe a high percentage of matrix entries in the
real data as zeros. The CPMs of these websites are randomly generated, chosen from 0.25
to 8.00 in increments of 0.25.
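A small-scale sketch of this data-generation scheme (shown at 5,000 users by 500 sites rather than the full 50,000 by 5,000, and assuming a 50% weight on the zero multiplier, which the text does not specify exactly):

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_sites = 5_000, 500

# Rounded absolute standard normals give small non-negative visit counts
base = np.abs(np.rint(rng.standard_normal((n_users, n_sites)))).astype(int)

# Multiply by an integer in 0..10 with extra mass on zero, mimicking the
# sparsity observed in the comScore data
mult = rng.choice(11, size=(n_users, n_sites), p=[0.5] + [0.05] * 10)
Z = base * mult

# CPMs drawn from 0.25 to 8.00 in increments of 0.25
cpm = rng.choice(np.arange(0.25, 8.01, 0.25), size=n_sites)
print((Z == 0).mean())  # most entries are zero, as in the real data
```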
As before, we run our method over a 10% subset of the 50,000 users, then calculate
reach on the 90% holdout data in our result comparisons. Because it is computationally
prohibitive for Danaher et al.’s method to optimize over 5000 websites, we compare the
proposed method to the following benchmark approaches: 1) equal allocation over all 5000
websites; and 2) cost-adjusted equal allocation (i.e. average number of visits/CPM) over the
most visited 10, 25, and 50 websites. These alternative approaches mimic heuristics often
used in practice when the sheer number of websites makes individual examination infeasible,
such as those outlined in Cho and Cheon (2004).
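The cost-adjusted benchmark can be sketched as follows, under the assumption (consistent with the description above) that budget is spread over the top-k most visited sites in proportion to average visits divided by CPM:

```python
import numpy as np

def cost_adjusted_allocation(Z, cpm, budget, k):
    # Rank sites by average visits, keep the top k, and allocate budget in
    # proportion to (average visits / CPM) among them
    avg = Z.mean(axis=0)
    top = np.argsort(avg)[::-1][:k]
    score = avg[top] / cpm[top]
    w = np.zeros(Z.shape[1])
    w[top] = budget * score / score.sum()
    return w

# Toy usage: three sites, two kept; budget tilts toward cheap, popular sites
Z = np.array([[3, 0, 1], [2, 1, 0], [4, 0, 2]])
w = cost_adjusted_allocation(Z, np.array([2.0, 1.0, 4.0]), 100.0, 2)
print(w)
```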
Figure 3 shows the result comparisons. Even with only a ten percent subset of the data,
the proposed method yields reach estimates very similar to the optimal reach estimate based
on the entire dataset. In addition, the proposed method outperforms all the benchmark
approaches. We find that the equal allocation approach is by far the worst. The cost-
adjusted approaches perform better, but still worse than our method. Overall, we show that
the proposed method can be used to effectively allocate advertising budget across a very
large set of websites.
5 We choose to simulate this dataset because data cleaning in comScore for 5000 websites is highly time-consuming. For simplicity, the data is generated independently without correlations. Since the proposed method is designed to leverage correlations across sites, this setup provides a lower bound with respect to advantages from our approach.
[Figure 3 plots reach against budget (in millions, from 0.0 to 0.6) for the Optimal, Proposed, Top 100, Top 50, Top 25, and Equal allocation methods.]
Figure 3: Simulated Data Reach, 5000 Websites
3.3 Case Study 1: McDonald’s McRib Sandwich Online Advertising Campaign
We now demonstrate how the proposed method can be applied in real-world settings. In our
first case study, we consider a yearly promotion for McDonald’s McRib Sandwich, which is
only available for a limited time each year (approximately one month).
Because McRib is often offered in or around December (Morrison, 2012), we consider
the comScore data from December 2011 to approximate a McRib advertising campaign. In
particular, we manually went through the comScore data set to identify the 500 most visited
websites that also supported Internet display ads. Our data then contains a record of every
computer that visited at least one of these 500 websites at least once (56,666 users). Thus
Z is a 56,666 by 500 matrix. We then separate our full data set into a ten percent training
data set (5667 users) and a ninety percent holdout data set. As before, we use the
training data to fit the method, then calculate reach on the holdout data.
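Calculating reach on holdout data can be sketched as follows, assuming the reach approximation underlying our criterion, reach = 1 − (1/n) Σi e^{−γi}, where γi = Σj zij sj is user i's expected number of exposures; the site traffic (tau) and CPM (c) values below are hypothetical:

```python
import numpy as np

def reach(Z, w, tau, c):
    # Share of impressions bought at each site: s_j = 1000 w_j / (tau_j c_j)
    s = 1000.0 * w / (tau * c)
    gamma = Z @ s                       # expected exposures per user
    return 1.0 - np.exp(-gamma).mean()  # 1 - average P(zero exposures)

# Toy holdout matrix: three users, two sites
Z_holdout = np.array([[1, 0], [0, 0], [2, 1]])
tau = np.array([10_000.0, 10_000.0])
c = np.array([2.0, 4.0])
w = np.array([10.0, 20.0])
print(reach(Z_holdout, w, tau, c))
```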
Table 1 provides the categorical makeup of the 500 websites we consider in this applica-
tion. We include sixteen categories of websites: Social Networking, Portals, Entertainment,
E-mail, Community, General News, Sports, Newspapers, Online Gaming, Photos, Fileshar-
ing, Information, Online Shopping, Retail, Service, and Travel. The Total Number column
provides the total number of websites in each category. For simplicity, the CPM values for
each website are based on average costs of the website categories provided by comScore Inc.’s
Media Metrix data from May 2010 (Lipsman, 2010).6 Table 1 shows that Entertainment and
Gaming are by far the largest categories (with 92 and 77 websites out of 500, respectively),
while Sports, Newspaper, and General News are the most expensive at which to advertise
(all over $6.00). Additionally, it appears in Table 1 that advertising costs vary considerably
across these website categories. In Appendix D (Table A3), we also provide an overview of
viewership correlations within and across each of the sixteen website categories.
Table 1 also shows the number of websites chosen in each of the sixteen website categories
over three different methods: 1) the original approach that maximizes overall reach, 2) our
extension to maximize reach among targeted consumer demographics, and 3) our extension
to maximize effective reach with target frequency of ad exposures. This table also provides
the number of websites chosen in each category when we only account for the top 25 and
top 50 most visited sites as benchmarks to our approach. More details about our result
comparisons are provided below.7
3.3.1 McRib Campaign: Maximizing Overall Reach
In this subsection, we assume that McDonald’s simply attempts to reach as many users
as possible during its McRib campaign. Again, because Danaher et al.’s (2010) method
cannot optimize over 500 websites, we use the following benchmark methods in our model
comparisons: equal allocation over all 500 websites, and cost-adjusted equal allocation across
the top 10, 25, and 50 most visited websites.8
6 In practice firms could readily apply actual CPMs of all sites in such an optimization.
7 Detailed budget allocation results for each budget and each website are available from the authors upon request.
8 Note that, while included in Figure 4, the 10-website benchmark method is omitted from Table 1 for space considerations.
Columns: Category; Total Number; CPM; websites chosen by the Proposed Method at Budget = $500K (Original, Targeted Consumers, Targeted Exposures) and at Budget = $2 million (Original, Targeted Consumers, Targeted Exposures); Benchmark (Top 25, Top 50).
Category Total CPM Orig Cons Exp Orig Cons Exp Top25 Top50
Community 23 2.10 8 8 11 14 14 20 1 4
E-mail 7 0.94 7 7 7 7 7 7 3 5
Entertainment 92 4.75 2 1 10 13 10 29 0 0
Fileshare 28 1.08 23 20 26 24 22 28 2 7
Gaming 77 2.68 30 40 44 37 45 59 0 1
General News 12 6.14 0 0 0 0 0 0 0 0
Information 47 2.52 24 25 29 27 27 36 1 3
Newspaper 27 6.99 0 0 0 0 0 0 0 0
Online Shop 29 2.52 11 12 15 15 15 26 1 1
Photos 9 1.08 6 6 9 8 9 9 0 2
Portal 30 2.60 13 14 17 16 16 26 5 7
Retail 57 2.52 33 39 39 36 41 49 2 7
Service 18 2.52 13 14 10 14 14 12 2 2
Social Network 17 0.56 16 17 17 17 17 17 8 11
Sports 17 6.29 0 0 1 1 0 1 0 0
Travel 10 2.52 6 7 8 8 8 8 0 0
Table 1: Website Categories Chosen by Method, McRib
Table 1 reports the categorical makeup of chosen sites under two budgets ($500K and $2
million). This categorical makeup shows how many websites in each category were chosen
with non-zero budget allocation in the solutions of the optimization. It is not surprising
that the optimization does not select many websites in relatively expensive categories such
as Sports, Newspaper, and General News. Advertising at a relatively expensive website is
only desirable when that website can reach an otherwise unreachable audience. In this case,
other websites offer reach without the high price. Social Networking, for example, offers a
relatively inexpensive way to reach consumers who are visiting other websites as well. Note
that in Table A3 in Appendix D, social networking sites have relatively high correlations
in viewership across other site categories with the only exception being email and gaming
sites. Consequently, the optimization ultimately includes all 17 Social Networking websites
and leaves out the expensive categories where reach would be duplicated.
[Figure 4 plots reach against budget (in millions, from 0.0 to 2.5) for the Optimal, Proposed, Top 50, Top 25, Top 10, and Equal allocation methods.]
Figure 4: McRib Campaign, Maximizing Overall Reach
It is also worth noting that our optimization selects all websites in the Email category. In
addition to the relative lower cost of advertising on these websites, there is a very low within-
category correlation in viewership among email sites (0.01 absolute average correlation; see
Appendix D). This indicates that the same consumer often does not visit more than one
email site, so including an additional email website in the optimization can result in a larger
increase in reach.
Figure 4 shows the results from the proposed method with the comparison methods.
This figure demonstrates that the proposed method again performs well with ten percent
calibration data. The reach estimates based on the ten percent calibration data are very close
to those from the true optimal based on the entire data. Additionally, the reach estimates
from the naive approaches are significantly below both.
Actual Desired
No Children 0.344 0.25
Children 0.656 0.75
Income below 15,000 0.135 0.25
Income 15,000-24,999 0.074 0.20
Income 25,000-34,999 0.100 0.20
Income 35,000-49,999 0.150 0.15
Income 50,000-74,999 0.260 0.10
Income 75,000-99,999 0.140 0.05
Income above 100,000 0.141 0.05
Table 2: True and Desired Proportions in Data
3.3.2 McRib Campaign with Targeted Consumer Demographics
In practice, companies often have specific target demographics in mind when running online
display ads. In this section we demonstrate that our method could be readily modified to
accommodate such needs. For illustration purposes, we consider two demographic variables
(children and income level).
We chose these two demographic variables because McDonald’s has historically targeted
families with children (Mintel, 2014). We also know fast food in general tends to target
lower-income households (Drewnowski and Darmon, 2005). Because of this, we illustrate
our approach in a scenario where the McRib campaign wishes to reweight the comScore data
set with greater emphasis on individuals from lower-income households with children.
Following the procedure outlined in Section 2.3.1, we reweight the comScore data with
target population makeup in each variable category as shown in Table 2. For example,
for “children present,” since we want to give individuals with children greater weights than
those who do not have children, we assign a weight of 0.75 to having children and 0.25 to
not having children. We do a similar weighting for income level. We choose these desired
weights arbitrarily to demonstrate our method, but in practice companies would presumably
have data on target proportions before running the campaign.
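One simple reweighting scheme consistent with Table 2 (an importance weight per user that multiplies desired/actual ratios across the demographic variables; the exact Section 2.3.1 procedure may differ) can be sketched as:

```python
def user_weight(profile, desired, actual):
    # profile maps each demographic variable to the user's level, e.g.
    # {"children": "yes"}; desired and actual map variable -> level ->
    # proportion, as in Table 2
    w = 1.0
    for var, level in profile.items():
        w *= desired[var][level] / actual[var][level]
    return w

desired = {"children": {"yes": 0.75, "no": 0.25}}
actual = {"children": {"yes": 0.656, "no": 0.344}}
# Users with children are upweighted (0.75/0.656), users without downweighted
print(user_weight({"children": "yes"}, desired, actual))
```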
Table 1 shows the number of websites chosen in the reweighted setup compared to the
standard setup. In this example, reweighting the data does not drastically change the types of
websites chosen during our optimization. Families with children and lower-income households
did not represent a significant deviation from the overall data set in terms of their Internet
browsing behavior. However, we do observe some slight changes. For example, the number
of Gaming websites increases when we reweight our data. Most of the gaming websites in our
data set are online flash-based game websites which primarily target young players (360i,
2008). Hence, it is likely that proportionally more McDonald’s consumers frequently visit
such sites.
3.3.3 McRib Campaign with Target Frequency of Ad Exposure
In this subsection we demonstrate a case in which McDonald’s wishes to allocate its ad
budget such that each individual is exposed to the ad no more than three times during the
course of the McRib campaign. For simplicity, we use the data set without demographic
reweighting, although both approaches could readily be used together. In this case, the
“effective reach” is the value of the function e^{−γ}(γ + γ²/2 + γ³/6).
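This function equals P(1 ≤ X ≤ 3) for X ~ Poisson(γ), i.e. the probability a user is exposed at least once but no more than three times. A minimal evaluation:

```python
import math

def effective_reach_term(gamma):
    # e^{-gamma} * (gamma + gamma^2/2 + gamma^3/6)
    # = P(X=1) + P(X=2) + P(X=3) for X ~ Poisson(gamma)
    return math.exp(-gamma) * (gamma + gamma**2 / 2 + gamma**3 / 6)

# Spending more at a site raises gamma, but past a point the extra exposures
# overshoot the three-exposure cap and effective reach falls again
print(effective_reach_term(1.0), effective_reach_term(6.0))
```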
Again, Table 1 shows the optimization allocation across website categories for this ex-
tension. In general, under this extension, our method chooses more websites, with a corre-
spondingly lower average budget at each one. This allows more viewers to be reached with
the ad, but limits the probability an ad will appear to a particular viewer more than three
times. One example of this is the increase in number of Gaming websites chosen by the
algorithm. Gaming websites have many repeat visitors, but low correlation among visitation
to websites within the Gaming category. The algorithm chooses to advertise a small amount
at a number of Gaming sites, which gives consumers a low probability of seeing the ad on
any particular visit, but will ultimately reach different consumers with each ad appearance.
Overall, the algorithm less often includes websites with high repeat visitation. This helps
ensure that a consumer does not see the ad more times than desired. Another example of this
is that the algorithm chooses more Entertainment websites. Although the Entertainment
category is more expensive than others, we observe low repeat visitation for Entertainment
websites in our Z matrix. The websites seem to be more universally visited, so advertising
on an Entertainment website results in more different people seeing the ad.
3.4 Case Study 2: Norwegian Cruise Lines Wave Season Online Advertising
Campaign with Mandatory Media Coverage to Travel Aggregate Sites
Each year, the cruise industry advertises for its annual “wave season”, which begins in
January. Norwegian Cruise Lines (NCL) is among the cruise lines that participate heavily in
wave season (Satchell, 2011). Because consumers who are interested in booking a cruise often
use travel aggregation sites like Orbitz and Priceline to compare offerings across multiple
cruise lines, we use this case study to demonstrate the extension in which the proposed
method is applied in such a scenario. We suppose that NCL wants to allocate at least a
minimum amount of budget to a set of major aggregate travel websites. While this is a
hypothetical example, it is realistic and can be readily applied to similar scenarios.
Our method handles such scenarios using the extension described in Section 2.3.2. Imag-
ine NCL wants to allocate at least twenty percent of any given budget to eight major ag-
gregate websites (CheapTickets.com, Expedia.com, Hotwire.com, Kayak.com, Orbitz.com,
Priceline.com, Travelocity.com, and TripAdvisor.com). We require our optimization to place
at least 2.5 percent of the budget at each of these eight sites.
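The mandatory-coverage requirement can be sketched as a simple post-hoc projection; the paper instead builds such constraints directly into the optimization (Section 2.3.2), so this illustrative version just enforces the floors and rescales the remaining sites:

```python
def enforce_minimums(w, budget, min_frac, mandatory):
    # Raise each mandatory site to at least min_frac of the budget, then
    # rescale the remaining sites so total spending still equals the budget
    w = list(w)
    for j in mandatory:
        w[j] = max(w[j], min_frac * budget)
    fixed = sum(w[j] for j in mandatory)
    free = [j for j in range(len(w)) if j not in mandatory]
    free_total = sum(w[j] for j in free)
    scale = (budget - fixed) / free_total if free_total > 0 else 0.0
    for j in free:
        w[j] *= scale
    return w

# 2.5% floors on sites 0 and 1 of a $100K budget; other sites rescaled
print(enforce_minimums([0, 0, 50_000, 50_000], 100_000, 0.025, {0, 1}))
```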
We follow the same procedure as in the previous case study to obtain the 500 most
visited websites in January 2011 that supported online display advertisements. These 500
websites are also divided into sixteen categories and assigned an average CPM based on
their category. 48,628 users visited at least one of these 500 websites during January 2011,
meaning our Z matrix is 48,628 by 500. We again divide this data into a 10% subset (4,863
users) of calibration data and use the remaining 90% as holdout data.9
Figure 5 demonstrates our reach curves under this extension. We refer to the optimization
with mandatory media coverage of aggregate travel sites as constrained optimization (in
dashed blue), and the standard optimization approach as unconstrained (in solid black). We also
include a naive method, allocating the entire budget evenly to the eight aggregate sites (in
dotted green).
The curves on the left show the calculation of reach using the entire data set, i.e. the
full 90% holdout data. As we expect, the unconstrained curve performs slightly better
9 We omit the website category makeup description of this application due to its similarity to Table 1 and page limits. It is available from the authors upon request.
[Figure 5 contains two panels plotting reach against budget (in millions, from 0.0 to 0.6): the left panel reports reach based on the overall data and the right panel reach based on the travel users subset, each with curves for Unconstrained, Constrained, and Equal allocation.]
Figure 5: Reach with Mandatory Coverage in Aggregate Travel Sites
than the constrained curve, since we cannot do better in overall reach by constraining our
optimization. In addition, the naive approach performs poorly. Because the aggregate travel
websites do not reach a majority of the users of the data set, allocating budget only to these
eight websites will naturally limit the ad’s exposure to all Internet users.
The curves on the right show the reach for the subset of users who visited at least one of
the eight aggregate travel websites in January 2011 (there are 6,431 such individuals in our
data set). Presumably these consumers are more likely to be interested in searching for travel
deals compared to the others. In this case, the constrained curve significantly outperforms
the unconstrained curve. By constraining the optimization to allocate a percentage of the
budget to each aggregate travel website, we reach far more of the users who actually visit
these sites, which is the group NCL would like to target. In this case, NCL can meet its
aggregate travel site requirements without sacrificing much overall reach, meaning that most
users will still view the ad in general, but we are also confident that we have reached the
subset of people most likely to book a cruise.
In the right panel of Figure 5, the naive approach of equal allocation across the eight
travel aggregate sites performs slightly better than the proposed method when the reach is
calculated based on the subset of aggregate travel site users (i.e., constrained reach). But
this result is expected. NCL is most likely to reach users on the aggregate sites by putting
as much budget as possible into those eight sites. As we see from the overall reach curves on
the left, that method will not capture users on other websites who might also be attracted
to NCL’s Wave Season campaign but did not visit one of the eight aggregate travel websites.
Depending on whether the firm wants to reach a broader audience or a targeted audience,
either the constrained or the unconstrained optimization could be employed in such online
ad campaigns.
4. Conclusion and Future Work
In the current advertising climate, firms need an online presence more than ever. Never-
theless, the ever-increasing number of websites presents not only endless opportunities but
also tremendous challenges for firms’ online display ad campaigns. While the opportunities
of online advertising are limited only by the sheer number of websites, optimal Internet
media selection among thousands of websites has remained a prohibitively challenging task.
While existing methods can only solve Internet budget optimization for moderately-sized
problems (e.g. 10 websites), we propose a method that allows firms to efficiently allocate
budget across a large number of websites (e.g. 5000). We demonstrate the applicability
and scalability of our algorithm in real-world settings using the comScore data on Inter-
net usage. We also illustrate that the proposed method extends easily to accommodate
common practical Internet advertising considerations, including targeted consumer demo-
graphics, mandatory media coverage to matched content websites, and target frequency of
ad exposures. Furthermore, the low computational cost means that the proposed method
can rapidly examine a range of possible budgets. As a result, firms can easily examine the
correspondence between budget and reach, providing them with the ability to spend only as
much money as required to achieve a desired level of reach.
Consequently, the proposed method provides firms with great flexibility and adaptability
in their online display advertising campaigns. Firms can retain control of their own Internet
display ad campaigns, alleviating the need to turn to ad agencies or conglomerate advertising
exchanges that offer them little to no oversight over their own campaigns.
Our research also offers some promising avenues for further research. For example, while
the proposed method emphasizes maximizing the reach of online display ads, firms could
readily modify our approach and use Internet browsing-tracking data to maximize click-
through and/or downstream purchases of their Internet display ad campaigns. Additionally,
in the current paper, we consider the perspective of an individual firm that wishes to maxi-
mize reach for its particular campaign. This method could be further extended for use by an
advertising broker who wishes to maximize reach over a set of clients. Advertising brokers
must provide clients with the best possible campaigns but also use as much of their existing
ad space inventory as possible. Thus an interesting extension of our method would be to
maximize over multiple campaigns from the perspective of an advertising agency. We will
leave such endeavors for future work.
References
360i (2008). Point of view on gaming. Technical report.
Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much
larger than n. The Annals of Statistics, 35(6):2313–2351.
Chapman, M. (2009). Digital advertising’s surprising economics. Adweek, 50(10):8.
Chen, Y., Pavlov, D., and Canny, J. (2009). Large-scale behavioral targeting. In Proceedings
of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining.
Cho, C. and Cheon, H. (2004). Why do people avoid advertising on the internet? Journal
of Advertising, 33(4):89–97.
Danaher, P. (2007). Modeling page views across multiple websites with an application to
internet reach and frequency prediction. Marketing Science, 26(3):422–437.
Danaher, P., Janghyuk, L., and Kerbache, L. (2010). Optimal internet media selection.
Marketing Science, 29(2):336–347.
Drewnowski, A. and Darmon, N. (2005). Food choices and diet costs: an economic analysis.
The Journal of Nutrition, 135(4):900–904.
Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004). Least angle regression (with
discussion). The Annals of Statistics, 32(2):407–451.
eMarketer (2012). Digital ad spending tops 37 billion. URL: http://www.emarketer.com/newsroom/index.php/digital-ad-spending-top-37-billion-2012-market-consolidates. Accessed 4 Jun 2015.
Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its
oracle properties. Journal of the American Statistical Association, 96(456):1348–1360.
Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization paths for generalized
linear models via coordinate descent. Journal of Statistical Software, 33(1):302–332.
Goeman, J. (2010). L1 penalized estimation in the cox proportional hazards model. Bio-
metrical Journal, 52(1):70–84.
Goldfarb, A. and Tucker, C. (2011). Online display advertising: Targeting and obtrusiveness.
Marketing Science, 30(3):389–404.
Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning:
Data Mining, Inference, and Prediction. Springer, second edition.
Hemphill, T. (2000). Doubleclick and consumer online privacy: An e-commerce lesson
learned. Business and Society Review, 105(3):361–372.
Hesterberg, T., Choi, N., Meier, L., and Fraley, C. (2008). Least angle and l1 penalized
regression: A review. Statistics Surveys, 2:61–93.
Hoban, P. R. and Bucklin, R. E. (2015). Effects of internet display advertising in the purchase
funnel: Model-based insights from a randomized field experiment. Journal of Marketing
Research, LII:375–393.
Krugman, H. (1972). Why three exposures may be enough. Journal of Advertising Research,
12(6):11–14.
Liaukonyte, J., Teixeira, T., and Wilbur, K. (2015). Television advertising and online shop-
ping. Marketing Science, 34(3):311–330.
Lipsman, A. (2010). The New York Times ranks as top online newspaper according to May
2010 U.S. comScore Media Metrix data. Technical report, ComScore, Inc.
Lothia, R., Donthu, N., and Hershberger, E. (2003). The impact of content and design
elements on banner advertising click-through rates. Journal of Advertising Research,
43(04):410–418.
Luo, Z. and Tseng, P. (1992). On the convergence of the coordinate descent method for
convex differentiable minimization. Journal of Optimization Theory and Applications,
72(1):7–35.
Manchanda, P., Dubé, J., Goh, K., and Chintagunta, P. (2006). The effect of banner adver-
tising on internet purchasing. Journal of Marketing Research, 43:98–108.
Meinshausen, N. (2007). Relaxed lasso. Computational Statistics and Data Analysis, pages
374–393.
Mintel (2014). Kids as influencers–U.S. Technical report, Mintel.
Montgomery, A. L., Li, S., Srinivasan, K., and Liechty, J. (2004). Modeling online browsing
and path analysis using clickstream data. Marketing Science, 23(4):579–595.
Morrison, M. (2012). Can the Mcrib save Christmas? Ad Age.
Muthukrishnan, S. (2009). Ad exchanges: Research issues. Technical report, Google, Inc.
Naples, M. (1979). Effective frequency: The relationship between frequency and advertising
effectiveness. Technical report, Association of National Advertisers, New York.
Park, Y. and Fader, P. (2004). Modeling browsing behavior at multiple websites. Marketing
Science, 23(3):280–303.
Radchenko, P. and James, G. (2008). Variable inclusion and shrinkage algorithms. Journal
of the American Statistical Association, 103(483):1304–1315.
Rosset, S. and Zhu, J. (2007). Piecewise linear regularized solution paths. The Annals of
Statistics, 35(3):1012–1030.
Satchell, A. (2011). Norwegian: Cruise fares to increase up to 10 percent April 1. South
Florida Sun-Sentinel.
Schlesinger, R. (2010). U.S. population, 2011: 310 million and growing. U.S. News.
Schmidt, M., Fung, G., and Rosales, R. (2007). Fast optimization methods for l1 regular-
ization: A comparative study and two new approaches. Machine Learning: ECML 2007,
4701:286–297.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal
Statistical Society, Series B, 58:267–288.
The Economist Intelligence Unit (2005). Business: The online ad attack. The Economist, 375(8424):63.
Zhao, P., Rocha, G., and Yu, B. (2009). The composite absolute penalties family for grouped
and hierarchical variable selection. The Annals of Statistics, 37(6A):3468–3497.
Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American
Statistical Association, 101:1418–1429.
Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net.
Journal of the Royal Statistical Society, Series B, 67:301–320.
Appendix A Simple Illustration of Correlation in Website Viewership
In this appendix, we provide a simple illustration of how the proposed method handles
correlation in the Z data matrix. To convey the basic intuition, we consider a case with
three websites, all generated from the same distribution and with the same cost. The
viewership of websites 1 and 2 has a correlation ranging from 0.0 (fully independent) to
1.0 (perfect positive correlation), while website 3's viewership is generated entirely
independently of the other two websites (correlation of 0).
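As an illustration of how such correlated viewership data might be generated, one can threshold correlated Gaussians. The appendix does not specify its generator, so the latent-normal mechanism, function name, and 50% viewing threshold below are all assumptions for illustration only:

```python
import numpy as np

def simulate_viewership(n, rho, seed=0):
    """Draw n synthetic users' viewing indicators for three websites.

    Sites 1 and 2 share a latent-normal correlation of rho; site 3 is
    independent of both. (Hypothetical generator for illustration only;
    the paper does not specify its simulation mechanism here.)
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho, 0.0],
                    [rho, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
    latent = rng.multivariate_normal(np.zeros(3), cov, size=n)
    # Threshold at 0 so each site is viewed by roughly half the users.
    return (latent > 0).astype(int)

Z = simulate_viewership(100_000, rho=0.8)
corr = np.corrcoef(Z, rowvar=False)
# corr[0, 1] is clearly positive; corr[0, 2] and corr[1, 2] are near zero.
```

Thresholding shrinks the correlation relative to the latent rho, but preserves the ordering the appendix relies on: sites 1 and 2 move together while site 3 does not.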
Figure A1 shows the change in budget allocation across the three websites as the cor-
relation between websites 1 and 2 changes, where the red line is website 1's allocation, the
blue line is website 2's allocation, and the green line is website 3's allocation. When the
correlation between websites 1 and 2 is zero, all three websites are completely independent.
In this case, the algorithm allocates one-third of the budget to each website, since no
website has a clear advantage over the other two. As the correlation between websites 1
and 2 increases, the algorithm gradually allocates more budget to website 3 and splits
the remaining budget between websites 1 and 2. When these two websites become perfectly
correlated, the algorithm divides the budget in half, allocating one half to website 3 and the
other half across websites 1 and 2.
[Figure A1: line plot of the proportion of budget allocated (vertical axis, 0.25 to 0.50) against the correlation between websites 1 and 2 (horizontal axis, 0.0 to 1.0), with separate lines for websites 1, 2, and 3.]
Figure A1: Illustration of Budget Allocation with Varying Correlations in Website Viewership
Appendix B Algorithm Details, Convergence, and Efficiency
Our optimization criterion of Equation (7) in Section 2 can be written in statistical form
with an ℓ1 penalty:
$$\frac{1}{n}\sum_{i=1}^{n} e^{-\gamma_i} + \frac{\lambda}{n}\|w\|_1, \tag{A1}$$
where $\|w\|_1 = \sum_{j=1}^{p} |w_j|$. This is a common statistical form, namely,
$$f(w) = g(w) + \sum_{j=1}^{p} k_j(w_j), \tag{A2}$$
where $g(w) = \frac{1}{n}\sum_{i=1}^{n} e^{-\gamma_i}$ is a differentiable convex function of $w$, and
$\sum_{j=1}^{p} k_j(w_j) = \frac{\lambda}{n}\|w\|_1$ is a separable convex, but not differentiable, function.
It has been shown in Luo and Tseng (1992) that a coordinate descent algorithm, which
iteratively minimizes the criterion as a function of one coordinate at a time, will achieve
a global minimum for functions of the form in (A2). Thus convergence using coordinate
descent is guaranteed for our criterion, since it is in the form specified by Luo and Tseng.
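To make the coordinate updates concrete, the sketch below assumes the exposure takes the form γi = Σj θij wj (consistent with the derivative in (A3)) and solves (A3) for wj, clipping at zero to keep budget shares nonnegative. Equation (10) is not reproduced in this appendix, so this update is a reconstruction, not the authors' exact implementation:

```python
import numpy as np

def objective(theta, w, lam):
    # (1/n) sum_i exp(-gamma_i) + (lam/n) ||w||_1, with gamma = theta @ w
    n = theta.shape[0]
    return np.exp(-theta @ w).mean() + lam * np.abs(w).sum() / n

def coordinate_descent(theta, lam, n_passes=50):
    """One-coordinate-at-a-time minimization of the penalized criterion.

    Reconstructed from Equation (A3): assumes gamma_i = sum_j theta_ij w_j
    and nonnegative budget shares w_j >= 0.
    """
    n, p = theta.shape
    w = np.zeros(p)
    for _ in range(n_passes):
        for j in range(p):
            e = np.exp(-theta @ w)          # e^{-gamma_i} at the current w
            grad = theta[:, j] @ e          # sum_i theta_ij e^{-gamma_i}
            curv = (theta[:, j] ** 2) @ e   # curvature of the smooth part
            # Setting (A3) to zero and solving for w_j, clipped at zero:
            w[j] = max(0.0, w[j] + (grad - lam) / curv)
    return w
```

With λ at least as large as the largest column sum of θ, every update clips to zero, which matches the zero-budget initialization the appendix describes next.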
Because no closed-form solution exists for Equation (7), we employ a Taylor approxima-
tion to (A2), resulting in Equation (9) from Section 2. To minimize Equation (9) over $w_j$,
with all $w_k$, $k \neq j$, fixed, we first compute the partial derivative with respect to $w_j$, which
is given by
$$\sum_{i=1}^{n} \theta_{ij} e^{-\sum_j \theta_{ij}\tilde{w}_j}\left[-1 + \theta_{ij}(w_j - \tilde{w}_j)\right] + \lambda \tag{A3}$$
for $w_j > 0$. Setting (A3) equal to zero gives Equation (10) in Section 2.
We can also use Equation (A3) to find a starting point for our algorithm, i.e., the λ
value corresponding to B = 0. To do this, we employ the same calculation of $H_j$ as used
in Equation (10), where $H_j$ determines whether the algorithm would set a coefficient below
zero. In particular, we first define $\tilde{w}_j = 0$ for all $j = 1, \ldots, p$, which corresponds to zero
budget. We then calculate $H_j$ for each website and set the initial λ value to $\max_j H_j$.
To compute solutions for increasing budgets, we use this value as λmax and decrease λ
incrementally. The step size and number of steps are both parameters of the algorithm,
specified by the researcher according to the desired granularity and maximum budget. For
example, for the McRib case study in Section 3.3, we ran the algorithm with 500 steps at a
step size of 0.01.
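As a sketch of this initialization: at w̃ = 0 the exponential terms in (A3) equal one, so the gradient at zero is −Σi θij + λ, and website j stays at zero budget whenever λ ≥ Σi θij. Taking Hj = Σi θij at the zero start is therefore an assumption (Equation (10) itself is not reproduced here), and the geometric decay below is likewise only one plausible reading of "step size":

```python
import numpy as np

def lambda_grid(theta, n_steps=500, step_size=0.01):
    """Decreasing lambda grid starting at the value where B = 0.

    H_j at w = 0 is reconstructed as sum_i theta_ij (an assumption:
    Equation (10) is not shown in this appendix); lambda_max = max_j H_j.
    The grid then decays geometrically by step_size per step -- one
    plausible reading of the step-size parameter.
    """
    H = theta.sum(axis=0)            # H_j at the zero-budget start
    lam_max = H.max()
    return lam_max * (1.0 - step_size) ** np.arange(n_steps)
```

Running the optimizer once per grid value traces out budget allocations from B = 0 up to the maximum budget of interest.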
We further demonstrate the efficiency of the algorithm in Figure A2, which shows the
time (in seconds) required to run the algorithm at a particular budget over a range of p
websites. To create Figure A2, we used the same method as described in Section 3.2 to
generate the Z matrix and the CPMs. As the figure shows, our algorithm is highly
computationally efficient even for large-scale problems.
[Figure A2: plot of algorithm run time in seconds against the number of websites.]
Figure A2: Algorithm Computational Efficiency
Appendix C Supplementary Information for Model Comparisons with Danaher et al.'s Model
We first describe how we generate the Z matrix using the multivariate negative binomial
distribution (MNBD) as described in Danaher et al. (2010). Under Danaher et al.'s
approach, page impressions are equivalent to ad appearances; thus what Danaher et al.
refer to as X is equivalent to the proposed Z in our paper. To keep terminology consistent,
we use $Z_j$ for the methodology in this section.
We first generate website 1's data, $Z_1$, from a typical negative binomial distribution (i.e.,
the marginal distribution $f_1(Z_1)$). Then, following Danaher (2007, p. 425), the
conditional distribution of $Z_2$ given $Z_1$ is $f(Z_2|Z_1) = f(Z_1, Z_2)/f(Z_1)$, or
$$f(Z_2 = z_2 \mid Z_1 = z_1) = f_2(z_2)\left[1 + \omega\left(e^{-z_1} - \left(\frac{\alpha_1}{1 - e^{-1} + \alpha_1}\right)^{r_1}\right)\left(e^{-z_2} - \left(\frac{\alpha_2}{1 - e^{-1} + \alpha_2}\right)^{r_2}\right)\right]. \tag{A4}$$
We then use the following approach to generate the Z matrix following Danaher et al.'s
methodology:
1. Randomly generate n synthetic respondents from a negative binomial distribution,
corresponding to $Z_{11}, \ldots, Z_{n1}$.
2. For each $Z_{i1}$, randomly generate $Z_{i2}$ by sampling from the probability distribution given
by (A4).
3. Repeat the process for $Z_3$, $Z_4$, etc., until the desired number of websites is reached.
Note that the calculation of the conditional $f(Z_j|Z_1, \ldots, Z_{j-1})$ becomes increasingly
complex as each successive website's viewership is generated; for example, $f(Z_1, Z_2, Z_3) =
f(Z_1)f(Z_2|Z_1)f(Z_3|Z_1, Z_2)$. We therefore extend this process only to seven websites for
the example used in Section 3.1. By combining these vectors, we create the Z matrix based
on the multivariate negative binomial distribution.
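A minimal sketch of steps 1 and 2 for two websites follows, using an NBD parameterized so that $P(Z=0) = (\alpha/(1+\alpha))^r$ and the conditional weighting of (A4). The support truncation at z_max and the parameter values in the demonstration are illustrative assumptions, not values from the paper:

```python
import math
import random

def nbd_pmf(z, r, alpha):
    # NBD parameterization with P(Z = 0) = (alpha/(1+alpha))^r, E[Z] = r/alpha
    log_p = (math.lgamma(r + z) - math.lgamma(r) - math.lgamma(z + 1)
             + r * math.log(alpha / (1.0 + alpha)) - z * math.log(1.0 + alpha))
    return math.exp(log_p)

def nbd_moment(r, alpha):
    # E[e^{-Z}] for this NBD: (alpha / (1 - e^{-1} + alpha))^r, as in (A4)
    return (alpha / (1.0 - math.exp(-1.0) + alpha)) ** r

def sample_pair(n, r1, a1, r2, a2, omega, z_max=200, seed=0):
    """Steps 1-2: draw Z1 from its NBD marginal, then Z2 from (A4).

    Support is truncated at z_max for discrete sampling; omega and the
    (r, alpha) values passed in are illustrative assumptions.
    """
    rng = random.Random(seed)
    support = list(range(z_max + 1))
    p1 = [nbd_pmf(z, r1, a1) for z in support]
    p2 = [nbd_pmf(z, r2, a2) for z in support]
    m1, m2 = nbd_moment(r1, a1), nbd_moment(r2, a2)
    out = []
    for _ in range(n):
        z1 = rng.choices(support, weights=p1)[0]
        # Conditional f(Z2 | Z1 = z1) per Equation (A4)
        w2 = [p2[z] * (1.0 + omega * (math.exp(-z1) - m1)
                       * (math.exp(-z) - m2)) for z in support]
        out.append((z1, rng.choices(support, weights=w2)[0]))
    return out

pairs = sample_pair(1000, r1=0.3, a1=0.2, r2=0.3, a2=0.2, omega=0.5)
```

Extending to more websites repeats the conditional step, with each new site's weights depending on all previously drawn values, which is exactly why the computation grows with the number of websites.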
To make our simulated data as realistic as possible, we generate them using values of α
and r estimated from the top seven most visited websites in the December 2011 comScore
data as the true parameter values of the MNBD. Since $E(Z_j) = r_j/\alpha_j$, we have
$\hat{r}_j/\hat{\alpha}_j = \bar{Z}_j$, or equivalently $\hat{r}_j = \bar{Z}_j\hat{\alpha}_j$. We can find $\bar{Z}_j$ easily from the data, as it is simply
the mean of the visit values for a particular website j. Further, given that the probability of
an NBD random variable taking the value zero is $(\alpha_j/(1+\alpha_j))^{r_j}$, we can estimate
$\hat{\alpha}_j$ as the solution to
$$y_j = \left(\hat{\alpha}_j/(1+\hat{\alpha}_j)\right)^{\bar{Z}_j\hat{\alpha}_j}, \tag{A5}$$
where $y_j$ denotes the observed fraction of zero visits to a given website j. Equation (A5)
can be solved easily with a root-finding function, and $\hat{r}_j$ in turn obtained as $\hat{r}_j = \bar{Z}_j\hat{\alpha}_j$.
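Because $g(\alpha) = (\alpha/(1+\alpha))^{\bar{Z}\alpha}$ decreases monotonically from 1 (as α → 0) toward $e^{-\bar{Z}}$ (as α → ∞), Equation (A5) has a unique root, which simple bisection recovers. The sketch below is one way to implement the root-finding step; the interval bounds are arbitrary choices:

```python
def estimate_alpha(z_bar, y_zero, lo=1e-8, hi=1e3, tol=1e-10):
    """Solve y = (alpha/(1+alpha))^(z_bar * alpha) for alpha by bisection.

    z_bar is the mean number of visits to the website; y_zero the observed
    fraction of zero visits. g is monotone decreasing, so the root is
    bracketed whenever exp(-z_bar) < y_zero < 1.
    """
    def g(a):
        return (a / (1.0 + a)) ** (z_bar * a)

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > y_zero:   # g decreasing: the root lies to the right
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    alpha_hat = 0.5 * (lo + hi)
    return alpha_hat, z_bar * alpha_hat   # (alpha_hat, r_hat = z_bar * alpha_hat)
```

For instance, plugging in $\bar{Z} = r/\alpha$ and $y = (\alpha/(1+\alpha))^r$ computed from α = 0.187, r = 0.287 (Amazon's values in Table A1) recovers those same parameters.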
We used this approach to estimate α and r for Amazon, AOL, Edgesuite, Live, MSN,
Weatherbug, and Yahoo, which provided the basis for the seven-website simulation in Sec-
tion 3.1.1. Table A1 shows a comparison between the estimated and true α and r values for
the simulated data. Here, the true values are from the seven previously mentioned websites,
while the estimated values are mean values from 100 simulation runs with matrices of 50,000
users each.10 The table also shows the mean squared error between the true and estimated
values over the 100 runs, as well as the mean absolute deviation. It is evident that the
estimated and true α and r values are reasonably close to one another.
Website 1 Website 2 Website 3 Website 4 Website 5 Website 6 Website 7
α 0.187 0.017 0.093 0.038 0.043 0.025 0.032
α̂ 0.187 0.018 0.093 0.039 0.043 0.025 0.033
MSE 2e−5 2e−5 4e−5 6e−5 6e−5 3e−5 3e−5
MAD 0.008 0.001 0.004 0.001 0.002 0.002 0.002
r 0.287 0.056 0.174 0.093 0.167 0.051 0.444
r̂ 0.287 0.057 0.174 0.093 0.168 0.051 0.445
MSE 8e−5 7e−5 2e−5 8e−5 6e−5 6e−5 7e−5
MAD 0.009 0.003 0.005 0.004 0.005 0.004 0.006
Table A1: True and Estimated Mean α, r Values, Simulated Data
Table A2 shows a comparison between the estimated and full α and r values for the seven-
website comScore data (Section 3.1.2), where the full values are based on the entire
December 2011 comScore data set, and the estimated values are means across 100 runs on
random 10% subsets. The table also shows the mean squared error between the full and
estimated values over the 100 runs, as well as the mean absolute deviation. Again, the
estimates based on the subset data closely resemble the values obtained from the full data.
10 Note that the simulation used in Section 3.1 is done with 5,000 synthetic respondents due to the computational complexity involved in estimating Danaher et al.'s method for 50,000 synthetic respondents.
Amazon AOL Edgesuite Live MSN Weatherbug Yahoo
Full α 0.187 0.017 0.093 0.038 0.043 0.025 0.032
Estimated α 0.188 0.017 0.094 0.038 0.043 0.025 0.032
MSE 2e−5 2e−6 3e−5 8e−6 6e−6 4e−6 2e−6
MAD 0.010 0.001 0.005 0.002 0.002 0.002 0.001
Full r 0.287 0.056 0.174 0.093 0.167 0.051 0.444
Estimated r 0.288 0.056 0.175 0.093 0.167 0.051 0.444
MSE 1e−4 4e−6 3e−5 1e−5 2e−5 4e−6 1e−4
MAD 0.010 0.002 0.005 0.003 0.004 0.002 0.008
Table A2: True and Estimated Mean α, r Values, Real Data
Appendix D Website Category Viewership Correlation Table
Table A3 provides an overview of correlation in viewership among the 16 website groups
in the McRib example, both within and across groups. Within-group correlation is
calculated by taking the mean of all absolute pairwise correlations between websites in a
particular group; these values appear on the diagonal of the table. For example, the
Newspaper category shows a moderately high average within-group correlation of 0.48,
whereas websites in the E-mail category are barely correlated, at only 0.01 on average.
The off-diagonal elements of Table A3 show the maximum absolute correlation between
each pair of groups, calculated as the largest absolute correlation between any two websites
from the respective groups. For example, there is a high correlation of 0.96 between
Newspaper and Portal sites, but only a low correlation of 0.03 between Filesharing and
E-mail sites.
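These summaries can be computed mechanically from a users-by-websites matrix and one category label per website. A small sketch follows; the data layout and function name are hypothetical:

```python
import numpy as np

def correlation_summary(Z, groups):
    """Within-group mean |corr| and cross-group max |corr|, as in Table A3.

    Z: users x websites viewership matrix; groups: one label per column.
    Diagonal entries average absolute pairwise correlations within a group
    (NaN for single-site groups); off-diagonal entries take the maximum
    absolute correlation between any two sites from the two groups.
    """
    corr = np.abs(np.corrcoef(Z, rowvar=False))
    labels = sorted(set(groups))
    idx = {g: [j for j, gj in enumerate(groups) if gj == g] for g in labels}
    within, between = {}, {}
    for g in labels:
        cols = idx[g]
        pairs = [corr[a, b] for a in cols for b in cols if a < b]
        within[g] = float(np.mean(pairs)) if pairs else float("nan")
    for i, g in enumerate(labels):
        for h in labels[i + 1:]:
            between[(g, h)] = float(max(corr[a, b]
                                        for a in idx[g] for b in idx[h]))
    return within, between
```

Note that the off-diagonal maximum can greatly exceed both groups' within-group averages, since a single pair of highly correlated sites dominates it; this explains entries such as the 0.96 between Newspaper and Portal.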
Category Com Email Ent File Game Gen Info News Onl Photo Port Ret Serv Soc Sport Travel
Community 0.02 0.14 0.82 0.14 0.77 0.14 0.47 0.16 0.55 0.88 0.21 0.39 0.21 0.26 0.12 0.15
Email . 0.01 0.07 0.03 0.28 0.04 0.07 0.05 0.09 0.04 0.87 0.10 0.10 0.06 0.12 0.04
Entertainment . . 0.02 0.78 0.32 0.90 0.76 0.92 0.28 0.83 0.90 0.30 0.69 0.24 0.79 0.10
Fileshare . . . 0.05 0.27 0.05 0.15 0.56 0.67 0.13 0.17 0.10 0.13 0.14 0.10 0.07
Gaming . . . . 0.01 0.12 0.82 0.32 0.85 0.12 0.25 0.14 0.95 0.09 0.51 0.09
General News . . . . . 0.28 0.76 0.94 0.08 0.04 0.96 0.08 0.10 0.34 0.85 0.11
Information . . . . . . 0.02 0.77 0.51 0.18 0.76 0.30 0.11 0.24 0.65 0.27
Newspaper . . . . . . . 0.48 0.10 0.05 0.96 0.36 0.12 0.26 0.86 0.15
Online Shop . . . . . . . . 0.03 0.49 0.16 0.26 0.75 0.42 0.19 0.10
Photos . . . . . . . . . 0.02 0.11 0.09 0.09 0.41 0.04 0.05
Portal . . . . . . . . . . 0.06 0.19 0.19 0.12 0.87 0.09
Retail . . . . . . . . . . . 0.04 0.19 0.18 0.25 0.12
Service . . . . . . . . . . . . 0.01 0.15 0.19 0.05
Social Network . . . . . . . . . . . . . 0.02 0.10 0.26
Sports . . . . . . . . . . . . . . 0.07 0.08
Travel . . . . . . . . . . . . . . . 0.18
Table A3: Overview of viewership correlation within and across the sixteen website categories in Section 3.3.