A Likelihood Approach to Estimating Market Equilibrium Models

Michaela Draganska
Graduate School of Business, Stanford University
Stanford, CA 94305-5015
draganska [email protected]

Dipak Jain
Kellogg School of Management, Northwestern University
Evanston, IL 60208-2001
Abstract
This paper develops a new likelihood-based method for the simultaneous estimation of structural demand-and-supply models for markets with differentiated products. We specify an individual-level discrete choice model of demand and derive the supply side assuming manufacturers compete in prices. The proposed estimation method considers price endogeneity through simultaneous estimation of demand and supply, allows for consumer heterogeneity, and incorporates a pricing rule consistent with economic theory.
The basic idea behind the proposed estimation procedure is to simulate prices and choice probabilities by solving for the market equilibrium. By repeating this many times, we obtain an empirical distribution of equilibrium prices and probabilities. The empirical distribution is then smoothed and used in a likelihood procedure to estimate the parameters of the model. The advantage of this method is that it avoids the need to perform a transformation of variables. If consumers' tastes are independent across market periods, our approach yields maximum-likelihood estimates; otherwise it yields consistent but not fully efficient partial likelihood estimates.
Key Words: price endogeneity, competitive strategy, maximum likelihood.
1 Introduction
In recent years marketers have become increasingly interested in estimating structural market
equilibrium models, where demand is derived from utility maximization on the part of con-
sumers, and the supply side is obtained by assuming that firms maximize profits given the
characteristics of the market. Because the competitive environment (i.e., market structure) and
policy variables (i.e., marketing mix) are specified explicitly, we can identify separate demand,
cost, and competitive effects. Estimating a market equilibrium model enables us to analyze
questions pertaining to firms’ strategies in the marketplace through “what-if” type analyses by
taking into account all interdependencies between the demand and supply sides of the market.
The simultaneous estimation of demand and supply is also motivated by the so-called
endogeneity problem. In short, endogeneity arises because marketing variables not only affect
consumer choice, but consumer choice in turn also affects marketing mix decisions. It has
been well documented that ignoring endogeneity leads to biased coefficient estimates of the
marketing mix variables and therefore to suboptimal decisions (Besanko, Gupta and Jain 1998,
Villas-Boas and Winer 1999).
It is often argued that the use of individual-level data solves the endogeneity problem, since
individuals are price takers. However, even though price is exogenous in a microeconomic sense,
there still might be important correlations between the price and the error term in the demand
equation, thus leading to econometric endogeneity (Kennan 1989). Product attributes that
are unobservable to the researcher such as coupon availability, national advertising and shelf
space allocation have an impact on consumer utility as well as on price setting decisions by
firms (Villas-Boas and Winer 1999, Besanko, Dube and Gupta 2003). Prices should thus be
viewed as endogenous independent of the aggregation level of the data used in the analysis.
In this research, we focus on developing a new likelihood-based method for the estimation of
structural demand-and-supply models. Our demand model falls into the broad class of discrete
choice models of markets for differentiated products (Anderson, de Palma and Thisse 1992).
The supply model is derived from the profit maximization behavior of the firms, assuming
Bertrand-Nash competition in prices between manufacturers. Market equilibrium is determined
jointly by the demand and supply specifications, and our estimation procedure accordingly
considers the equilibrium equations simultaneously.
Once the presence of unobserved product attributes is acknowledged, it is no longer possi-
ble to estimate a discrete choice model using traditional maximum likelihood methods because
in this case prices will be correlated with the unobservables due to the strategic price-setting
behavior of firms (Berry 1994). Therefore, choice probabilities depend on the unobserved prod-
uct attributes not only directly but also indirectly via prices. Hence, one cannot integrate the
unobserved product attributes out of the choice probabilities without taking this latter depen-
dency into account. Berry (1994) proposed a technique for the estimation of discrete choice
models using instrumental variables to account for the endogeneity of prices. His approach
is easy to implement and has been widely applied to the analysis of aggregate data (Berry,
Levinsohn and Pakes 1995, Besanko et al. 1998, Nevo 2001).
Marketing researchers, however, have long recognized the advantages of data describing
the purchase behavior of individual consumers. Such disaggregate scanner panel data provide
detailed information that can be used to learn about consumers' preferences. For example, these data enable
us to understand the source of behaviors such as variety seeking or deal proneness. Given the
richness of scanner panel data, a large literature has evolved that uses them to estimate discrete
choice models of consumer behavior (Guadagni and Little 1983, Kamakura and Russell 1989,
Chintagunta, Jain and Vilcassim 1991, Gonul and Srinivasan 1993, Fader and Hardie 1996).
These models have focused on estimating the demand side and have not considered the possible
presence of endogeneity. Recently Goolsbee and Petrin (2003) and Chintagunta, Dube and Goh
(2003) apply variants of Berry’s (1994) method to estimate consumer demand using individual-
level choice data. These approaches are useful when the main interest lies in obtaining precise
demand-side estimates because they provide a way to account for price endogeneity without
the need to make assumptions about supply-side behavior. If conducting policy experiments
is our goal, however, then estimating an equilibrium model is preferable, since it enables us to
take advantage of the cross-equation dependencies of the structural parameters.
An equilibrium model provides a mapping from unobserved product attributes and cost
shocks to market outcomes, i.e., prices and choice probabilities. Extending traditional MLE
methods to include a supply side in addition to a consumer choice model is not straightforward
because it requires that the researcher is able to write down the joint distribution of these
equilibrium outcomes. Assuming that this distribution is known runs counter to the notion of
an equilibrium (Berry 1994). Hence, the joint distribution of these equilibrium outcomes needs
to be derived from the distribution of the unobservables. Performing this transformation of
variables proves to be very difficult due to the highly nonlinear nature of the model. Villas-
Boas and Winer (1999) circumvent this problem by estimating a reduced-form pricing rule that
relates current prices to lagged prices. In a subsequent article, Villas-Boas and Zhao (2001)
specify a structural supply-side model derived from manufacturers’ and retailers’ optimization
problem and estimate the equilibrium model directly using maximum likelihood. This direct
approach to estimating the Jacobian, however, prevents them from incorporating consumer
heterogeneity. Recently, Yang, Chen and Allenby (2003) have proposed a Bayesian approach
to resolve the issue.
In this article, we propose a likelihood-based approach to the estimation of a structural
demand-and-supply model using individual-level choice data. The basic idea behind the pro-
posed estimation procedure is to simulate prices and probabilities by randomly drawing the
shocks from an assumed joint distribution, and then solve for the equilibrium. By repeating
this many times, we obtain an empirical distribution of equilibrium prices and probabilities.
The empirical distribution is then smoothed and used in a maximum-likelihood procedure to
estimate the parameters of the model. The advantage of this method is that it avoids the need
to perform a transformation of variables and thus enables us to estimate the model when the
evaluation of the Jacobian seems infeasible.
In computing the likelihood of the data, we treat market periods as independent from each
other. This implicitly assumes that there is no persistence in the preferences of consumers
across market periods.2 If markets are geographical regions rather than time periods, then
this assumption is warranted. Furthermore, the psychology literature suggests that consumers’
preferences change over time, often depending on contextual effects that are unobserved by the
econometrician (Petty and Cacioppo 1986, Burnkrant and Unnava 1995). To the extent that
this leads to independence over time, our procedure yields maximum-likelihood estimates of
the model parameters. If, on the other hand, there is a correlation in consumers’ preferences
over time, then our procedure yields a so-called partial likelihood (Wooldridge 2002), and the
resulting estimates are consistent but not fully efficient.
The remainder of the paper is organized as follows. Section 2 develops the equilibrium
model. In Section 3, the estimation procedure is described along with the details of the im-
plementation. In Section 4, we apply the estimation method to two frequently purchased
consumer products, yogurt and laundry detergent. We demonstrate the accuracy of the pro-
posed procedure in a Monte Carlo study presented in Section 5. In Section 6 we conclude with
a summary and directions for future research.
2 Model Formulation
2.1 Demand Specification
Brands are indexed by j = 0, . . . , J , and market periods by t = 1, . . . , T . Let household
types be indexed by n = 1, . . . , N , where a type denotes a set of households with identical
demographic characteristics. There are m_n individuals of type n. To capture unobserved consumer heterogeneity, we use a latent class approach and specify random coefficients with an L-point distribution (Kamakura and Russell 1989).3 This specification is appealing in terms of interpretability for marketing purposes and has been applied in both the economics and marketing literature (Berry, Carnall and Spiller 1997, Besanko et al. 2003). Let the latent market segments be indexed by l = 1, . . . , L. The share of segment l in the population is λ_l ≥ 0, where ∑_{l=1}^{L} λ_l = 1.
Consumer behavior is governed by the following utility function:

u_{n0t} = ε_{n0t},
u_{njt} = x_{njt}β_l − α_l p_{jt} + ξ_{jt} + ε_{njt},

where {ε_{n0t}, . . . , ε_{nJt}} are iid extreme-value distributed, x_{njt} are observed characteristics of an alternative or decision-maker, and p_{jt} denotes the price of alternative j in period t. β_l and α_l are the respective response parameters. We allow for household-specific variation in these response parameters to capture consumer heterogeneity. The demand shocks {ξ_{1t}, . . . , ξ_{Jt}} are common across consumers and represent product characteristics that are unobserved by the researcher, but are taken into account by the firms in their pricing decisions. While some unobserved product characteristics, such as quality and brand image, can be captured through the inclusion of brand-specific constants, ξ_{jt} reflects time-varying factors like coupon availability, shelf space, and national advertising.
Brand 0 is the outside good (i.e., no-purchase alternative). Including an outside good
allows for category expansion effects of marketing actions. We assume that the outside good
is non-strategic, i.e., its price is not set as a best response to the inside goods.
Utility maximization and the assumptions on the error term imply that the probability of
household n purchasing brand j in market period t, D_{njt}, is given by

D_{njt} = ∑_{l=1}^{L} λ_l D_{nljt} = ∑_{l=1}^{L} λ_l · exp(x_{njt}β_l − α_l p_{jt} + ξ_{jt}) / (1 + ∑_{k=1}^{J} exp(x_{nkt}β_l − α_l p_{kt} + ξ_{kt}))   (1)

and the probability of the outside good being chosen is

D_{n0t} = ∑_{l=1}^{L} λ_l D_{nl0t} = ∑_{l=1}^{L} λ_l · 1 / (1 + ∑_{k=1}^{J} exp(x_{nkt}β_l − α_l p_{kt} + ξ_{kt})).   (2)
2.2 Supply Specification
The supply side is characterized by Bertrand-Nash behavior on the part of oligopolistic firms (Berry et al. 1995, Besanko et al. 1998). We assume that retailers pass through the manufacturers' decisions, which is likely to hold for categories that do not have a strategic impact on store traffic and are not a primary driver of retailers' profits. Under this assumption we do not need to explicitly include a retailer in the supply-side model.
The production function has constant returns to scale. Marginal costs for firm j in period t are denoted by c_{jt}. In market period t firm j maximizes profits,

max_{p_{jt}} Π_{jt} = (p_{jt} − c_{jt}) ∑_{n=1}^{N} m_n D_{njt},   (3)

where ∑_{n=1}^{N} m_n D_{njt} is the expected demand for product j in period t. Expected demand is thus given by the weighted sum of the choice probabilities for all consumer types in the market.
The first-order condition for this problem is given by

∑_{n=1}^{N} m_n (∂D_{njt}/∂p_{jt}) (p_{jt} − c_{jt}) + ∑_{n=1}^{N} m_n D_{njt} = 0.   (4)

Given our demand model, the above equation can be rewritten as

p_{jt} = c_{jt} + [∑_{n=1}^{N} m_n ∑_{l=1}^{L} λ_l D_{nljt}] / [∑_{n=1}^{N} m_n ∑_{l=1}^{L} λ_l α_l D_{nljt}(1 − D_{nljt})].   (5)
We infer marginal cost from the data using the relationship

c_{jt} = w_t γ_j + η_{jt},   (6)

where w_t are observable variables, e.g., input prices, and η_{jt} denotes cost characteristics that are unobserved by the researcher. Substituting (6) in (5) yields

p_{jt} = w_t γ_j + [∑_{n=1}^{N} m_n ∑_{l=1}^{L} λ_l D_{nljt}] / [∑_{n=1}^{N} m_n ∑_{l=1}^{L} λ_l α_l D_{nljt}(1 − D_{nljt})] + η_{jt},   j = 1, . . . , J.   (7)
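Given the segment-level probabilities D_{nljt}, the markup term in eq. (7) is a ratio of weighted sums that can be computed directly. A minimal sketch, with hypothetical names and array layout:

```python
import numpy as np

def markup(seg_probs, lam, alpha, m):
    """Markup term of eq. (7) for each brand j.

    seg_probs: (N, L, J) choice probabilities D_{nljt} by household type and segment
    lam:       (L,) segment shares
    alpha:     (L,) segment price sensitivities
    m:         (N,) number of households of each type
    """
    # Numerator: sum_n m_n sum_l lam_l D_{nljt}
    num = np.einsum('n,l,nlj->j', m, lam, seg_probs)
    # Denominator: sum_n m_n sum_l lam_l alpha_l D_{nljt} (1 - D_{nljt})
    den = np.einsum('n,l,l,nlj->j', m, lam, alpha, seg_probs * (1.0 - seg_probs))
    return num / den
```

For the homogeneous case (L = 1, N = 1) this collapses to the familiar logit markup 1/(α(1 − D_{jt})).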
2.3 Market Equilibrium
Considering the demand equations (1) and supply equations (7) jointly, the market equilibrium
is defined by
D_{njt} = ∑_{l=1}^{L} λ_l · exp(x_{njt}β_l − α_l p_{jt} + ξ_{jt}) / (1 + ∑_{k=1}^{J} exp(x_{nkt}β_l − α_l p_{kt} + ξ_{kt})),   j = 1, . . . , J,  n = 1, . . . , N,   (8)

p_{jt} = w_t γ_j + [∑_{n=1}^{N} m_n ∑_{l=1}^{L} λ_l D_{nljt}] / [∑_{n=1}^{N} m_n ∑_{l=1}^{L} λ_l α_l D_{nljt}(1 − D_{nljt})] + η_{jt},   j = 1, . . . , J.   (9)
In equilibrium, prices and probabilities depend on both {ξ_{jt}}_j and {η_{jt}}_j. Hence, estimating the equations separately leads to a simultaneity bias: in both equation (8) and equation (9), the explanatory variables are correlated with the unobserved errors. Consider equation (9) and suppose that firm j faces a high cost shock η_{jt} in period t. This will lead the firm to charge a higher price p_{jt}, which in turn decreases the probability that it will be chosen by consumers of type n, that is, D_{njt} decreases. Consequently, the regressor

[∑_{n=1}^{N} m_n ∑_{l=1}^{L} λ_l D_{nljt}] / [∑_{n=1}^{N} m_n ∑_{l=1}^{L} λ_l α_l D_{nljt}(1 − D_{nljt})]

changes with the error η_{jt}, and this correlation leads to biased estimates of α. A joint estimation of (8) and (9) accounts for such possible correlation and thereby leads to a valid estimate of the price coefficient α_l.
3 Estimation Procedure
In this section we develop a maximum likelihood-based procedure to obtain estimates of the
structural parameters. The model can be written in the general form of a response function,
where the endogenous variables are expressed as a function of the exogenous variables. That
is, for each market period t, if equilibrium is unique, we have

[{D_{njt}}_{n,j}, {p_{jt}}_j] = f[{x_{njt}}_{n,j}, {w_{jt}}_j, {ξ_{jt}}_j, {η_{jt}}_j, θ],   (10)

where θ = ({α_l}_l, {β_l}_l, {λ_l}_l, {γ_j}_j, {σ_{ξ_j}}_j, {σ_{η_j}}_j) is the vector of parameters to be estimated.
As noted previously, the likelihood function of such an equilibrium model is in general
intractable. Consider the equilibrium model as defined by equations (8) and (9), where we set
N = 1 to simplify exposition. For given values of the exogenous variables, the joint distribution
of the demand and supply shocks {ξjt}j and {ηjt}j induces a distribution of the equilibrium
prices {pjt}j and probabilities {Djt}j . The difficulty in writing down the likelihood function
stems from the fact that this induced distribution of prices and probabilities is hard to obtain
directly through a transformation of variables approach. To compute the transformation of the
demand and supply shocks, one would need to solve the system of equilibrium equations for
{ξjt}j and {ηjt}j and then derive the Jacobian of this inverse transformation. That is, there
needs to exist a set of J functions {uj(·)}j that map prices and probabilities into the ξj ’s and
another set of J functions {vj(·)}j that map prices and probabilities into ηj ’s. Let h be the
pdf of ({ξ_{jt}}_j, {η_{jt}}_j). To obtain the pdf g of ({p_{jt}}_j, {D_{jt}}_j), the transformation of variables is

g({p_{jt}}_j, {D_{jt}}_j) = h(u_1({p_{jt}}_j, {D_{jt}}_j), . . . , u_J({p_{jt}}_j, {D_{jt}}_j),
                              v_1({p_{jt}}_j, {D_{jt}}_j), . . . , v_J({p_{jt}}_j, {D_{jt}}_j)) · |Jac|,

where Jac is the (2J × 2J) Jacobian of the inverse transformation, written in block form as

Jac = [ ∂u_i/∂p_{jt}   ∂u_i/∂D_{jt} ]
      [ ∂v_i/∂p_{jt}   ∂v_i/∂D_{jt} ],   i, j = 1, . . . , J.
The problem is that the equilibrium equations generally cannot be solved to obtain the in-
verse transformations {uj(·)}j and {vj(·)}j . Moreover, even if we could obtain {uj(·)}j and
{vj(·)}j , e.g., using numerical methods, then we would still have to compute the Jacobian
of this (unknown!) inverse transformation. Due to the highly nonlinear model specification,
this is a daunting task. In a recent article, Yang et al. (2003) propose a Bayesian approach
to estimating market equilibrium models. While the authors simplify the transformation of
variables considerably by transforming the supply shocks into prices conditional on demand
shocks, the computation of the Jacobian still needs to be done using numerical methods.4
Our approach is different. We avoid performing the transformation of variables altogether
and instead obtain equilibrium prices and probabilities using simulation. Recall that, for given
values of the exogenous variables, the joint distribution of demand and supply shocks induces
a distribution of prices and probabilities. We exploit this by numerically solving the model
repeatedly for simulated demand and supply shocks. For each draw of the demand and supply
shocks, we obtain the corresponding equilibrium prices and probabilities. Then we compute
the joint distribution of prices and probabilities.
There is, however, no guarantee that the empirical distribution of prices and probabilities
obtained through the simulation will be smooth, which is a property we need for the optimiza-
tion. We therefore employ nonparametric techniques to estimate the joint density of prices and
probabilities, and then evaluate it at the actual data to obtain a smooth, well-behaved likeli-
hood function. The parameter estimates are obtained by maximizing this likelihood function
using an iterative optimization procedure.
Estimation Algorithm
Based on the previous discussion, there are three main components to the proposed estimation procedure:
(i) simulation of equilibrium prices and probabilities,
(ii) estimation of the joint density of prices and probabilities in order to smooth the likelihood function, and
(iii) maximization of the loglikelihood to obtain the parameters of the model.
The estimation algorithm proceeds as follows. Let s = 1, . . . , S index the simulations per time period t.

Step 1: Draw {ξ_{jt}}_{j,t} and {η_{jt}}_{j,t} S times.
Step 2: Choose a starting value for θ.
Step 3: Set t = 1.
Step 4: Set s = 1.
Step 5: Using the sth draw, solve

p_{jt} = w_t γ_j + [∑_{n=1}^{N} m_n ∑_l λ_l D_{nljt}] / [∑_{n=1}^{N} m_n ∑_l λ_l α_l D_{nljt}(1 − D_{nljt})] + η_{jt},
with D_{nljt} = exp(x_{njt}β_l − α_l p_{jt} + ξ_{jt}) / (1 + ∑_{k=1}^{J} exp(x_{nkt}β_l − α_l p_{kt} + ξ_{kt})),   (11)

to obtain {p_{jt}}_j. Using {p_{jt}}_j, calculate D_{njt} = ∑_l λ_l D_{nljt}.
Step 6: Increase s by 1. If s ≤ S, go back to step 5.
Step 7: Estimate the joint density of the simulated prices and probabilities and evaluate it, ϕ({p_{jt}}_j, {D_{njt}}_{n,j}), at the actual prices and probabilities to get the period-t contribution to the loglikelihood.
Step 8: Increase t by 1. If t ≤ T, go back to step 4.
Step 9: Update θ to maximize the loglikelihood. If convergence is reached, terminate. Otherwise go back to step 3.
We now discuss the details of the estimation algorithm step by step.
3.1 Simulation of Equilibrium Prices and Probabilities
The first component of the estimation procedure is the simulation of equilibrium prices and
probabilities. We assume that demand and supply shocks are normally distributed,

(ξ_{1t}, . . . , ξ_{Jt}, η_{1t}, . . . , η_{Jt}) ∼ N(0, diag(σ²_{ξ_1}, . . . , σ²_{ξ_J}, σ²_{η_1}, . . . , σ²_{η_J})).
Further, we assume that {ξjt}j and {ηjt}j are independent across time and independent of
{εnjt}n,j . Since all error terms are independent across time and all maximization problems are
static, we can treat each period separately.
In step 1, we draw the errors only once and use them for all values of θ. McFadden (1989)
shows in the context of method of moments estimation that using the same set of random
draws to simulate the model at different trial parameter values helps to avoid “chattering” of
the simulator, i.e., it ensures that the criterion function does not become discontinuous. Pakes and
Pollard (1989) also note that the properties of simulation estimators, and the performance of
the algorithms used to determine them, require the use of simulation draws that do not change
as the optimization algorithm varies θ. We implement this step by drawing from a standard
normal distribution and then multiplying the draws by the standard deviations σ_{ξ_1}, . . . , σ_{ξ_J} and σ_{η_1}, . . . , σ_{η_J}.
Given the exogenous variables, a set of parameter values and random shocks, we solve for
the equilibrium prices and probabilities in step 5. Note that equation (11) is differentiable in
{pjt}j , so we use a Newton-Raphson gradient procedure. In accordance with theory, we obtain
a unique solution along with reasonable values for the equilibrium prices.
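Step 5 amounts to finding the price vector at which eq. (11) holds for every brand. The paper uses a Newton-Raphson gradient procedure; the sketch below substitutes a simpler damped fixed-point iteration on p = c + markup(p), with N = 1 and hypothetical names, which suffices to illustrate the idea.

```python
import numpy as np

def solve_prices(x, xi, c, beta, alpha, lam, tol=1e-10, max_iter=2000):
    """Solve eq. (11) for equilibrium prices with N = 1 household type.

    x: (J, K) characteristics; xi: (J,) demand shocks; c: (J,) marginal costs
    beta: (L, K), alpha: (L,), lam: (L,) segment parameters.
    """
    p = c + 1.0                                          # start above marginal cost
    for _ in range(max_iter):
        v = beta @ x.T - np.outer(alpha, p) + xi         # (L, J) utilities
        ev = np.exp(v)
        D = ev / (1.0 + ev.sum(axis=1, keepdims=True))   # segment choice probs
        mk = (lam @ D) / ((lam * alpha) @ (D * (1.0 - D)))
        p_new = c + mk                                   # right-hand side of eq. (11)
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = 0.5 * (p + p_new)                            # damping aids convergence
    raise RuntimeError("price fixed point did not converge")
```

At the solution, price equals marginal cost plus the markup evaluated at the implied probabilities, i.e., the first-order condition (4) holds.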
For each period t, this step is repeated S times to generate an empirical distribution of
equilibrium prices and probabilities for this period. The number of random draws (simulations)
for the computation of the equilibrium points is to be determined by the user. The minimum
number of simulations, however, is set by the requirements of the density estimation procedure
as discussed below.
3.2 Estimation of the Joint Density of Prices and Probabilities
In principle, we could use the empirical distribution of the simulated equilibrium prices and
probabilities to directly compute the likelihood of the data. However, since the empirical
distribution is not smooth, this will in general not lead to a well-behaved likelihood function
that is readily optimized. One possibility is to use kernel density estimation to smooth the
simulated data points. In a nutshell, kernel smoothing involves weighted local averaging with
the kernels as weights.
Kernel estimators have two desirable properties: First, they are consistent5; second, by
averaging over a neighborhood that shrinks at an appropriate rate, they achieve the optimal
rate of convergence for nonparametric estimators (Stone 1980). There are other nonparametric
techniques that could be considered for smoothing purposes such as splines and orthogonal
(Fourier) series (Hardle 1990). While these estimators are similar in terms of computational
intensity and are asymptotically equivalent, they may differ in their small sample properties.
However, to the best of our knowledge, there is no comprehensive Monte Carlo study or analytical result that shows better small-sample properties for any of these estimators. Therefore, the choice among them is largely a matter of taste.6
We employ a multivariate kernel density estimator with a multiplicative Gaussian kernel
(Hardle 1990) to evaluate the joint density of the simulated equilibrium prices and probabilities
at the actual data to obtain the contribution to the likelihood (step 7). Unlike prices, the
probabilities are not directly observed but they are nonparametrically estimable from the
data. Formally, the estimated joint density of the calculated prices and probabilities at the
actual data is given by:
ϕ({p_{jt}}_j, {D_{njt}}_{n,j}) = (1/S) ∑_{s=1}^{S} ∏_{j=1}^{J} (1/h^p_j) K((p^s_{jt} − p_{jt}) / h^p_j) · ∏_{j=1}^{J} ∏_{n=1}^{N} (1/h^D_{nj}) K((D^s_{njt} − D_{njt}) / h^D_{nj}),   (12)

where s indexes simulations, K(·) is the Gaussian kernel defined as K(u) = (1/√(2π)) exp(−u²/2), and h^p_j and h^D_{nj} are smoothing parameters defined below.
One well-known problem of nonparametric density estimation is the so-called ‘curse of dimensionality,’ i.e., the explosion of the number of data points needed for the estimation of higher-dimensional densities. The minimum number of data points required for the estimation has
been tabulated in Silverman (1986, Table 4.2). Because we simulate the equilibrium prices and
probabilities, this is only a computational issue in the present case: we can simulate as many
data points as needed by drawing more errors. For example, if there are two competing brands
in the market, that is, if we want to estimate a four-dimensional density (two prices and two
choice probabilities), we need 223 data points. For three brands, the required sample size is
2790, and for four brands 43700 observations are needed.
Nonparametric density estimation requires the choice of a smoothing parameter, the bandwidth,
that governs the degree of smoothness of the density estimate. In essence, the bandwidth
determines how much averaging we want to do around a given point. Naturally, the larger the
bandwidth, the smoother the function. With increasing bandwidth, however, the estimated density may be far from the true underlying density. If the bandwidth is chosen too small, the obtained density has a ‘rough’ surface and is not as easy to use for optimization. Therefore, the choice of bandwidth is critical.7
We are only interested in the density at the point of the actual data, so we have to choose
the bandwidth to be locally optimal. Intuitively, if we choose the bandwidth too large, the
estimated density is essentially constant throughout the parameter space. Hence, every point is
a global maximum (up to computer precision). On the other hand, if we choose the bandwidth
too small, the estimated density is roughly zero outside a small neighborhood of the observed
values, and every point is a local maximum. Searching for the global maximum is tantamount
to searching for the small neighborhood of the parameter space that is associated with positive
density.
Recall that we estimate a J(N + 1)-dimensional density. We select the bandwidth for the
ith dimension according to the normal reference rule (Scott 1992),
h_i = (4 / (J(N + 1) + 2))^{1/(J(N+1)+4)} · σ_i · S^{−1/(J(N+1)+4)},   (13)

where σ_i is the standard deviation of the equilibrium prices (probabilities) in the ith dimension.
This rule is known to oversmooth when the underlying density is multimodal. In our case
this is welcome, because we want to ensure a well-behaved likelihood function that is easy
to maximize. The drawback is, of course, that the estimated density may be far from the
underlying density. However, this problem can be alleviated by successively decreasing the
bandwidth once we are in the neighborhood of the global maximum.
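The normal reference rule of eq. (13), with d = J(N + 1) dimensions and the conventional S^{−1/(d+4)} factor of Scott's rule, might be sketched as follows (function name is hypothetical):

```python
import numpy as np

def scott_bandwidths(sim):
    """Per-dimension bandwidths from the normal reference rule, eq. (13).

    sim: (S, d) simulated equilibrium outcomes, d = J(N + 1) dimensions.
    Returns a (d,) vector of bandwidths h_i.
    """
    S, d = sim.shape
    sigma = sim.std(axis=0, ddof=1)   # sigma_i: std. dev. in each dimension
    return (4.0 / (d + 2.0)) ** (1.0 / (d + 4.0)) * sigma * S ** (-1.0 / (d + 4.0))
```

For d = 1 this reduces to the familiar (4/3)^{1/5} σ S^{−1/5} ≈ 1.06 σ S^{−1/5} rule of thumb.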
3.3 Maximization of the Loglikelihood
The maximum likelihood procedure builds an outer loop around the simulation of the equilib-
rium prices and probabilities and the estimation of their joint density. Thus, for each set of
parameter values θ we perform S simulations to obtain the loglikelihood function, which is in
turn maximized to obtain an updated set of parameter values.
Starting values for the parameters in step 2 may come from a preliminary estimation using
3SLS with aggregate data as in Besanko et al. (1998):

ln S_{jt} − ln S_{0t} = x_{jt}β − α p_{jt} + ξ_{jt},   j = 1, . . . , J,   (14)

p_{jt} = w_t γ_j + 1 / (α(1 − S_{jt})) + η_{jt},   j = 1, . . . , J,   (15)
where Sjt is the share of alternative j for week t in the aggregate data. For the standard
deviations (σ_{ξ_1}, . . . , σ_{ξ_J}, σ_{η_1}, . . . , σ_{η_J}), we set the starting values equal to the root mean squared
error (RMSE) from the 3SLS procedure. Note that in equations (14) and (15) the ξjt and
ηjt enter as linear disturbances. Therefore, the RMSE provides an estimate of the standard
deviation of the errors.
The maximization of the loglikelihood is accomplished by means of a simplex search (Nelder
and Mead 1965). It is clearly infeasible to compute analytic gradients. While it is in principle
possible to use numerical gradients as part of a gradient-based optimization procedure, we
found that a simplex search that uses only the values of the loglikelihood function performs
best.8
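The full procedure can be illustrated end to end on a deliberately trivial stand-in for the equilibrium map of eq. (10), y_t = θ + shock_t. All names below are hypothetical, and a coarse grid search stands in for the Nelder-Mead simplex to keep the sketch short.

```python
import numpy as np

rng = np.random.default_rng(0)
S, T, theta_true = 2000, 50, 1.5
shocks = rng.standard_normal(S)             # step 1: draw once, reuse for every theta
data = theta_true + rng.standard_normal(T)  # observed "equilibrium outcomes"

def loglik(theta, h=0.2):
    y_sim = theta + shocks                  # step 5: "solve" the model for each draw
    ll = 0.0
    for y in data:                          # loop over market periods (steps 4-8)
        k = np.exp(-0.5 * ((y_sim - y) / h) ** 2) / (h * np.sqrt(2.0 * np.pi))
        ll += np.log(np.mean(k))            # step 7: smoothed likelihood contribution
    return ll

# Step 9: the paper maximizes with a Nelder-Mead simplex; a grid stands in here.
grid = np.linspace(0.0, 3.0, 301)
theta_hat = grid[int(np.argmax([loglik(th) for th in grid]))]
```

Because the simulation draws are fixed across trial values of θ, the loglikelihood is a smooth function of θ and the maximizer recovers a value close to θ.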
4 Empirical Analyses
The algorithm described in Section 3 is applied to estimate an equilibrium model of demand
and supply in two product categories, yogurt and laundry detergent. While we account for un-
observed consumer heterogeneity, we abstract from observed heterogeneity, that is, we assume
N = 1.
Data. We use data on individual purchase histories for a panel of households in Sioux Falls,
South Dakota, collected by A.C. Nielsen. The data set spans a period of 114 weeks in 1986-1988.9 There is information on the dates of the shopping trips of the 615 households who purchased in the category more than twice, the price paid, and the item purchased (UPC). We aggregate over all UPCs that belong to a brand. That is, in the yogurt category we aggregate across
different flavors, and in the laundry detergent category, we aggregate across different sizes.
To obtain weekly no-purchase probabilities, we tried out two alternative approaches. The
first was to condition on store visits, i.e., to compute the probability that a household goes
to the store but does not purchase in the category of interest (for details see Besanko et al.
(1998) and Draganska and Jain (2003)). The second approach is to assume that each household
makes a weekly decision whether to purchase in the category or not, i.e., we do not condition
on store visit when computing the no-purchase probability. The empirical results did not differ
qualitatively for the two approaches, so we decided not to condition on store visits.
For the cost shifters in the supply equation we obtained monthly data on labor and materials
prices from the Bureau of Labor Statistics (BLS). Labor costs are represented by average hourly
earnings of production workers for the respective industry (SIC 202, dairy products, for yogurt
and SIC 2841, soap and other detergents, for laundry detergent). We also use data on the
prices of the main ingredient for each of the product categories. Specifically, we obtained data
on the producer price indices for fluid milk (yogurt) and basic inorganic chemicals (laundry
Table 1: Descriptive Statistics of Data.

Items          Avg. Choice Prob.   Avg. Price               Material cost   Labor cost
Yogurt
  Yoplait      0.0654              9.9425 (cents per oz.)   103.3066        9.5512
  Dannon       0.0377              8.0693 (cents per oz.)
  No purchase  0.8969
Detergent
  Wisk         0.0192              0.0481 ($ per oz.)       96.0997         14.1633
  Tide         0.0484              0.0512 ($ per oz.)
  No purchase  0.9325
detergent). The monthly data series were then smoothed to obtain weekly cost data following
the approach suggested by Slade (1995).
Table 1 presents descriptive statistics for both product categories along with summary
information on the cost data we use for the analysis. In the yogurt category, we focus our
attention on the two major competitors in the single-serving yogurt market, Dannon and
Yoplait (General Mills).10 Yoplait is the market leader, with a market share almost double
that of Dannon and a somewhat higher price.
In the laundry detergent category, we study the competition between the two leading brands
Wisk (Unilever’s flagship brand) and Tide (Procter & Gamble’s premier brand). Tide seems
to dominate Wisk: it has a higher market share and commands a price premium.
Estimation. We implemented the estimation algorithm in C++ using optimization routines
and routines for solving systems of nonlinear equations from Press, Teukolsky, Vetterling and
Flannery (1993).11 The parameter estimates are obtained as follows. We estimate an aggregate
model as in Besanko et al. (1998) to get starting values. These initial values are perturbed
and used in the optimization program. The output parameters are compared based on the
values of the loglikelihood function. The parameters with the largest loglikelihood are then
perturbed again and taken as input to the optimization procedure. The output parameters are
again compared based on the loglikelihood value. The final parameters are chosen to be those
with the largest loglikelihood value. Using many different starting values guards against the
optimization algorithm terminating at a local rather than the global maximum.
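This multi-start strategy can be sketched as follows (an illustrative Python sketch, not the authors' C++ implementation; `local_search` is a crude stand-in for the Nelder-Mead simplex routine taken from Press et al. (1993)):

```python
import random

def local_search(loglik, start, step=0.05, iters=200):
    """Crude coordinate hill-climb, standing in for the Nelder-Mead
    routine the paper uses from Press et al. (1993)."""
    theta, ll = list(start), loglik(start)
    for _ in range(iters):
        improved = False
        for i in range(len(theta)):
            for d in (step, -step):
                trial = list(theta)
                trial[i] += d
                trial_ll = loglik(trial)
                if trial_ll > ll:
                    theta, ll, improved = trial, trial_ll, True
        if not improved:
            step /= 2.0  # refine the grid once no move helps
    return theta, ll

def multistart_maximize(loglik, theta0, n_starts=10, scale=0.5, seed=0):
    """Perturb the best parameters found so far, re-optimize from each
    perturbed start, and keep the estimates with the largest loglikelihood."""
    rng = random.Random(seed)
    best_theta, best_ll = local_search(loglik, theta0)
    for _ in range(n_starts):
        start = [t + rng.gauss(0.0, scale) for t in best_theta]
        theta, ll = local_search(loglik, start)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta, best_ll
```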
To compute the standard errors we employ the bootstrap. To this end, we simulate 30 data
sets by randomly drawing with replacement from the original data. Note that unobserved
heterogeneity introduces dependencies across the purchases of a given household. To account
for this, in computing the standard errors, it is important to sample entire household histories.12
For each data set, we then run the optimization program. Finally, we compute the standard
errors as the standard deviation of the 30 sets of parameter estimates.
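The household-level resampling can be illustrated with the following sketch (Python, not the authors' implementation; the data layout is hypothetical):

```python
import random

def household_bootstrap(households, n_boot=30, seed=0):
    """Draw bootstrap data sets by resampling entire household histories
    with replacement. `households` maps a household id to its full list
    of weekly purchase records; resampling whole histories preserves the
    within-household dependence induced by unobserved heterogeneity."""
    rng = random.Random(seed)
    ids = list(households)
    return [[households[rng.choice(ids)] for _ in ids] for _ in range(n_boot)]

def bootstrap_se(estimates):
    """Standard error = (sample) standard deviation across the bootstrap
    replications of a parameter estimate."""
    n = len(estimates)
    mean = sum(estimates) / n
    return (sum((e - mean) ** 2 for e in estimates) / (n - 1)) ** 0.5
```

Each bootstrap data set would then be passed through the optimization program, and `bootstrap_se` applied to the resulting 30 parameter vectors coordinate by coordinate.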
We use 1000 simulation draws to ensure greater precision of the estimates (recall that
for a four-dimensional density we only need 223 data points (Silverman 1986)). Convergence
is reached with up to 300 function calls for the model without heterogeneity and about 600
function calls for the model with heterogeneity. An evaluation of the likelihood function takes
between 2 and 10 seconds depending on the model specification on a Pentium 4 PC with 1GHz
clock speed and 512MB RAM.
Yogurt category. Table 2 presents the results of the empirical analysis. In addition to the
homogeneous logit, which is our baseline model, we estimated two heterogeneity specifications:
one where only the price response of the two segments is different (heterogeneity 1), and one
where we also allow for heterogeneity in the brand constants (heterogeneity 2). All estimated
coefficients have face validity. The price coefficients are negative and both the wage rate and
the price of milk have a positive impact on price as expected.
There does not appear to be much qualitative difference in the estimates for the standard
logit and the heterogeneous logit specification, where only price response is allowed to vary
by segment. However, the estimated parameter for the proportion of segment 1, λ = 12%, is
significant. Once heterogeneity in the brand constants is introduced (heterogeneity 2), however,
the difference in the estimated parameters relative to the homogeneous logit specification
becomes much more pronounced. There is now a sizeable difference in the price sensitivity
between the two segments, with the slightly larger segment (57%) being the less price sensitive
one. AIC and BIC both show considerable improvement as we go from heterogeneity 1 to
heterogeneity 2. The estimated marginal costs are positive, and of reasonable magnitude: 8.91
cents per ounce for Yoplait and 7.08 cents per ounce for Dannon.13
Table 2: Parameter estimates and standard errors for yogurt data.
Variable                  No Heterogeneity        Heterogeneity 1         Heterogeneity 2
                          Coefficient (Std. dev.) Coefficient (Std. dev.) Coefficient (Std. dev.)
Demand Side:
Dannon const. (segm. 1)    5.7064 (0.0428)         5.8374 (0.0651)         0.9759 (0.0471)
Yoplait const. (segm. 1)   8.2857 (0.0278)         8.3503 (0.0694)         4.1928 (0.0552)
Dannon const. (segm. 2)                                                   10.3163 (0.0376)
Yoplait const. (segm. 2)                                                   6.8511 (0.0765)
σξ1                        0.3198 (0.0344)         0.3584 (0.0337)         0.3047 (0.0336)
σξ2                        0.3093 (0.0411)         0.4131 (0.0602)         0.3520 (0.0300)
price (segment 1)         −1.1120 (0.0047)        −0.8999 (0.0382)        −0.6319 (0.0138)
price (segment 2)                                 −1.2052 (0.0214)        −1.6232 (0.0180)
proportion of segment 1                            0.1239 (0.0253)         0.5672 (0.0530)
Supply Side:
Dannon constant           −9.8327 (0.0302)       −10.2128 (0.0810)       −10.3732 (0.0542)
Yoplait constant          −7.9526 (0.0450)        −8.3507 (0.0907)        −9.4799 (0.0679)
ση1                        0.2827 (0.0311)         0.2765 (0.0311)         0.2667 (0.0268)
ση2                        0.5649 (0.0434)         0.5347 (0.0424)         0.5178 (0.0479)
labor cost                 1.1134 (0.0134)         0.9554 (0.0287)         0.9725 (0.0133)
material cost              0.6150 (0.0110)         0.7859 (0.0228)         0.8151 (0.0131)
Loglikelihood            481.34                  491.45                  516.38
AIC                     −940.68                 −956.91                −1002.76
BIC                     −940.05                 −956.17                −1001.91
Laundry Detergent. Table 3 presents the estimation results. With the exception of the
material costs parameter in the standard logit specification (negative but insignificant), all
coefficient estimates have the expected signs. The price coefficients are negative in both spec-
ifications. In terms of price sensitivity, there appear to be two equally sized segments (we
estimate a proportion of 53.5% for segment 1). Labor cost has the expected positive impact
on the price of the product in the standard logit model but is not significantly different from
zero in the heterogeneous logit specification. The brand-specific constant for Wisk is negative,
while the one for Tide is positive, reflecting the strong inherent preference for Tide. Wisk has
lower marginal cost (3.46 cents per ounce) than Tide (3.75 cents per ounce).
Table 3: Parameter estimates and standard errors for laundry detergent data.
Variable                  No Heterogeneity        With Heterogeneity
                          Coefficient (Std. dev.) Coefficient (Std. dev.)
Demand Side:
Wisk brand constant       −0.3217 (0.0760)        −0.3442 (0.0366)
Tide brand constant        0.9444 (0.0620)         0.9222 (0.0398)
σξ1                        0.4272 (0.0521)         0.4074 (0.0494)
σξ2                        0.4334 (0.1401)         0.4627 (0.1142)
price (segment 1)         −0.7710 (0.0176)        −0.8740 (0.0373)
price (segment 2)                                 −0.6791 (0.0289)
proportion of segment 1                            0.5358 (0.0749)
Supply Side:
Wisk cost constant         2.6628 (0.0543)         2.6647 (0.0522)
Tide cost constant         2.9299 (0.0589)         2.9036 (0.0646)
ση1                        0.3734 (0.0413)         0.4102 (0.0289)
ση2                        0.3418 (0.0301)         0.3203 (0.0152)
labor cost                 0.1169 (0.0506)         0.0493 (0.0396)
material cost             −0.0889 (0.0734)         0.0065 (0.0591)
Loglikelihood           −562.9528               −560.4829
We have now seen that the estimation procedure yields reasonable parameter estimates for
both the yogurt and the laundry detergent category. It still remains unclear, however, whether
our estimator performs well in general. To study its properties, in the next section we turn to
a more thorough investigation through a Monte Carlo experiment.
5 Simulation Analysis
Given the complexity of the proposed algorithm, it is very difficult to determine its properties
analytically. We therefore conducted a Monte Carlo experiment: we generated 50 artificial
data sets and applied the estimation procedure to each of them. Since the true underlying
parameters are known, we can compare our estimates to them and draw conclusions about the
performance of our procedure.
Data. We simulated choice data for 114 weeks and 473 households. The assumed ‘true’
parameter values roughly correspond to the ones obtained from a preliminary estimation using
scanner panel data and are listed in Table 4. There are two competing brands and an outside
good in the market with average shares of 2%, 4%, and 94%, respectively. The way the model
is set up, choosing the outside good at time t means not buying at all in week t. That is, for
each household, we have 114 observations. The total number of observations is thus 53,922.
For the supply side, we use factor price data for labor (average hourly wages of production
workers for SIC 209, miscellaneous food and kindred products) and for the key ingredient
in the production process, peanuts. We draw the demand and supply shocks from a normal
distribution. For the standard deviations, (σξ1 , . . . , σξJ, ση1 , . . . , σηJ ), we set the true values
equal to the RMSE from a preliminary 3SLS estimation. The choice and price generation
process is as specified in equations (8) and (9).
Monte Carlo results. We obtained the parameter estimates for each of the 50 Monte Carlo
samples using the algorithm described in Section 3. Table 4 presents the resulting mean,
bias, variance, and mean square error (MSE). The MSE is given by the sum of the squared
bias and the variance. In general, the proposed estimation procedure seems to work quite
well. Specifically, the variances of the parameter estimates are very small, as expected for a
maximum-likelihood based procedure. The magnitude of the biases is large compared to the
variances. It is, however, reassuring that the coefficient of interest, namely the price coefficient,
is estimated with a very high degree of reliability. The bias is only 0.00354, which is tiny
relative to the value of the price coefficient (−0.21) and suggests that our way of dealing with
the endogeneity problem is indeed effective. The supply-side parameters (labor and ingredients
cost) also show only a small bias.
Table 4: Monte Carlo results for proposed algorithm.
Variable          True Value    Mean        Bias       Variance   MSE
demand const. 1     −2.62       −2.49396    −0.12604   0.00809    0.02398
demand const. 2     −1.27       −1.14716    −0.12284   0.00554    0.02063
price               −0.21       −0.21354     0.00354   0.00004    0.00005
supply const. 1    −13.48      −13.18638    −0.28362   0.01696    0.09740
supply const. 2    −12.12      −11.86734    −0.25266   0.01645    0.08028
labor cost           2.03        2.03272    −0.00272   0.00013    0.00014
material cost        0.27        0.29996    −0.02996   0.00803    0.00893
σξ1                  0.61        0.49563     0.11437   0.00260    0.01568
σξ2                  0.41        0.34294     0.06706   0.00075    0.00525
ση1                  0.75        0.60453     0.14547   0.00388    0.02504
ση2                  0.65        0.48901     0.16099   0.05074    0.07666
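The summary statistics in Table 4 follow the standard decomposition MSE = bias² + variance; a minimal sketch of that computation (note that the table appears to report bias with the sign convention true − mean):

```python
def mc_summary(estimates, true_value):
    """Summarize the Monte Carlo estimates of one parameter.
    Bias is reported as (true - mean), matching the sign convention
    Table 4 appears to use; MSE = bias**2 + variance."""
    n = len(estimates)
    mean = sum(estimates) / n
    bias = true_value - mean
    variance = sum((e - mean) ** 2 for e in estimates) / n
    return {"mean": mean, "bias": bias, "variance": variance,
            "mse": bias ** 2 + variance}
```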
The performance of the estimator is also excellent when unobserved heterogeneity is con-
sidered. We simulated a data set with two equally sized segments differing in their price sen-
sitivity. Table 5 presents the results of the Monte Carlo experiment for this specification. As
can be seen from the table, the price coefficients are estimated reliably. Overall, the proposed
estimation procedure handles unobserved heterogeneity very well.
Robustness checks. One key assumption we make is that the demand and supply errors are
jointly normally distributed. This need not hold in reality, so we test the robustness
of our procedure to different distributional assumptions. Specifically, we assume that ξ and
η follow a mixture of normal and logistic distributions. We give increasingly higher
weight to the logistic distribution to study the effect on the performance of our procedure.
As Table 6 reveals, the results are fairly robust. As expected, the MSE is smallest when the
normality assumption is satisfied, but even when we draw the errors entirely from a logistic
distribution, the accuracy remains very high.
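The error draws for this robustness check can be generated as in the following sketch (Python; the paper does not state how the logistic component is scaled, so matching the standard deviations of the two components is our assumption):

```python
import math
import random

def mixture_draw(rng, w_normal, sigma):
    """Draw one shock from a w_normal : (1 - w_normal) mixture of a
    normal and a logistic distribution. The logistic scale is chosen so
    both components share standard deviation sigma (an assumption; the
    sd of a logistic with scale s is s * pi / sqrt(3))."""
    if rng.random() < w_normal:
        return rng.gauss(0.0, sigma)
    s = sigma * math.sqrt(3.0) / math.pi
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u strictly in (0, 1)
    return s * math.log(u / (1.0 - u))  # inverse-CDF logistic draw
```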
Table 5: Monte Carlo results for model with heterogeneity.
Variable                  True Value    Mean        Bias      Variance   MSE
demand const. 1             −2.62       −2.5948      0.0252   0.0064     0.0070
demand const. 2             −1.27       −1.2540      0.0160   0.0077     0.0080
price1                      −0.15       −0.1517     −0.0017   0.0001     0.0001
price2                      −0.25       −0.2511     −0.0011   0.0014     0.0014
σξ1                          0.61        0.5737     −0.0363   0.0033     0.0046
σξ2                          0.41        0.3880     −0.0220   0.0010     0.0015
supply const. 1            −13.47      −13.4798     −0.0098   0.0042     0.0043
supply const. 2            −12.12      −12.1033      0.0167   0.0093     0.0096
ση1                          0.75        0.7402     −0.0098   0.0025     0.0026
ση2                          0.65        0.6591      0.0091   0.0025     0.0026
labor cost                   2.03        2.0222     −0.0078   0.0015     0.0016
material cost                0.27        0.2760      0.0060   0.0215     0.0216
proportion of segment 1      0.50        0.5095      0.0095   0.0106     0.0107
Table 6: Monte Carlo results for different mixtures of normal and logistic distributions (price
coefficient only, true value −0.21).

           normal     0.8*normal   0.5*normal   0.2*normal   logistic
Bias       0.00085   −0.00931     −0.01532     −0.00717     −0.00016371
Variance   0.00010    0.00005      0.00013      0.00011      0.000199533
MSE        0.00010    0.00014      0.00036      0.00016      0.000199559
Another important factor affecting the performance of the estimation procedure is the
choice of bandwidth (see Section 3). The bandwidth determines the smoothness of the joint
density of equilibrium prices and probabilities, i.e., the likelihood function. Too small a band-
width leads to a likelihood that is not well-behaved and hence makes finding a global maximum
very difficult. Too large a bandwidth, however, may cause the likelihood function to differ
greatly from the true underlying density of equilibrium prices and probabilities. We examined
the sensitivity of the price estimate to the choice of this parameter by looking at bandwidths
that are 1/4, 1/2, 2, and 4 times the normal reference rule bandwidth. Table 7 summarizes
the results for the price coefficient for two different sets of parameters. One set of parameters
was generated as above based on preliminary estimates in the peanut butter category (true
value for price is −0.21), the other set of parameters corresponds to the parameter estimates
in the laundry detergent category (true value for price −0.77). It appears that the bandwidth
obtained by the normal reference rule (equation (13)) performs well. Moreover, the precision
of the estimates is not overly sensitive to the choice of the smoothing parameter.
Table 7: Comparison of Monte Carlo results for different bandwidths (price coefficient only).
NR is the bandwidth computed from the normal reference rule.

              0.25×NR    0.5×NR     NR        2×NR      4×NR
True value −0.21:
Bias          −0.01017   −0.00203   0.00354   0.00354   −0.00209
Variance       0.00005    0.00002   0.00004   0.00004    0.00004
MSE            0.00015    0.00002   0.00005   0.00005    0.00004
True value −0.77:
Bias           0.03411    0.01356   0.00094  −0.01344   −0.01977
Variance       0.00105    0.00026   0.00010   0.00023    0.00086
MSE            0.00222    0.00044   0.00011   0.00041    0.00125
To summarize, our Monte Carlo simulations demonstrate that the proposed estimation procedure
reliably recovers the true parameters of an equilibrium model. In particular, the parameter of
interest, namely the price coefficient, is estimated with a very high degree of precision. The
robustness checks indicate that our methodology is fairly robust to modifications of the
distributional assumptions as well as to the bandwidth selection.
6 Concluding Remarks
In this article we develop a new likelihood-based methodology for the estimation of structural
demand-and-supply models using disaggregate data. Marketing researchers have established a
long tradition of estimating random utility models of consumer demand using maximum likelihood
methods. Tying a traditional individual-level choice model such as a logit or probit to
a supply-side specification is a non-trivial task. Simply assuming a joint distribution of prices
and probabilities is inconsistent with the equilibrium notion. Furthermore, the nonlinearity
of brand choice models makes writing down the joint distribution of equilibrium prices and
probabilities implied by the unobserved demand and supply shocks very challenging.
We solve these problems by simulating equilibrium prices and probabilities and then us-
ing the empirical likelihood of these prices and probabilities to obtain the parameters of the
model. Estimating the demand and supply equations jointly deals with the problem of price
endogeneity and ensures that we obtain reliable estimates of the price response parameter.
Moreover, the estimated structural equilibrium model can be used to perform “what-if” type
analyses (Draganska and Jain 2003).
We apply the proposed algorithm to both real-world scanner data and to simulated data in
order to assess the properties of the estimation method and highlight its merits and limitations.
Overall, the new procedure performs very well. It yields estimates of plausible magnitude when
applied to individual-level choice data in several product categories. The Monte Carlo
experiments demonstrate both the accuracy of our method and its robustness.
One of the attractive features of our approach relative to previous research considering
endogeneity in individual-level models (Villas-Boas and Winer 1999, Villas-Boas and Zhao
2001) is the ability to model explicitly the heterogeneity structure of the population. We specify
and estimate a latent class model to incorporate unobserved heterogeneity across households.
In its current form, however, our method cannot readily take into account the panel structure of
the household-level data. That is, if there is correlation in the tastes of individual households
over time, our procedure yields a partial likelihood and the estimated standard errors need to be
corrected. Extending the proposed methodology to explicitly incorporate the dependencies in
households' choices over time is an important area for future research.14
On the supply side, one might question the reasonableness of the assumed Nash behavior in
prices. Our method does not require any particular assumption about the strategic interactions
between firms: a conjectural-variations approach or a menu approach to testing different
behavioral assumptions could be employed to reveal the nature of competition in the market.
This is critical because misspecification of the supply side translates into a misspecified system,
thus leading to inconsistent parameter estimates. Future research could also focus on enriching
the supply side by explicitly incorporating the channel structure (Villas-Boas 2001, Sudhir
2001).
In the current analysis we only consider the endogeneity of prices to illustrate the proposed
methodology. Recent studies have suggested, however, that other strategic instruments such as
advertising (Vilcassim, Kadiyali and Chintagunta 1999) and product line length (Draganska
and Jain 2003) should also be considered endogenous. One fruitful avenue for future study
would therefore be to apply the estimation procedure developed in this paper to the analysis
of other marketing mix instruments.
In sum, the present research is a first step towards the estimation of a market equilibrium
model with a disaggregate discrete choice model on the demand side and an oligopoly model
on the supply side. The proposed estimation procedure explicitly accounts for the price endo-
geneity problem. It also holds the potential to combine the advantages of simultaneous
estimation of market models with recent developments in incorporating richer heterogeneity
structures and more flexible error specifications in disaggregate models.
References
Ackerberg, D. and Gowrisankaran, G. (2001). Quantifying equilibrium network externalities
in the ACH banking industry, working paper, UCLA.
Anderson, S., de Palma, A. and Thisse, J. (1992). Discrete Choice Theory of Product Differ-
entiation, MIT Press, Cambridge, MA.
Berry, S. (1994). Estimating discrete-choice models of product differentiation, RAND Journal
of Economics 25: 242–262.
Berry, S., Carnall, M. and Spiller, P. (1997). Airline hubs: Costs, markups and the implications
for consumer heterogeneity, working paper, Yale University.
Berry, S., Levinsohn, J. and Pakes, A. (1995). Automobile prices in market equilibrium,
Econometrica 63: 841–890.
Besanko, D., Dube, J.-P. and Gupta, S. (2003). Competitive price discrimination strategies in
a vertical channel with aggregate data, Management Science 49(9): 1121–1138.
Besanko, D., Gupta, S. and Jain, D. (1998). Logit demand estimation under competitive
pricing behavior: An equilibrium framework, Management Science 44: 1533–1547.
Burnkrant, R. and Unnava, H. R. (1995). Effects of self-referencing on persuasion, Journal of
Consumer Research 22: 17–26.
Chintagunta, P., Dube, J.-P. and Goh, K.-Y. (2003). Beyond the endogeneity bias: The effect
of unmeasured brand characteristics on household-level brand choice models, Technical
report, University of Chicago GSB.
Chintagunta, P., Jain, D. and Vilcassim, N. (1991). Investigating heterogeneity in brand
preferences in logit models for panel data, Journal of Marketing Research 28: 417–428.
Draganska, M. and Jain, D. (2003). Product-line length as a competitive tool, working paper,
Stanford University.
Dube, J.-P. (2003). Discussion of ‘Bayesian analysis of simultaneous demand and supply’,
Quantitative Marketing and Economics 1(3). forthcoming.
Fader, P. and Hardie, B. (1996). Modeling consumer choice among SKUs, Journal of Marketing
Research 33: 442–452.
Gonul, F. and Srinivasan, K. (1993). Modeling multiple sources of heterogeneity in multinomial
logit models: methodological and managerial issues, Marketing Science 12(3): 213–229.
Goolsbee, A. and Petrin, A. (2003). The consumer gains from direct broadcast satellites and
the competition with cable television, Econometrica . forthcoming.
Guadagni, P. and Little, J. D. C. (1983). A logit model of brand choice calibrated on scanner
data, Marketing Science 2(3): 203–238.
Härdle, W. (1990). Applied Nonparametric Regression, Cambridge University Press.
Kamakura, W. and Russell, G. (1989). A probabilistic choice model for market segmentation
and elasticity structure, Journal of Marketing Research 26: 379–390.
Kennan, J. (1989). Simultaneous equation bias in disaggregated econometric models, Review
of Economic Studies 56: 151–156.
McFadden, D. (1989). A method of simulated moments for estimation of discrete response
models without numerical integration, Econometrica 57(5): 995–1026.
Nelder, J. and Mead, R. (1965). A simplex method for function minimization, Computer
Journal 7: 308–313.
Nevo, A. (2001). Measuring market power in the ready-to-eat cereal industry, Econometrica
69(2): 307–342.
Pakes, A. and Pollard, D. (1989). Simulation and the asymptotics of optimization estimators,
Econometrica 57: 1027–1057.
Petty, R. and Cacioppo, J. (1986). Communications and Persuasion: Central and Peripheral
Routes to Attitude Change, Springer Verlag.
Press, W., Teukolsky, S., Vetterling, W. and Flannery, B. (1993). Numerical Recipes in C: The
Art of Scientific Computing, 2 edn, Cambridge University Press.
Scott, D. (1992). Multivariate Density Estimation : Theory, Practice, and Visualization, Wiley
Series in Probability and Statistics, New York.
Silverman, B. (1986). Density Estimation for Statistics and Data Analysis, Chapman & Hall,
London.
Slade, M. (1995). Product rivalry and multiple strategic weapons: An analysis of price and
advertising competition, Journal of Economics and Management Strategy 4: 445–476.
Stone, C. (1980). Optimal rates of convergence for nonparametric estimators, Annals of Statis-
tics 8: 1348–1360.
Sudhir, K. (2001). Structural analysis of competitive pricing in the presence of a strategic
retailer, Marketing Science 20(3): 244–264.
Viard, B., Polson, N. and Gron, A. (2002). Likelihood based estimation of nonlinear equilibrium
models with random coefficients, working paper, Stanford University.
Vilcassim, N., Kadiyali, V. and Chintagunta, P. (1999). Investigating dynamic multifirm
market interactions in price and advertising, Management Science 45: 499–518.
Villas-Boas, M. and Winer, R. (1999). Endogeneity in brand choice models, Management
Science 45: 1324–1338.
Villas-Boas, M. and Zhao, Y. (2001). The ketchup marketplace: Retailers, manufacturers and
individual consumers, working paper, UC Berkeley.
Villas-Boas, S. (2001). Vertical contracts between manufacturers and retailers: An empirical
analysis, working paper, UC Berkeley.
Wooldridge, J. (2002). Econometric Analysis of Cross Section and Panel Data, MIT Press.
Yang, S. and Allenby, G. (2000). A model for observation, structural, and household hetero-
geneity in panel data, Marketing Letters 11: 137–149.
Yang, S., Chen, Y. and Allenby, G. (2003). Bayesian analysis of simultaneous demand and
supply, Quantitative Marketing and Economics 1(3): 1–25. forthcoming.
Yatchew, A. (1998). Nonparametric regression techniques in economics, Journal of Economic
Literature 36(2): 669–721.
Notes
1The authors wish to thank Arie Beresteanu, Ulrich Doraszelski, Jean-Pierre Dube, Gautam Gowrisankaran, Charles Manski, Mike Mazzeo, Brian Viard and participants at the 1999 Marketing Science conference in Syracuse for their helpful comments and suggestions. Mariusz Rabus provided expert research assistance for this project.
2An anonymous referee drew our attention to the fact that our assumption is somewhat similar to what Yang and Allenby (2000) call 'observation' heterogeneity. Yang and Allenby (2000) define this term in the context of a latent class model as a specification in which the latent class probabilities depend on observable covariates. This contrasts with 'household' or 'structural' heterogeneity, which entails dependence over time.
3The main drawback of a continuous distribution of consumer heterogeneity is its computational complexity, since we need to numerically evaluate multidimensional integrals. While this is also true in standard models (e.g., Berry et al. (1995)), our estimation algorithm is already computationally intensive, so we prefer to work with a discrete distribution.
4For a lucid discussion of this approach, see Dube (2003).
5In small samples, most kernel density estimators are biased. Our Monte Carlo results indicate that this does not impair the ability of our procedure to recover the structural parameters of the equilibrium model. If unbiasedness is desired, one can use so-called higher-order kernels, which are computationally more demanding.
6Another possibility to obtain a smooth likelihood function has been explored by Ackerberg and Gowrisankaran (2001). The authors make the auxiliary assumption of normal measurement error that allows them to express the likelihood function in terms of the normal density. A similar assumption has also been employed by Viard, Polson and Gron (2002), who estimate an equilibrium model using Bayesian methods (Markov chain Monte Carlo techniques). These approaches may be problematic if the underlying density of the endogenous variables differs significantly from a normal density.
7For a thorough treatment the interested reader is referred to Yatchew (1998).
8Details on the estimation procedure available from the authors upon request.
9In the laundry detergent category, we use data for 107 weeks.
10Yoplait only offers single-serving size yogurt. Dannon also carries 16oz and 32oz sizes of plain and vanilla yogurt in addition to the single-serving size. It is often argued that these two particular flavors are used for cooking purposes and constitute a different market.
11Details are available from the authors upon request.
12We are grateful to an anonymous referee for bringing this point to our attention.