
A Likelihood Approach to Estimating Market Equilibrium Models 1

Michaela Draganska
Graduate School of Business, Stanford University
Stanford, CA 94305-5015
draganska [email protected]

Dipak Jain
Kellogg School of Management, Northwestern University
Evanston, IL 60208-2001
[email protected]

Abstract

This paper develops a new likelihood-based method for the simultaneous estimation of structural demand-and-supply models for markets with differentiated products. We specify an individual-level discrete choice model of demand and derive the supply side assuming manufacturers compete in prices. The proposed estimation method accounts for price endogeneity through simultaneous estimation of demand and supply, allows for consumer heterogeneity, and incorporates a pricing rule consistent with economic theory.

The basic idea behind the proposed estimation procedure is to simulate prices and choice probabilities by solving for the market equilibrium. By repeating this many times, we obtain an empirical distribution of equilibrium prices and probabilities. The empirical distribution is then smoothed and used in a likelihood procedure to estimate the parameters of the model. The advantage of this method is that it avoids the need to perform a transformation of variables. If consumers' tastes are independent across market periods, our approach yields maximum-likelihood estimates; otherwise it yields consistent but not fully efficient partial-likelihood estimates.

Key Words: price endogeneity, competitive strategy, maximum likelihood.

1 Introduction

In recent years marketers have become increasingly interested in estimating structural market

equilibrium models, where demand is derived from utility maximization on the part of consumers, and the supply side is obtained by assuming that firms maximize profits given the

characteristics of the market. Because the competitive environment (i.e., market structure) and

policy variables (i.e., marketing mix) are specified explicitly, we can identify separate demand,

cost, and competitive effects. Estimating a market equilibrium model enables us to analyze

questions pertaining to firms’ strategies in the marketplace through “what-if” type analyses by

taking into account all interdependencies between the demand and supply sides of the market.

The simultaneous estimation of demand and supply is also motivated by the so-called

endogeneity problem. In short, endogeneity arises because marketing variables not only affect consumer choice, but consumer choice also affects marketing mix decisions. It has

been well documented that ignoring endogeneity leads to biased coefficient estimates of the

marketing mix variables and therefore to suboptimal decisions (Besanko, Gupta and Jain 1998,

Villas-Boas and Winer 1999).

It is often argued that the use of individual-level data solves the endogeneity problem, since

individuals are price takers. However, even though price is exogenous in a microeconomic sense,

there still might be important correlations between the price and the error term in the demand

equation, thus leading to econometric endogeneity (Kennan 1989). Product attributes that are unobservable to the researcher, such as coupon availability, national advertising, and shelf space allocation, have an impact on consumer utility as well as on price-setting decisions by

firms (Villas-Boas and Winer 1999, Besanko, Dube and Gupta 2003). Prices should thus be

viewed as endogenous regardless of the aggregation level of the data used in the analysis.

In this research, we focus on developing a new likelihood-based method for the estimation of

structural demand-and-supply models. Our demand model falls into the broad class of discrete choice models of markets for differentiated products (Anderson, de Palma and Thisse 1992).

The supply model is derived from the profit maximization behavior of the firms, assuming

Bertrand-Nash competition in prices between manufacturers. Market equilibrium is determined

jointly by the demand and supply specifications, and our estimation procedure accordingly

considers the equilibrium equations simultaneously.

Once the presence of unobserved product attributes is acknowledged, it is no longer possible to estimate a discrete choice model using traditional maximum likelihood methods because

in this case prices will be correlated with the unobservables due to the strategic price-setting

behavior of firms (Berry 1994). Therefore, choice probabilities depend on the unobserved product attributes not only directly but also indirectly via prices. Hence, one cannot integrate the unobserved product attributes out of the choice probabilities without taking this latter dependency into account. Berry (1994) proposed a technique for the estimation of discrete choice

models using instrumental variables to account for the endogeneity of prices. His approach

is easy to implement and has been widely applied to the analysis of aggregate data (Berry,

Levinsohn and Pakes 1995, Besanko et al. 1998, Nevo 2001).

Marketing researchers, however, have long recognized the advantages of data describing

the purchase behavior of individual consumers. Such disaggregate scanner panel data provide

detailed information that can be used to learn about their preferences. For example, they enable

us to understand the source of behaviors such as variety seeking or deal proneness. Given the

richness of scanner panel data, a large literature has evolved that uses them to estimate discrete

choice models of consumer behavior (Guadagni and Little 1983, Kamakura and Russel 1989,

Chintagunta, Jain and Vilcassim 1991, Gonul and Srinivasan 1993, Fader and Hardie 1996).

These models have focused on estimating the demand side and have not considered the possible

presence of endogeneity. Recently Goolsbee and Petrin (2003) and Chintagunta, Dube and Goh

(2003) apply variants of Berry's (1994) method to estimate consumer demand using individual-level choice data. These approaches are useful when the main interest lies in obtaining precise

demand-side estimates because they provide a way to account for price endogeneity without

the need to make assumptions about supply-side behavior. If conducting policy experiments

is our goal, however, then estimating an equilibrium model is preferable, since it enables us to

take advantage of the cross-equation dependencies of the structural parameters.

An equilibrium model provides a mapping from unobserved product attributes and cost

shocks to market outcomes, i.e., prices and choice probabilities. Extending traditional MLE

methods to include a supply side in addition to a consumer choice model is not straightforward

because it requires that the researcher is able to write down the joint distribution of these

equilibrium outcomes. Assuming that this distribution is known runs counter to the notion of

an equilibrium (Berry 1994). Hence, the joint distribution of these equilibrium outcomes needs

to be derived from the distribution of the unobservables. Performing this transformation of

variables proves to be very difficult due to the highly nonlinear nature of the model. Villas-Boas and Winer (1999) circumvent this problem by estimating a reduced-form pricing rule that

relates current prices to lagged prices. In a subsequent article, Villas-Boas and Zhao (2001)

specify a structural supply-side model derived from manufacturers’ and retailers’ optimization

problem and estimate the equilibrium model directly using maximum likelihood. This direct

approach to estimating the Jacobian, however, prevents them from incorporating consumer

heterogeneity. Recently, Yang, Chen and Allenby (2003) have proposed a Bayesian approach

to resolve the issue.

In this article, we propose a likelihood based approach to the estimation of a structural

demand-and-supply model using individual-level choice data. The basic idea behind the proposed estimation procedure is to simulate prices and probabilities by randomly drawing the

shocks from an assumed joint distribution, and then solve for the equilibrium. By repeating

this many times, we obtain an empirical distribution of equilibrium prices and probabilities.


The empirical distribution is then smoothed and used in a maximum-likelihood procedure to

estimate the parameters of the model. The advantage of this method is that it avoids the need

to perform a transformation of variables and thus enables us to estimate the model when the

evaluation of the Jacobian seems infeasible.

In computing the likelihood of the data, we treat market periods as independent from each

other. This implicitly assumes that there is no persistence in the preferences of consumers

across market periods.2 If markets are geographical regions rather than time periods, then

this assumption is warranted. Furthermore, the psychology literature suggests that consumers’

preferences change over time, often depending on contextual effects that are unobserved by the

econometrician (Petty and Cacioppo 1986, Burnkrant and Unnava 1995). To the extent that

this leads to independence over time, our procedure yields maximum-likelihood estimates of

the model parameters. If, on the other hand, there is a correlation in consumers’ preferences

over time, then our procedure yields a so-called partial likelihood (Wooldridge 2002), and the

resulting estimates are consistent but not fully efficient.

The remainder of the paper is organized as follows. Section 2 develops the equilibrium

model. In Section 3, the estimation procedure is described along with the details of the implementation. In Section 4, we apply the estimation method to two frequently purchased consumer products, yogurt and laundry detergent. We demonstrate the accuracy of the proposed procedure in a Monte Carlo study presented in Section 5. In Section 6 we conclude with

a summary and directions for future research.

2 Model Formulation

2.1 Demand Specification

Brands are indexed by j = 0, . . . , J , and market periods by t = 1, . . . , T . Let household

types be indexed by n = 1, . . . , N, where a type denotes a set of households with identical demographic characteristics. There are m_n individuals of type n. To capture unobserved

consumer heterogeneity, we use a latent class approach and specify random coefficients with

an L-point distribution (Kamakura and Russel 1989).3 This specification is appealing in terms

of interpretability for marketing purposes and has been applied both in the economics and

marketing literature (Berry, Carnall and Spiller 1997, Besanko et al. 2003). Let the latent

market segments be indexed by l = 1, . . . , L. The share of segment l in the population is

$\lambda_l \geq 0$, where $\sum_{l=1}^{L} \lambda_l = 1$.

Consumer behavior is governed by the following utility function:

$$u_{n0t} = \varepsilon_{n0t},$$

$$u_{njt} = x_{njt}\beta_l - \alpha_l p_{jt} + \xi_{jt} + \varepsilon_{njt},$$

where {εn0t, . . . , εnJt} are iid extreme value distributed, xnjt are observed characteristics of an

alternative or decision-maker, and pjt denotes the price of alternative j in period t. βl and αl are

the respective response parameters. We allow for household-specific variation in these response

parameters to capture consumer heterogeneity. The demand shocks {ξ1t, . . . , ξJt} are common

across consumers and represent product characteristics that are unobserved by the researcher,

but are taken into account by the firms in their pricing decision. While some unobserved

product characteristics, such as quality and brand image, can be captured through the inclusion

of brand-specific constants, ξjt reflects time-varying factors like coupon availability, shelf space,

and national advertising.

Brand 0 is the outside good (i.e., no-purchase alternative). Including an outside good

allows for category expansion effects of marketing actions. We assume that the outside good

is non-strategic, i.e., its price is not set as a best response to the inside goods.

Utility maximization and the assumptions on the error term imply that the probability of


household n purchasing brand j in market period t, $D_{njt}$, is given by

$$D_{njt} = \sum_{l=1}^{L} \lambda_l D_{nljt} = \sum_{l=1}^{L} \lambda_l \frac{\exp(x_{njt}\beta_l - \alpha_l p_{jt} + \xi_{jt})}{1 + \sum_{k=1}^{J} \exp(x_{nkt}\beta_l - \alpha_l p_{kt} + \xi_{kt})} \qquad (1)$$

and the probability of the outside good being chosen is

$$D_{n0t} = \sum_{l=1}^{L} \lambda_l D_{nl0t} = \sum_{l=1}^{L} \lambda_l \frac{1}{1 + \sum_{k=1}^{J} \exp(x_{nkt}\beta_l - \alpha_l p_{kt} + \xi_{kt})}. \qquad (2)$$
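To fix ideas, here is a minimal numerical sketch of equations (1)-(2), assuming hypothetical arrays for the characteristics, parameters, and segment shares (the function and variable names are ours, not the authors'):

```python
import numpy as np

def choice_probabilities(x, p, xi, beta, alpha, lam):
    """Latent-class logit choice probabilities, equations (1)-(2).

    x:     (J, K) observed characteristics of the J inside goods
    p:     (J,)   prices
    xi:    (J,)   demand shocks
    beta:  (L, K) segment-specific taste parameters
    alpha: (L,)   segment-specific price sensitivities
    lam:   (L,)   segment shares (nonnegative, summing to one)
    """
    v = beta @ x.T - np.outer(alpha, p) + xi          # (L, J) deterministic utilities
    ev = np.exp(v)
    denom = 1.0 + ev.sum(axis=1, keepdims=True)       # the outside good has utility 0
    seg_probs = np.hstack([1.0 / denom, ev / denom])  # column 0 = no purchase
    return seg_probs, lam @ seg_probs                 # per-segment and mixed probabilities

# Illustrative call with two brands and two segments (all numbers made up):
seg, mixed = choice_probabilities(x=np.eye(2), p=np.array([9.9, 8.1]),
                                  xi=np.zeros(2),
                                  beta=np.array([[4.2, 1.0], [6.9, 10.3]]),
                                  alpha=np.array([0.63, 1.62]),
                                  lam=np.array([0.57, 0.43]))
```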

2.2 Supply Specification

The supply side is characterized by Bertrand-Nash behavior on the part of oligopolistic firms (Berry et al. 1995, Besanko et al. 1998). We assume that retailers pass through the manufacturers' decisions, which is likely to hold for categories that neither have a strategic impact on store traffic nor are a primary driver of retailers' profits. Under this assumption we do not need to explicitly

include a retailer in the supply-side model.

The production function has constant returns to scale. Marginal costs for firm j in period

t are denoted by cjt. In market period t firm j maximizes profits,

$$\max_{p_{jt}} \; \Pi_{jt} = (p_{jt} - c_{jt}) \sum_{n=1}^{N} m_n D_{njt}, \qquad (3)$$

where $\sum_{n=1}^{N} m_n D_{njt}$ is the expected demand for product j in period t. Expected demand is

thus given by the weighted sum of the choice probabilities for all consumer types in the market.

The first order condition for this problem is given by

$$\sum_{n=1}^{N} m_n \frac{\partial D_{njt}}{\partial p_{jt}} (p_{jt} - c_{jt}) + \sum_{n=1}^{N} m_n D_{njt} = 0. \qquad (4)$$

Given our demand model, the above equation can be rewritten as

$$p_{jt} = c_{jt} + \frac{\sum_{n=1}^{N} m_n \sum_{l=1}^{L} \lambda_l D_{nljt}}{\sum_{n=1}^{N} m_n \sum_{l=1}^{L} \lambda_l \alpha_l D_{nljt}(1 - D_{nljt})}. \qquad (5)$$

We infer marginal cost from the data using the relationship

$$c_{jt} = w_t \gamma_j + \eta_{jt}, \qquad (6)$$


where wt are observable variables, e.g. input prices, and ηjt denotes cost characteristics that

are unobserved by the researcher. Substituting (6) in (5) yields

$$p_{jt} = w_t \gamma_j + \frac{\sum_{n=1}^{N} m_n \sum_{l=1}^{L} \lambda_l D_{nljt}}{\sum_{n=1}^{N} m_n \sum_{l=1}^{L} \lambda_l \alpha_l D_{nljt}(1 - D_{nljt})} + \eta_{jt}, \qquad j = 1, \ldots, J. \qquad (7)$$
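Given segment-level probabilities, the markup in (5) is a one-line computation. A sketch under the same assumptions as above (our own hypothetical names; single-product firms as in the model):

```python
import numpy as np

def bertrand_price_rhs(c, D_seg, alpha, lam, m=1.0):
    """Right-hand side of equation (5): marginal cost plus the Bertrand-Nash markup.

    c:     (J,)   marginal costs
    D_seg: (L, J) segment-level probabilities D_{nljt} of the inside goods (N = 1)
    alpha: (L,)   segment price coefficients
    lam:   (L,)   segment shares; m is the market size m_n
    """
    demand = m * (lam @ D_seg)                            # expected demand per brand
    slope = m * (lam * alpha) @ (D_seg * (1.0 - D_seg))   # magnitude of own-price demand derivative
    return c + demand / slope
```

Because D_seg itself depends on prices, (5) and (7) define equilibrium prices only implicitly; Section 3 solves the resulting fixed point numerically.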

2.3 Market Equilibrium

Considering the demand equations (1) and supply equations (7) jointly, the market equilibrium

is defined by

$$D_{njt} = \sum_{l=1}^{L} \lambda_l \frac{\exp(x_{njt}\beta_l - \alpha_l p_{jt} + \xi_{jt})}{1 + \sum_{k=1}^{J} \exp(x_{nkt}\beta_l - \alpha_l p_{kt} + \xi_{kt})}, \qquad j = 1, \ldots, J, \; n = 1, \ldots, N, \qquad (8)$$

$$p_{jt} = w_t \gamma_j + \frac{\sum_{n=1}^{N} m_n \sum_{l=1}^{L} \lambda_l D_{nljt}}{\sum_{n=1}^{N} m_n \sum_{l=1}^{L} \lambda_l \alpha_l D_{nljt}(1 - D_{nljt})} + \eta_{jt}, \qquad j = 1, \ldots, J. \qquad (9)$$

In equilibrium, prices and probabilities depend on both $\{\xi_{jt}\}_j$ and $\{\eta_{jt}\}_j$. Hence, estimating the equations separately leads to a simultaneity bias: in both equations (8) and (9), the explanatory variables are correlated with the unobserved errors. Consider equation (9) and suppose that firm j faces a high cost shock $\eta_{jt}$ in period t. This will lead the firm to charge a higher price $p_{jt}$, which in turn decreases the probability that it will be chosen by consumers of type n, that is, $D_{njt}$ decreases. Consequently, the markup regressor $\frac{\sum_{n=1}^{N} m_n \sum_{l=1}^{L} \lambda_l D_{nljt}}{\sum_{n=1}^{N} m_n \sum_{l=1}^{L} \lambda_l \alpha_l D_{nljt}(1-D_{nljt})}$ changes with the error $\eta_{jt}$, and this correlation leads to biased estimates for $\alpha$. A joint estimation of (8) and (9) accounts for such possible correlation and thereby leads to a valid estimate of the price coefficient $\alpha_l$.

3 Estimation Procedure

In this section we develop a maximum likelihood-based procedure to obtain estimates of the

structural parameters. The model can be written in the general form of a response function,

where the endogenous variables are expressed as a function of the exogenous variables. That is, for each market period t, if equilibrium is unique, we have

$$[\{D_{njt}\}_{n,j}, \{p_{jt}\}_j] = f[\{x_{njt}\}_{n,j}, \{w_{jt}\}_j, \{\xi_{jt}\}_j, \{\eta_{jt}\}_j, \theta], \qquad (10)$$

where $\theta = (\{\alpha_l\}_l, \{\beta_l\}_l, \{\lambda_l\}_l, \{\gamma_j\}_j, \{\sigma_{\xi_j}\}_j, \{\sigma_{\eta_j}\}_j)$ is the vector of parameters to be estimated.

As noted previously, the likelihood function of such an equilibrium model is in general

intractable. Consider the equilibrium model as defined by equations (8) and (9), where we set

N = 1 to simplify exposition. For given values of the exogenous variables, the joint distribution

of the demand and supply shocks {ξjt}j and {ηjt}j induces a distribution of the equilibrium

prices {pjt}j and probabilities {Djt}j . The difficulty in writing down the likelihood function

stems from the fact that this induced distribution of prices and probabilities is hard to obtain

directly through a transformation of variables approach. To compute the transformation of the

demand and supply shocks, one would need to solve the system of equilibrium equations for

{ξjt}j and {ηjt}j and then derive the Jacobian of this inverse transformation. That is, there

needs to exist a set of J functions {uj(·)}j that map prices and probabilities into the ξj ’s and

another set of J functions {vj(·)}j that map prices and probabilities into ηj ’s. Let h be the

pdf of ({ξjt}j , {ηjt}j). To obtain the pdf g of ({pjt}j , {Djt}j), the transformation of variables

is

$$g(\{p_{jt}\}_j, \{D_{jt}\}_j) = h\big(u_1(\{p_{jt}\}_j, \{D_{jt}\}_j), \ldots, u_J(\{p_{jt}\}_j, \{D_{jt}\}_j),\; v_1(\{p_{jt}\}_j, \{D_{jt}\}_j), \ldots, v_J(\{p_{jt}\}_j, \{D_{jt}\}_j)\big) \cdot |\mathrm{Jac}|,$$

where $\mathrm{Jac}$ is the $(2J \times 2J)$ Jacobian of the inverse transformation,

$$\mathrm{Jac} = \begin{pmatrix}
\frac{\partial u_1}{\partial p_{1t}} & \cdots & \frac{\partial u_1}{\partial p_{Jt}} & \frac{\partial u_1}{\partial D_{1t}} & \cdots & \frac{\partial u_1}{\partial D_{Jt}} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial u_J}{\partial p_{1t}} & \cdots & \frac{\partial u_J}{\partial p_{Jt}} & \frac{\partial u_J}{\partial D_{1t}} & \cdots & \frac{\partial u_J}{\partial D_{Jt}} \\
\frac{\partial v_1}{\partial p_{1t}} & \cdots & \frac{\partial v_1}{\partial p_{Jt}} & \frac{\partial v_1}{\partial D_{1t}} & \cdots & \frac{\partial v_1}{\partial D_{Jt}} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial v_J}{\partial p_{1t}} & \cdots & \frac{\partial v_J}{\partial p_{Jt}} & \frac{\partial v_J}{\partial D_{1t}} & \cdots & \frac{\partial v_J}{\partial D_{Jt}}
\end{pmatrix}.$$

The problem is that the equilibrium equations generally cannot be solved to obtain the inverse transformations $\{u_j(\cdot)\}_j$ and $\{v_j(\cdot)\}_j$. Moreover, even if we could obtain $\{u_j(\cdot)\}_j$ and

{vj(·)}j , e.g., using numerical methods, then we would still have to compute the Jacobian

of this (unknown!) inverse transformation. Due to the highly nonlinear model specification,

this is a daunting task. In a recent article, Yang et al. (2003) propose a Bayesian approach

to estimating market equilibrium models. While the authors simplify the transformation of

variables considerably by transforming the supply shocks into prices conditional on demand

shocks, the computation of the Jacobian still needs to be done using numerical methods.4

Our approach is different. We avoid performing the transformation of variables altogether

and instead obtain equilibrium prices and probabilities using simulation. Recall that, for given

values of the exogenous variables, the joint distribution of demand and supply shocks induces

a distribution of prices and probabilities. We exploit this by numerically solving the model

repeatedly for simulated demand and supply shocks. For each draw of the demand and supply

shocks, we obtain the corresponding equilibrium prices and probabilities. Then we compute

the joint distribution of prices and probabilities.

There is, however, no guarantee that the empirical distribution of prices and probabilities

obtained through the simulation will be smooth, which is a property we need for the optimization. We therefore employ nonparametric techniques to estimate the joint density of prices and probabilities, and then evaluate it at the actual data to obtain a smooth, well-behaved likelihood function. The parameter estimates are obtained by maximizing this likelihood function

using an iterative optimization procedure.

Estimation Algorithm

Based on the previous discussion, there are three main components to the proposed estimation

procedure:

(i) simulation of equilibrium prices and probabilities,

(ii) estimation of the joint density of prices and probabilities in order to smooth the likelihood function, and

(iii) maximization of the loglikelihood to obtain the parameters of the model.

The estimation algorithm proceeds as follows. Let s = 1, . . . , S index the simulations per

time period t.

Step 1: Draw {ξjt}j,t and {ηjt}j,t S times.

Step 2: Choose a starting value for θ.

Step 3: Set s = 1.

Step 4: Set t = 1.

Step 5: Using the sth draw solve

$$p_{jt} = w_t \gamma_j + \frac{\displaystyle\sum_{n=1}^{N} m_n \sum_{l=1}^{L} \lambda_l \frac{\exp(x_{njt}\beta_l - \alpha_l p_{jt} + \xi_{jt})}{1 + \sum_{k=1}^{J} \exp(x_{nkt}\beta_l - \alpha_l p_{kt} + \xi_{kt})}}{\displaystyle\sum_{n=1}^{N} m_n \sum_{l=1}^{L} \lambda_l \alpha_l \frac{\exp(x_{njt}\beta_l - \alpha_l p_{jt} + \xi_{jt})}{1 + \sum_{k=1}^{J} \exp(x_{nkt}\beta_l - \alpha_l p_{kt} + \xi_{kt})} \left\{1 - \frac{\exp(x_{njt}\beta_l - \alpha_l p_{jt} + \xi_{jt})}{1 + \sum_{k=1}^{J} \exp(x_{nkt}\beta_l - \alpha_l p_{kt} + \xi_{kt})}\right\}} + \eta_{jt} \qquad (11)$$

to obtain $\{p_{jt}\}_j$. Using $\{p_{jt}\}_j$ calculate $D_{njt} = \sum_{l=1}^{L} \lambda_l \frac{\exp(x_{njt}\beta_l - \alpha_l p_{jt} + \xi_{jt})}{1 + \sum_{k=1}^{J} \exp(x_{nkt}\beta_l - \alpha_l p_{kt} + \xi_{kt})}$.

Step 6: Increase s by 1. If s ≤ S, go back to step 5.

Step 7: Estimate the joint density of the calculated prices and probabilities, ϕ({pjt}j, {Dnjt}n,j), and evaluate it at the actual prices and probabilities to get the period t contribution to the loglikelihood.

Step 8: Increase t by 1. If t ≤ T, then set s = 1 and go back to step 5.

Step 9: Update θ to maximize the loglikelihood. If convergence is reached, terminate. Else go back to

step 3.
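In code, the nesting of these steps looks roughly like the following schematic. The helper names (solve_equilibrium, kde_log_density) are ours and are sketched in Sections 3.1 and 3.2; theta is organized as a dict purely for readability:

```python
import numpy as np

rng = np.random.default_rng(0)
S, T, J = 1000, 114, 2
z = rng.standard_normal((S, T, 2 * J))        # step 1: draw once, reuse for every theta

def solve_equilibrium(theta, shocks):         # step 5; see the sketch in Section 3.1
    raise NotImplementedError

def kde_log_density(sim_outcomes, observed):  # step 7; see the sketch in Section 3.2
    raise NotImplementedError

def negative_loglik(theta, data):
    loglik = 0.0
    for t in range(T):                                        # steps 4 and 8: loop over periods
        shocks = theta["sigma"] * z[:, t, :]                  # scale the fixed draws (Section 3.1)
        sims = np.array([solve_equilibrium(theta, shocks[s])  # steps 3, 5, 6: loop over draws
                         for s in range(S)])
        loglik += kde_log_density(sims, data[t])              # step 7: density at the actual data
    return -loglik            # step 9 maximizes loglik, e.g. with a simplex search (Section 3.3)
```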

We now discuss the details of the estimation algorithm step by step.

3.1 Simulation of Equilibrium Prices and Probabilities

The first component of the estimation procedure is the simulation of equilibrium prices and

probabilities. We assume that demand and supply shocks are normally distributed,

$$(\xi_{1t}, \ldots, \xi_{Jt}, \eta_{1t}, \ldots, \eta_{Jt}) \sim N\big(0, \, \mathrm{diag}(\sigma_{\xi_1}^2, \ldots, \sigma_{\xi_J}^2, \sigma_{\eta_1}^2, \ldots, \sigma_{\eta_J}^2)\big).$$


Further, we assume that {ξjt}j and {ηjt}j are independent across time and independent of

{εnjt}n,j . Since all error terms are independent across time and all maximization problems are

static, we can treat each period separately.

In step 1, we draw the errors only once and use them for all values of θ. McFadden (1989)

shows in the context of method of moments estimation that using the same set of random

draws to simulate the model at different trial parameter values helps to avoid “chattering” of

the simulator, i.e., it ensures that the criterion function would not be discontinuous. Pakes and

Pollard (1989) also note that the properties of simulation estimators, and the performance of

the algorithms used to determine them, require the use of simulation draws that do not change

as the optimization algorithm varies θ. We implement this step by drawing from a standard

normal distribution and then multiplying the draws by the standard deviations, $\sigma_{\xi_1}, \ldots, \sigma_{\xi_J}$ and $\sigma_{\eta_1}, \ldots, \sigma_{\eta_J}$.

Given the exogenous variables, a set of parameter values and random shocks, we solve for

the equilibrium prices and probabilities in step 5. Note that equation (11) is differentiable in

{pjt}j , so we use a Newton-Raphson gradient procedure. In accordance with theory, we obtain

a unique solution along with reasonable values for the equilibrium prices.
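A self-contained sketch of this step for one market period and a single consumer type (N = 1). The paper uses Newton-Raphson with analytic gradients; scipy's hybrid root finder stands in for it here, and all names are ours:

```python
import numpy as np
from scipy.optimize import fsolve

def equilibrium_prices(x, xi, eta, w_gamma, beta, alpha, lam):
    """Solve equation (11), p = w*gamma + markup(p) + eta, for one period."""
    c = w_gamma + eta                                   # marginal cost, equation (6)

    def excess(p):
        v = beta @ x.T - np.outer(alpha, p) + xi        # (L, J) utilities
        ev = np.exp(v)
        D = ev / (1.0 + ev.sum(axis=1, keepdims=True))  # segment-level choice probabilities
        markup = (lam @ D) / ((lam * alpha) @ (D * (1.0 - D)))
        return p - (c + markup)                         # zero at an equilibrium

    return fsolve(excess, x0=c + 1.0)                   # cost plus a rough markup as a start
```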

For each period t, this step is repeated S times to generate an empirical distribution of

equilibrium prices and probabilities for this period. The number of random draws (simulations)

for the computation of the equilibrium points is to be determined by the user. The minimum

number of simulations, however, is set by the requirements of the density estimation procedure

as discussed below.

3.2 Estimation of the Joint Density of Prices and Probabilities

In principle, we could use the empirical distribution of the simulated equilibrium prices and

probabilities to directly compute the likelihood of the data. However, since the empirical

distribution is not smooth, this will in general not lead to a well-behaved likelihood function that is readily optimized. One possibility is to use kernel density estimation to smooth the

simulated data points. In a nutshell, kernel smoothing involves weighted local averaging with

the kernels as weights.

Kernel estimators have two desirable properties: First, they are consistent5; second, by

averaging over a neighborhood that shrinks at an appropriate rate, they achieve the optimal

rate of convergence for nonparametric estimators (Stone 1980). There are other nonparametric

techniques that could be considered for smoothing purposes such as splines and orthogonal

(Fourier) series (Hardle 1990). While these estimators are similar in terms of computational

intensity and are asymptotically equivalent, they may differ in their small sample properties.

However, to the best of our knowledge, there is no comprehensive Monte-Carlo study or any

analytical result that shows better small-sample properties for any of these estimators.

Therefore, the choice among them is largely a matter of taste.6

We employ a multivariate kernel density estimator with a multiplicative Gaussian kernel

(Hardle 1990) to evaluate the joint density of the simulated equilibrium prices and probabilities

at the actual data to obtain the contribution to the likelihood (step 7). Unlike prices, the

probabilities are not directly observed but they are nonparametrically estimable from the

data. Formally, the estimated joint density of the calculated prices and probabilities at the

actual data is given by:

$$\varphi(\{p_{jt}\}_j, \{D_{njt}\}_{n,j}) = \frac{1}{S} \sum_{s=1}^{S} \prod_{j=1}^{J} \frac{1}{h_j^p} K\!\left(\frac{p_{jt}^s - p_{jt}}{h_j^p}\right) \prod_{j=1}^{J} \prod_{n=1}^{N} \frac{1}{h_{nj}^D} K\!\left(\frac{D_{njt}^s - D_{njt}}{h_{nj}^D}\right), \qquad (12)$$

where s indexes simulations, $K(\cdot)$ is the Gaussian kernel defined as $K(u) = \frac{1}{\sqrt{2\pi}} \exp(-\frac{u^2}{2})$, and $h_j^p$ and $h_{nj}^D$ are smoothing parameters defined below.
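Evaluating (12) takes a few lines once the S simulated vectors are stacked; a sketch with our own (hypothetical) names:

```python
import numpy as np

def kde_at_point(sims, obs, h):
    """Equation (12): product-Gaussian-kernel density of the simulated
    equilibrium outcomes, evaluated at the observed data point.

    sims: (S, d) simulated prices and probabilities, one row per draw
    obs:  (d,)   actual prices and probabilities for the period
    h:    (d,)   bandwidths, one per dimension
    """
    u = (sims - obs) / h                             # standardized distances
    k = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel K(u) per dimension
    return np.mean(np.prod(k / h, axis=1))           # multiply over dimensions, average over draws
```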

One well-known problem of nonparametric density estimation is the so-called 'curse of dimensionality,' i.e., the explosion in the number of data points needed for the estimation of higher-dimensional densities. The minimum number of data points required for the estimation has

been tabulated in Silverman (1986, Table 4.2). Because we simulate the equilibrium prices and probabilities, this is only a computational issue in the present case: we can simulate as many

data points as needed by drawing more errors. For example, if there are two competing brands

in the market, that is, if we want to estimate a four-dimensional density (two prices and two

choice probabilities), we need 223 data points. For three brands, the required sample size is

2790, and for four brands 43700 observations are needed.

Nonparametric density estimation requires the choice of a smoothing parameter, the bandwidth,

that governs the degree of smoothness of the density estimate. In essence, the bandwidth

determines how much averaging we want to do around a given point. Naturally, the larger the

bandwidth, the smoother the function. However, with increasing bandwidth, the estimated

density may be far from the true underlying density. If the bandwidth is chosen too small,

however, the obtained density would have a ‘rough’ surface and will not be as easy to use for

optimization. Therefore, the choice of bandwidth is critical.7

We are only interested in the density at the point of the actual data, so we have to choose

the bandwidth to be locally optimal. Intuitively, if we choose the bandwidth too large, the

estimated density is essentially constant throughout the parameter space. Hence, every point is

a global maximum (up to computer precision). On the other hand, if we choose the bandwidth

too small, the estimated density is roughly zero outside a small neighborhood of the observed

values, and every point is a local maximum. Searching for the global maximum is a tantamount

to searching for the small neighborhood of the parameter space that is associated with positive

density.

Recall that we estimate a J(N + 1)-dimensional density. We select the bandwidth for the

ith dimension according to the normal reference rule (Scott 1992),

$$h_i = \left(\frac{4}{J(N+1) + 2}\right)^{1/(J(N+1)+4)} \sigma_i \, S^{-1/(J(N+1)+4)}, \qquad (13)$$

where σi is the standard deviation of the equilibrium prices (probabilities) in the ith dimension.

This rule is known to oversmooth when the underlying density is multimodal. In our case this is welcome, because we want to ensure a well-behaved likelihood function that is easy

to maximize. The drawback is, of course, that the estimated density may be far from the

underlying density. However, this problem can be alleviated by successively decreasing the

bandwidth once we are in the neighborhood of the global maximum.
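The rule in (13) is straightforward to compute from the simulated outcomes; a sketch:

```python
import numpy as np

def normal_reference_bandwidths(sims):
    """Equation (13) for all dimensions at once; sims is (S, d) with d = J(N+1)."""
    S, d = sims.shape
    scale = (4.0 / (d + 2.0)) ** (1.0 / (d + 4.0))
    # The returned vector can be shrunk (e.g., halved) near the optimum, as discussed above.
    return scale * sims.std(axis=0, ddof=1) * S ** (-1.0 / (d + 4.0))
```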

3.3 Maximization of the Loglikelihood

The maximum likelihood procedure builds an outer loop around the simulation of the equilibrium prices and probabilities and the estimation of their joint density. Thus, for each set of

parameter values θ we perform S simulations to obtain the loglikelihood function, which is in

turn maximized to obtain an updated set of parameter values.

Starting values for the parameters in step 2 may come from a preliminary estimation using

3SLS with aggregate data as in Besanko et al. (1998):

$$\ln S_{jt} - \ln S_{0t} = x_{jt}\beta - \alpha p_{jt} + \xi_{jt}, \qquad j = 1, \ldots, J, \qquad (14)$$

$$p_{jt} = w_t \gamma_j + \frac{1}{\alpha(1 - S_{jt})} + \eta_{jt}, \qquad j = 1, \ldots, J, \qquad (15)$$

where Sjt is the share of alternative j for week t in the aggregate data. For the standard

deviations, (σξ1 , . . . , σξJ, ση1 , . . . , σηJ ), we set the starting values equal to the root mean squared

error (RMSE) from the 3SLS procedure. Note that in equations (14) and (15) the ξjt and

ηjt enter as linear disturbances. Therefore, the RMSE provides an estimate of the standard

deviation of the errors.
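As a rough stand-in for the 3SLS step, the linearized demand equation (14) can be fit by least squares to initialize the search. This sketch ignores the supply equation and the endogeneity of prices, which is tolerable only because the output serves as a starting point; the names are ours:

```python
import numpy as np

def demand_starting_values(S_in, S_out, x, p):
    """OLS on equation (14). S_in: (T, J) inside shares; S_out: (T,) outside
    share; x: (T, J, K) characteristics; p: (T, J) prices."""
    y = (np.log(S_in) - np.log(S_out)[:, None]).ravel()
    X = np.column_stack([x.reshape(-1, x.shape[-1]), -p.ravel()])  # minus sign matches (14)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    beta, alpha = coef[:-1], coef[-1]
    sigma_xi = np.sqrt(np.mean((y - X @ coef) ** 2))   # RMSE as the starting value for sigma
    return beta, alpha, sigma_xi
```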

The maximization of the loglikelihood is accomplished by means of a simplex search (Nelder

and Mead 1965). It is clearly infeasible to compute analytic gradients. While it is in principle

possible to use numerical gradients as part of a gradient-based optimization procedure, we

found that a simplex search that uses only the values of the loglikelihood function performs

best.8
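The simplex search is available off the shelf; a minimal runnable illustration with a stand-in objective (not the paper's likelihood):

```python
import numpy as np
from scipy.optimize import minimize

def toy_negloglik(theta):
    # smooth stand-in for the simulated negative loglikelihood
    return np.sum((theta - np.array([0.5, -0.2])) ** 2)

result = minimize(toy_negloglik, x0=np.zeros(2), method="Nelder-Mead")
print(result.x)   # converges to roughly [0.5, -0.2] using function values only, no gradients
```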


4 Empirical Analyses

The algorithm described in Section 3 is applied to estimate an equilibrium model of demand

and supply in two product categories, yogurt and laundry detergent. While we account for unobserved consumer heterogeneity, we abstract from observed heterogeneity, that is, we assume

N = 1.

Data. We use data on individual purchase histories for a panel of households in Sioux Falls,

South Dakota, collected by A.C. Nielsen. The data set spans a period of 114 weeks in 1986-1988.9 For the 615 households who purchased in the category more than twice, we observe the dates of their shopping trips, the price paid, and the item purchased (UPC). We aggregate over all UPCs that belong to a brand. That is, in the yogurt category we aggregate across

different flavors, and in the laundry detergent category, we aggregate across different sizes.

To obtain weekly no-purchase probabilities, we tried out two alternative approaches. The

first was to condition on store visits, i.e., to compute the probability that a household goes

to the store but does not purchase in the category of interest (for details see Besanko et al.

(1998) and Draganska and Jain (2003)). The second approach is to assume that each household

makes a weekly decision whether to purchase in the category or not, i.e., we do not condition

on store visit when computing the no-purchase probability. The empirical results did not differ

qualitatively between the two approaches, so we decided not to condition on store visits.
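Under the second approach, the weekly no-purchase probability is just the residual share of household-weeks without a category purchase. A sketch using a hypothetical purchases table:

```python
import pandas as pd

def weekly_choice_shares(purchases, n_households, n_weeks):
    """purchases: DataFrame with columns household, week, brand (one row per purchase)."""
    bought = (purchases.groupby(["week", "brand"])["household"].nunique()
              .unstack(fill_value=0)
              .reindex(range(1, n_weeks + 1), fill_value=0))
    shares = bought / n_households                      # weekly brand shares
    shares["no_purchase"] = 1.0 - shares.sum(axis=1)    # households choosing the outside good
    return shares
```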

For the cost shifters in the supply equation we obtained monthly data on labor and materials

prices from the Bureau of Labor Statistics (BLS). Labor costs are represented by average hourly

earnings of production workers for the respective industry (SIC 202, dairy products, for yogurt

and SIC 2841, soap and other detergents, for laundry detergent). We also use data on the

prices of the main ingredient for each of the product categories. Specifically, we obtained data

on the producer price indices for fluid milk (yogurt) and basic inorganic chemicals (laundry detergent).


Table 1: Descriptive Statistics of Data.

Items           Avg. Choice Prob.   Avg. Price              Material cost   Labor cost
Yogurt
  Yoplait       0.0654              9.9425 (cent per oz.)   103.3066        9.5512
  Dannon        0.0377              8.0693 (cent per oz.)
  No purchase   0.8969
Detergent
  Wisk          0.0192              0.0481 ($ per oz.)      96.0997         14.1633
  Tide          0.0484              0.0512 ($ per oz.)
  No purchase   0.9325

The monthly data series were then smoothed to obtain weekly cost data following

the approach suggested by Slade (1995).

Table 1 presents descriptive statistics for both product categories along with summary

information on the cost data we use for the analysis. In the yogurt category, we focus our

attention on the two major competitors in the single-serving yogurt market, Dannon and

Yoplait (General Mills).10 Yoplait is the market leader, with a market share almost double

that of Dannon and a somewhat higher price.

In the laundry detergent category, we study the competition between the two leading brands

Wisk (Unilever's flagship brand) and Tide (Procter & Gamble's premier brand). Tide seems to dominate Wisk: it has a higher market share and commands a price premium.

Estimation. We implemented the estimation algorithm in C++ using optimization routines

and routines for solving systems of nonlinear equations from Press, Teukolsky, Vetterling and

Flannery (1993).11 The parameter estimates are obtained as follows. We estimate an aggregate

model as in Besanko et al. (1998) to get starting values. These initial values are perturbed

and used in the optimization program. The output parameters are compared based on the

values of the loglikelihood function. The parameters with the largest loglikelihood are then

perturbed again and taken as input to the optimization procedure. The output parameters are


again compared based on the loglikelihood value. The final parameters are chosen to be those

with the largest loglikelihood value. By using different starting values, we make sure that the

optimization algorithm achieves the global maximum.

To compute the standard errors we employ the bootstrap. To this end, we simulate 30 data

sets by randomly drawing with replacement from the original data. Note that unobserved

heterogeneity introduces dependencies across the purchases of a given household. To account

for this, in computing the standard errors, it is important to sample entire household histories.12

For each data set, we then run the optimization program. Finally, we compute the standard

errors as the standard deviation of the 30 sets of parameter estimates.
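A sketch of the resampling scheme: draw whole household histories with replacement so that within-household dependence is preserved, and re-run the estimation on each resample (estimate stands for the full optimization routine and is hypothetical here):

```python
import numpy as np

def bootstrap_standard_errors(histories, estimate, B=30, seed=0):
    """histories: list of per-household data blocks; estimate: function mapping
    a list of households to a parameter vector; returns per-parameter std."""
    rng = np.random.default_rng(seed)
    draws = []
    for _ in range(B):
        idx = rng.integers(0, len(histories), size=len(histories))
        resample = [histories[i] for i in idx]   # sample entire households, with replacement
        draws.append(estimate(resample))
    return np.asarray(draws).std(axis=0, ddof=1)
```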

We use 1000 simulation draws to ensure greater precision of the estimates (recall that

for a four-dimensional density we only need 223 data points (Silverman 1986)). Convergence

is reached with up to 300 function calls for the model without heterogeneity and about 600

function calls for the model with heterogeneity. An evaluation of the likelihood function takes

between 2 and 10 seconds depending on the model specification on a Pentium 4 PC with 1GHz

clock speed and 512MB RAM.

Yogurt category. Table 2 presents the results of the empirical analysis. In addition to the

homogeneous logit, which is our baseline model, we estimated two heterogeneity specifications:

one where only the price response of the two segments is different (heterogeneity 1), and one

where we also allow for heterogeneity in the brand constants (heterogeneity 2). All estimated

coefficients have face validity. The price coefficients are negative and both the wage rate and

the price of milk have a positive impact on price as expected.

There does not appear to be much qualitative difference in the estimates for the standard

logit and the heterogeneous logit specification, where only price response is allowed to vary

by segment. However, the estimated parameter for the proportion of segment 1, λ = 12%, is

significant. Once heterogeneity in the brand constants is introduced (heterogeneity 2), however, the difference in the estimated parameters relative to the homogeneous logit specification

becomes much more pronounced. There is now a sizeable difference in the price sensitivity

between the two segments, with the slightly larger segment (57%) being the less price sensitive

one. AIC and BIC both show considerable improvement as we go from heterogeneity 1 to

heterogeneity 2. The estimated marginal costs are positive, and of reasonable magnitude: 8.91

cents per ounce for Yoplait and 7.08 cents per ounce for Dannon.13

Table 2: Parameter estimates and standard errors for yogurt data.

Variable                  No Heterogeneity        Heterogeneity 1         Heterogeneity 2
                          Coefficient (Std. dev.) Coefficient (Std. dev.) Coefficient (Std. dev.)
Demand Side:
Dannon const. (segm. 1)   5.7064   (0.0428)       5.8374   (0.0651)       0.9759   (0.0471)
Yoplait const. (segm. 1)  8.2857   (0.0278)       8.3503   (0.0694)       4.1928   (0.0552)
Dannon const. (segm. 2)                                                   10.3163  (0.0376)
Yoplait const. (segm. 2)                                                  6.8511   (0.0765)
σξ1                       0.3198   (0.0344)       0.3584   (0.0337)       0.3047   (0.0336)
σξ2                       0.3093   (0.0411)       0.4131   (0.0602)       0.352    (0.03)
price (segment 1)         -1.112   (0.0047)       -0.8999  (0.0382)       -0.6319  (0.0138)
price (segment 2)                                 -1.2052  (0.0214)       -1.6232  (0.018)
proportion of segment 1                           0.1239   (0.0253)       0.5672   (0.053)
Supply Side:
Dannon constant           -9.8327  (0.0302)       -10.2128 (0.081)        -10.3732 (0.0542)
Yoplait constant          -7.9526  (0.045)        -8.3507  (0.0907)       -9.4799  (0.0679)
ση1                       0.2827   (0.0311)       0.2765   (0.0311)       0.2667   (0.0268)
ση2                       0.5649   (0.0434)       0.5347   (0.0424)       0.5178   (0.0479)
labor cost                1.1134   (0.0134)       0.9554   (0.0287)       0.9725   (0.0133)
material cost             0.615    (0.011)        0.7859   (0.0228)       0.8151   (0.0131)
Loglikelihood             481.34                  491.45                  516.38
AIC                       -940.68                 -956.91                 -1002.76
BIC                       -940.05                 -956.17                 -1001.91

Laundry Detergent. Table 3 presents the estimation results. With the exception of the

material costs parameter in the standard logit specification (negative but insignificant), all

coefficient estimates have the expected signs. The price coefficients are negative in both specifications. In terms of price sensitivity, there appear to be two equally sized segments (we

estimate a proportion of 53.5% for segment 1). Labor cost has the expected positive impact on the price of the product in the standard logit model but is not significantly different from zero in the heterogeneous logit specification. The brand-specific constant for Wisk is negative,

while the one for Tide is positive, reflecting the strong inherent preference for Tide. Wisk has

lower marginal cost (3.46 cents per ounce) than Tide (3.75 cents per ounce).

Table 3: Parameter estimates and standard errors for laundry detergent data.

Variable                 No Heterogeneity         With Heterogeneity
                         Coefficient (Std. dev.)  Coefficient (Std. dev.)
Demand Side:
Wisk brand constant      −0.3217  (0.0760)        −0.3442  (0.0366)
Tide brand constant      0.9444   (0.0620)        0.9222   (0.0398)
σξ1                      0.4272   (0.0521)        0.4074   (0.0494)
σξ2                      0.4334   (0.1401)        0.4627   (0.1142)
price (segment 1)        −0.7710  (0.0176)        −0.8740  (0.0373)
price (segment 2)                                 −0.6791  (0.0289)
proportion of segment 1                           0.5358   (0.0749)
Supply Side:
Wisk cost constant       2.6628   (0.0543)        2.6647   (0.0522)
Tide cost constant       2.9299   (0.0589)        2.9036   (0.0646)
ση1                      0.3734   (0.0413)        0.4102   (0.0289)
ση2                      0.3418   (0.0301)        0.3203   (0.0152)
labor cost               0.1169   (0.0506)        0.0493   (0.0396)
material cost            −0.0889  (0.0734)        0.0065   (0.0591)
Loglikelihood            −562.9528                −560.4829

We have now seen that the estimation procedure yields reasonable parameter estimates for

both the yogurt and the laundry detergent category. It still remains unclear, however, whether

our estimator performs well in general. To study its properties, in the next section we turn to

a more thorough investigation through a Monte Carlo experiment.

5 Simulation Analysis

Given the complexity of the proposed algorithm, it is very difficult to determine its properties analytically. We therefore conducted a Monte Carlo experiment: we generated 50 artificial data sets and applied the estimation procedure to each of them. Since the true underlying

parameters are known, we can compare our estimates to them and draw conclusions about the

performance of our procedure.

Data. We simulated choice data for 114 weeks and 473 households. The assumed ‘true’

parameter values roughly correspond to the ones obtained from a preliminary estimation using

scanner panel data and are listed in Table 4. There are two competing brands and an outside

good in the market with average shares of 2%, 4%, and 94%, respectively. The way the model

is set up, choosing the outside good at time t means not buying at all in week t. That is, for

each household, we have 114 observations. The total number of observations is thus 53,922.

For the supply side, we use factor price data for labor (average hourly wages of production

workers for SIC 209, miscellaneous food and kindred products) and for the key ingredient

in the production process, peanuts. We draw the demand and supply shocks from a normal

distribution. For the standard deviations, (σξ1 , . . . , σξJ, ση1 , . . . , σηJ ), we set the true values

equal to the RMSE from a preliminary 3SLS estimation. The choice and price generation

process is as specified in equations (8) and (9).
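A self-contained sketch of this data-generating process for the homogeneous-logit case (L = 1, N = 1): draw the shocks, solve the single-period equilibrium, then draw each household's weekly choice. The cost intercepts standing in for w_t γ_j are hypothetical; the remaining values echo Table 4:

```python
import numpy as np
from scipy.optimize import fsolve

rng = np.random.default_rng(0)
T, H, J = 114, 473, 2
beta0 = np.array([-2.62, -1.27])                  # brand intercepts
alpha = 0.21                                      # price coefficient
sig_xi, sig_eta = np.array([0.61, 0.41]), np.array([0.75, 0.65])

choices = np.empty((T, H), dtype=int)
for t in range(T):
    xi = rng.normal(0.0, sig_xi)                  # demand shocks for period t
    eta = rng.normal(0.0, sig_eta)                # supply shocks for period t
    c = np.array([4.0, 4.5]) + eta                # hypothetical stand-in for w_t*gamma_j + eta

    def excess(p):                                # equation (11) with L = 1
        ev = np.exp(beta0 - alpha * p + xi)
        D = ev / (1.0 + ev.sum())
        return p - (c + 1.0 / (alpha * (1.0 - D)))

    p_eq = fsolve(excess, x0=c + 1.0 / alpha)     # equilibrium prices for the period
    ev = np.exp(beta0 - alpha * p_eq + xi)
    probs = np.append(1.0, ev) / (1.0 + ev.sum()) # index 0 = no purchase
    choices[t] = rng.choice(J + 1, size=H, p=probs)
```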

Monte Carlo results. We obtained the parameter estimates for each of the 50 Monte Carlo

samples using the algorithm described in Section 3. Table 4 presents the resulting mean,

bias, variance, and mean square error (MSE). The MSE is given by the sum of the squared

bias and the variance. In general, the proposed estimation procedure seems to work quite

well. Specifically, the variances of the parameter estimates are very small, as expected for a

maximum-likelihood based procedure. The magnitude of the biases is large compared to the

variances. It is, however, reassuring that the coefficient of interest, namely the price coefficient,

is estimated with a very high degree of reliability. The bias is only 0.00354, which is tiny

relative to the value of the price coefficient (−0.21) and suggests that our way of dealing with the endogeneity problem is indeed effective. The supply-side parameters (labor and ingredients

cost) also show only a small bias.

Table 4: Monte Carlo results for proposed algorithm.

Variable         True Value   Mean        Bias       Variance   MSE
demand const. 1  -2.62        -2.49396    -0.12604   0.00809    0.02398
demand const. 2  -1.27        -1.14716    -0.12284   0.00554    0.02063
price            -0.21        -0.21354    0.00354    0.00004    0.00005
supply const. 1  -13.48       -13.18638   -0.28362   0.01696    0.09740
supply const. 2  -12.12       -11.86734   -0.25266   0.01645    0.08028
labor cost       2.03         2.03272     -0.00272   0.00013    0.00014
material cost    0.27         0.29996     -0.02996   0.00803    0.00893
σξ1              0.61         0.49563     0.11437    0.00260    0.01568
σξ2              0.41         0.34294     0.06706    0.00075    0.00525
ση1              0.75         0.60453     0.14547    0.00388    0.02504
ση2              0.65         0.48901     0.16099    0.05074    0.07666

The performance of the estimator is also excellent when unobserved heterogeneity is considered. We simulated a data set with two equally sized segments differing in their price sensitivity. Table 5 presents the results of the Monte Carlo experiment for this specification. As

can be seen from the table, the price coefficients are estimated reliably. Overall, the proposed

estimation procedure handles unobserved heterogeneity very well.

Robustness checks. One key assumption we make is that the demand and supply errors are

jointly normally distributed. This need not be true in reality, so we test the robustness of our procedure to different distributional assumptions. Specifically, we assume that ξ and η follow a mixture of normal and logistic distributions. We give increasingly higher

weights to the logistic distribution to study the effects on the performance of our procedure.

As Table 6 reveals, the results are fairly robust. As expected, when the normality assumption

is satisfied, the MSE is the smallest, but even when we draw the errors entirely from a logistic

distribution, the accuracy remains very high.


Table 5: Monte Carlo results for model with heterogeneity.

Variable                 True Value   Mean       Bias      Variance   MSE
demand const. 1          -2.62        -2.5948    0.0252    0.0064     0.0070
demand const. 2          -1.27        -1.2540    0.0160    0.0077     0.0080
price1                   -0.15        -0.1517    -0.0017   0.0001     0.0001
price2                   -0.25        -0.2511    -0.0011   0.0014     0.0014
σξ1                      0.61         0.5737     -0.0363   0.0033     0.0046
σξ2                      0.41         0.3880     -0.0220   0.0010     0.0015
supply const. 1          -13.47       -13.4798   -0.0098   0.0042     0.0043
supply const. 2          -12.12       -12.1033   0.0167    0.0093     0.0096
ση1                      0.75         0.7402     -0.0098   0.0025     0.0026
ση2                      0.65         0.6591     0.0091    0.0025     0.0026
labor cost               2.03         2.0222     -0.0078   0.0015     0.0016
material cost            0.27         0.2760     0.0060    0.0215     0.0216
proportion of segment 1  0.50         0.5095     0.0095    0.0106     0.0107

Table 6: Monte Carlo results for different mixtures of normal and logistic distribution (price coefficient only, true value −0.21).

           normal    0.8*normal   0.5*normal   0.2*normal   logistic
Bias       0.00085   -0.00931     -0.01532     -0.00717     -0.00016371
Variance   0.00010   0.00005      0.00013      0.00011      0.000199533
MSE        0.00010   0.00014      0.00036      0.00016      0.000199559

Another important factor affecting the performance of the estimation procedure is the

choice of bandwidth (see Section 3). The bandwidth determines the smoothness of the joint

density of equilibrium prices and probabilities, i.e., the likelihood function. Too small a bandwidth leads to a likelihood that is not well-behaved and hence makes finding a global maximum

very difficult. Too large a bandwidth, however, may cause the likelihood function to differ

greatly from the true underlying density of equilibrium prices and probabilities. We examined

the sensitivity of the price estimate to the choice of this parameter by looking at bandwidths

that are 1/4, 1/2, 2, and 4 times the normal reference rule bandwidth. Table 7 summarizes

the results for the price coefficient for two different sets of parameters. One set of parameters


was generated as above based on preliminary estimates in the peanut butter category (true

value for price is −0.21), the other set of parameters corresponds to the parameter estimates

in the laundry detergent category (true value for price −0.77). It appears that the bandwidth

obtained by the normal reference rule (equation (13)) performs well. Moreover, the precision

of the estimates is not overly sensitive to the choice of the smoothing parameter.

Table 7: Comparison of Monte Carlo results for different bandwidths (price coefficient only). NR is the bandwidth computed from the normal reference rule.

                   0.25×NR    0.5×NR     NR        2×NR       4×NR
True Value −0.21
  Bias             -0.01017   -0.00203   0.00354   0.00354    -0.00209
  Variance         0.00005    0.00002    0.00004   0.00004    0.00004
  MSE              0.00015    0.00002    0.00005   0.00005    0.00004
True Value −0.77
  Bias             0.03411    0.01356    0.00094   -0.01344   -0.01977
  Variance         0.00105    0.00026    0.00010   0.00023    0.00086
  MSE              0.00222    0.00044    0.00011   0.00041    0.00125

To summarize, our Monte Carlo simulations demonstrate the ability of the proposed estimation procedure to reliably recover the true parameters of an equilibrium model. In particular,

the parameter of interest, namely the price coefficient, is estimated with a very high degree of

precision. The conducted robustness checks indicate that our methodology is fairly robust to

modifications of the distributional assumptions as well as bandwidth selection.

6 Concluding Remarks

In this article we develop a new likelihood-based methodology for the estimation of structural

demand-and-supply models using disaggregate data. Marketing researchers have established a

long tradition of estimating random utility models of consumer demand using maximum likelihood methods. Tying a traditional individual-level choice model such as a logit or probit to a supply-side specification is a non-trivial task. Simply assuming a joint distribution of prices and probabilities is inconsistent with the equilibrium notion. Furthermore, the nonlinearity

of brand choice models makes writing down the joint distribution of equilibrium prices and

probabilities implied by the unobserved demand and supply shocks very challenging.

We solve these problems by simulating equilibrium prices and probabilities and then using the empirical likelihood of these prices and probabilities to obtain the parameters of the

model. Estimating the demand and supply equations jointly deals with the problem of price

endogeneity and ensures that we obtain reliable estimates of the price response parameter.

Moreover, the estimated structural equilibrium model can be used to perform “what-if” type

analyses (Draganska and Jain 2003).

We apply the proposed algorithm to both real-world scanner data and to simulated data in

order to assess the properties of the estimation method and highlight its merits and limitations.

Overall, the new procedure performs very well. It yields estimates of plausible magnitude when

applied to individual level choice data in several product categories. The conducted Monte

Carlo experiments demonstrate both the accuracy of our method and its robustness.

One of the attractive features of our approach relative to previous research considering

endogeneity in individual-level models (Villas-Boas and Winer 1999, Villas-Boas and Zhao

2001) is the ability to model explicitly the heterogeneity structure of the population. We specify

and estimate a latent class model to incorporate unobserved heterogeneity across households.

In its current form, however, our method cannot readily take into account the panel structure of

the household-level data. That is, if there is a correlation in the tastes of individual households,

our procedure yields a partial likelihood and the estimated standard errors need to be corrected.

Extending the proposed methodology to explicitly incorporate the dependencies in households' choices over time is an important area for future research.14

On the supply side, one might question the reasonableness of the assumed Nash behavior in prices. Our method does not require any particular assumption about the strategic interactions


between firms. A conjectural variation approach or a menu approach to test for different

behavioral assumptions could be employed to reveal the nature of competition in the market.

This is critical because misspecification of the supply side translates into a misspecified system,

thus leading to inconsistent parameter estimates. Future research could also focus on enriching

the supply side by explicitly incorporating the channel structure (Villas-Boas 2001, Sudhir

2001).

In the current analysis we only consider the endogeneity of prices to illustrate the proposed

methodology. Recent studies have suggested, however, that other strategic instruments such as

advertising (Vilcassim, Kadiyali and Chintagunta 1999) and product line length (Draganska

and Jain 2003) should also be considered endogenous. One fruitful avenue for future study

would therefore be to apply the estimation procedure developed in this paper to the analysis

of other marketing mix instruments.

In sum, the present research is a first step towards the estimation of a market equilibrium

model with a disaggregate discrete choice model on the demand side and an oligopoly model

on the supply side. The proposed estimation procedure explicitly accounts for the price endogeneity problem. It further bears the potential of combining the advantages of simultaneous

estimation of market models with recent developments in incorporating richer heterogeneity

structures and more flexible error specifications in disaggregate models.

References

Ackerberg, D. and Gowrisankaran, G. (2001). Quantifying equilibrium network externalities in the ACH banking industry, working paper, UCLA.

Anderson, S., de Palma, A. and Thisse, J. (1992). Discrete Choice Theory of Product Differentiation, MIT Press, Cambridge, MA.

Berry, S. (1994). Estimating discrete-choice models of product differentiation, RAND Journal of Economics 25: 242-262.

Berry, S., Carnall, M. and Spiller, P. (1997). Airline hubs: Costs, markups and the implications for consumer heterogeneity, working paper, Yale University.

Berry, S., Levinsohn, J. and Pakes, A. (1995). Automobile prices in market equilibrium, Econometrica 63: 841-890.

Besanko, D., Dube, J.-P. and Gupta, S. (2003). Competitive price discrimination strategies in a vertical channel with aggregate data, Management Science 49(9): 1121-1138.

Besanko, D., Gupta, S. and Jain, D. (1998). Logit demand estimation under competitive pricing behavior: An equilibrium framework, Management Science 44: 1533-1547.

Burnkrant, R. and Unnava, H. R. (1995). Effects of self-referencing on persuasion, Journal of Consumer Research 22: 17-26.

Chintagunta, P., Dube, J.-P. and Goh, K.-Y. (2003). Beyond the endogeneity bias: The effect of unmeasured brand characteristics on household-level brand choice models, Technical report, University of Chicago GSB.

Chintagunta, P., Jain, D. and Vilcassim, N. (1991). Investigating heterogeneity in brand preferences in logit models for panel data, Journal of Marketing Research 28: 417-428.

Draganska, M. and Jain, D. (2003). Product-line length as a competitive tool, working paper, Stanford University.

Dube, J.-P. (2003). Discussion of 'Bayesian analysis of simultaneous demand and supply', Quantitative Marketing and Economics 1(3). Forthcoming.

Fader, P. and Hardie, B. (1996). Modeling consumer choice among SKUs, Journal of Marketing Research 33: 442-452.

Gonul, F. and Srinivasan, K. (1993). Modeling multiple sources of heterogeneity in multinomial logit models: Methodological and managerial issues, Marketing Science 12(3): 213-229.

Goolsbee, A. and Petrin, A. (2003). The consumer gains from direct broadcast satellites and the competition with cable television, Econometrica. Forthcoming.

Guadagni, P. and Little, J. D. C. (1983). A logit model of brand choice calibrated on scanner data, Marketing Science 2(3): 203-238.

Hardle, W. (1990). Applied Nonparametric Regression, Cambridge University Press.

Kamakura, W. and Russel, G. (1989). A probabilistic choice model for market segmentation and elasticity structure, Journal of Marketing Research 26: 379-390.

Kennan, J. (1989). Simultaneous equation bias in disaggregated econometric models, Review of Economic Studies 56: 151-156.

McFadden, D. (1989). A method of simulated moments for estimation of discrete response models without numerical integration, Econometrica 57(5): 995-1026.

Nelder, J. and Mead, R. (1965). A simplex method for function minimization, Computer Journal 7: 308-313.

Nevo, A. (2001). Measuring market power in the ready-to-eat cereal industry, Econometrica 69(2): 307-342.

Pakes, A. and Pollard, D. (1989). Simulation and the asymptotics of optimization estimators, Econometrica 57: 1027-1057.

Petty, R. and Cacioppo, J. (1986). Communications and Persuasion: Central and Peripheral Routes to Attitude Change, Springer Verlag.

Press, W., Teukolsky, S., Vetterling, W. and Flannery, B. (1993). Numerical Recipes in C: The Art of Scientific Computing, 2nd edn, Cambridge University Press.

Scott, D. (1992). Multivariate Density Estimation: Theory, Practice, and Visualization, Wiley Series in Probability and Statistics, New York.

Silverman, B. (1986). Density Estimation for Statistics and Data Analysis, Chapman & Hall, London.

Slade, M. (1995). Product rivalry and multiple strategic weapons: An analysis of price and advertising competition, Journal of Economics and Management Strategy 4: 445-476.

Stone, C. (1980). Optimal rates of convergence for nonparametric estimators, Annals of Statistics 8: 1348-1360.

Sudhir, K. (2001). Structural analysis of competitive pricing in the presence of a strategic retailer, Marketing Science 20(3): 244-264.

Viard, B., Polson, N. and Gron, A. (2002). Likelihood based estimation of nonlinear equilibrium models with random coefficients, working paper, Stanford University.

Vilcassim, N., Kadiyali, V. and Chintagunta, P. (1999). Investigating dynamic multifirm market interactions in price and advertising, Management Science 45: 499-518.

Villas-Boas, M. and Winer, R. (1999). Endogeneity in brand choice models, Management Science 45: 1324-1338.

Villas-Boas, M. and Zhao, Y. (2001). The ketchup marketplace: Retailers, manufacturers and individual consumers, working paper, UC Berkeley.

Villas-Boas, S. (2001). Vertical contracts between manufacturers and retailers: An empirical analysis, working paper, UC Berkeley.

Wooldridge, J. (2002). Econometric Analysis of Cross Section and Panel Data, MIT Press.

Yang, S. and Allenby, G. (2000). A model for observation, structural, and household heterogeneity in panel data, Marketing Letters 11: 137-149.

Yang, S., Chen, Y. and Allenby, G. (2003). Bayesian analysis of simultaneous demand and supply, Quantitative Marketing and Economics 1(3): 1-25. Forthcoming.

Yatchew, A. (1998). Nonparametric regression techniques in economics, Journal of Economic Literature 36(2): 669-721.

Notes

1. The authors wish to thank Arie Beresteanu, Ulrich Doraszelski, Jean-Pierre Dube, Gautam Gowrisankaran, Charles Manski, Mike Mazzeo, Brian Viard and participants at the 1999 Marketing Science conference in Syracuse for their helpful comments and suggestions. Mariusz Rabus provided expert research assistance for this project.

2. An anonymous referee drew our attention to the fact that our assumption is somewhat similar to what Yang and Allenby (2000) call 'observation' heterogeneity. Yang and Allenby (2000) define this term in the context of a latent class model as a specification in which the latent class probabilities depend on observable covariates. This contrasts with 'household' or 'structural' heterogeneity, which entails dependence over time.

3. The main drawback of a continuous distribution of consumer heterogeneity is its computational complexity, since we need to numerically evaluate multidimensional integrals. While this is also true in standard models (e.g., Berry et al. (1995)), our estimation algorithm is already computationally intensive, so we prefer to work with a discrete distribution.

4. For a lucid discussion of this approach, see Dube (2003).

5. In small samples, most kernel density estimators are biased. Our Monte Carlo results indicate that this does not impair the ability of our procedure to recover the structural parameters of the equilibrium model. If unbiasedness is desired, one can use so-called higher-order kernels, which are computationally more demanding.

6. Another possibility to obtain a smooth likelihood function has been explored by Ackerberg and Gowrisankaran (2001). The authors make the auxiliary assumption of normal measurement error that allows them to express the likelihood function in terms of the normal density. A similar assumption has also been employed by Viard, Polson and Gron (2002), who estimate an equilibrium model using Bayesian methods (Markov Chain Monte Carlo techniques). These approaches may be problematic if the underlying density of the endogenous variables differs significantly from a normal density.

7. For a thorough treatment the interested reader is referred to Yatchew (1998).

8. Details on the estimation procedure are available from the authors upon request.

9. In the laundry detergent category, we use data for 107 weeks.

10. Yoplait only offers single-serving size yogurt. Dannon also carries 16oz and 32oz of plain and vanilla yogurt in addition to single-serving size. It is often argued that these two particular flavors are used for cooking purposes and constitute a different market.

11. Details are available from the authors upon request.

12. We are grateful to an anonymous referee for bringing this point to our attention.


13. These numbers are computed from the standard logit specification with no heterogeneity.

14. In a recent article, Yang et al. (2003) propose a Bayesian approach to this problem.
