Learning from Experience, Simply
by
Song Lin
Juanjuan Zhang
and
John R. Hauser
June 1, 2012

Song Lin is a PhD Candidate, MIT Sloan School of Management, Massachusetts Institute of
Technology, E62-580, 77 Massachusetts Avenue, Cambridge, MA 02139, (617) 225-1639,
Juanjuan Zhang is an Associate Professor of Marketing, MIT Sloan School of Management,
Massachusetts Institute of Technology, E62-537, 77 Massachusetts Avenue, Cambridge, MA
02139, (617) 452-2790, [email protected].
John R. Hauser is the Kirin Professor of Marketing, MIT Sloan School of Management, Massa-
chusetts Institute of Technology, E62-538, 77 Massachusetts Avenue, Cambridge, MA 02139,
(617) 253-2929, [email protected].
Learning From Experience, Simply
Abstract
There is substantial interest in the marketing literature in modeling and estimating con-
sumer-learning dynamics. However, (approximately) optimal solutions to forward-looking learn-
ing problems are computationally complex, limiting their empirical applicability and behavioral
plausibility. Drawing on theories of cognitive simplicity from marketing, psychology, and eco-
nomics, we propose a behaviorally intuitive (and tractable) solution – index strategies. We argue
that index strategies balance thinking costs and discounted utility. Index strategies also avoid ex-
ponential growth in computational complexity as the size of the decision problem increases, ena-
bling researchers to study learning models in more complex situations.
The existence of index strategies depends upon a structural property called indexability,
which is hard to establish in general. We prove the indexability of canonical consumer learning
models in which both brand quality and future utility shocks are uncertain. We establish invari-
ance properties which make index strategies feasible for consumers to intuit. Using synthetic da-
ta, we demonstrate that index strategies achieve nearly optimal utility at low computational costs.
Using IRI data for a product category where we expect forward-looking learning, we find that an
index-strategy model fits behavior well, provides plausible parameter estimates, predicts out-of-
sample as well as or better than alternative models, and requires substantially lower computa-
tional costs.
Keywords: dynamic consumer learning, structural models, cognitive simplicity, index strate-
gies, heuristics, multi-armed bandit problems, restless bandits, indexability.
1. Introduction and Motivation
Considerable effort in marketing is devoted to modeling and estimating the dynamics by
which consumers learn from their consumption experience (e.g., Roberts and Urban 1988;
Erdem and Keane 1996; Erdem et al. 2005; Narayanan and Manchanda 2009; Ching and Ishihara
2010; Ching et al. 2011). Researchers have developed theory-rich models of optimizing forward-
looking consumers who balance exploitation (choosing the brand that yields the highest current
reward) with exploration (trying brands to gather information so that future consumption experi-
ences might improve). Pillars of these models are an explicitly specified description of consumer
utility and an explicitly specified process by which consumers learn. Most models assume con-
sumers choose brands by solving a dynamic program which maximizes expected total utility tak-
ing learning into account. Researchers argue that theory-based models are more likely to uncover
insight and be invariant for new-domain policy simulations (Chintagunta et al. 2006, p. 604).
However, these advantages often come at the expense of difficult and time-consuming solution
methods.
The dynamic programs for forward-looking learning models are, themselves, extremely
difficult to solve optimally. We cite evidence below that the problems are PSPACE-hard – at least as hard as any problem that can be solved with polynomial space (i.e., computation memory). PSPACE-hard implies the
more-familiar notion of NP-hard. This intractability presents both practical and theoretical chal-
lenges. Practically, problem difficulty requires researchers to rely on approximate solutions.
Without explicit comparisons to the optimal solution, we do not know the impact of the approx-
imations on estimation results. Moreover, the well-known “curse of dimensionality” prevents re-
searchers from investigating problems with moderate or large numbers of brands or marketing
variables, for which even approximate solutions are not feasible. Theoretically, it is reasonable to
posit that a consumer cannot solve in his or her head a dynamic problem optimally that the best
computers cannot solve. In contrast, well-developed theories in marketing, psychology and eco-
nomics suggest that observed consumer decision rules are cognitively simple (e.g., Payne et al.
1988, 1993; Gigerenzer and Goldstein 1996).
We propose and evaluate an alternative theory of forward-looking consumers’ solutions
to the dynamic learning problems – index strategies. We retain the basic pillars of structural
modeling: an explicit description of consumer utility, an explicit Bayesian learning process, and
an assumption that consumers seek to optimize expected discounted utility. We posit in addition
a cost to thinking (e.g., Shugan 1980; Johnson and Payne 1985). We assume the consumer
chooses a strategy that is likely to optimize expected total utility minus thinking costs. While
thinking costs might be observable in the laboratory, say through response latency, they are in-
herently unobservable in vivo. To posit index strategies we identify domains where index strate-
gies are nearly optimal in the reduced problem of maximizing expected discounted utility. If, in
such domains, index strategies are substantially simpler for the consumer to implement, then it is
likely that savings in thinking costs exceed the slight reduction in optimality in the reduced prob-
lem. In the special cases where index strategies are optimal in the reduced problem, we argue
they are superior as a description of forward-looking consumers. Following the same logic, we
also establish conditions where myopic learning strategies suffice.
To prove the viability of index strategies as a descriptive model of consumers we must
(1) establish when well-defined index strategies exist, (2) provide intuition that they are cogni-
tively simple and behaviorally intuitive (and hence might be used by consumers), (3) investigate
when index strategies are optimal or near optimal solutions to the reduced problem of utility
maximization even if there were no thinking costs, and (4) test whether index strategies explain
observed consumer behavior at least as well as alternative models. We address #1 analytically
by proving the “indexability” property of canonical forward-looking learning models.
(Indexability is hard to establish in general.) We address #2 by examining the form and proper-
ties of index strategies and arguing they are cognitively simple and behaviorally intuitive relative
to current solution methods. We address #3 with both analytical arguments and synthetic data.
We address #4 by estimating alternative models using IRI data on the purchase of diapers, a
product category where we expect to see forward-looking learning. We begin with a review of
concepts upon which we build.
Our basic hypothesis is that consumers use a cognitively-simple index strategy. We
demonstrate viability by showing that the “Whittle index” (to be defined later) is one candidate
that satisfies the four criteria. Figure 1 is a conceptual summary of our hypothesis. The concept
of an index solution could apply to other indices (or approximations to the Whittle index) if such
indices are shown to be better descriptions of observed consumer behavior. Future papers might
evaluate such indices on the four criteria and compare their performance to the Whittle index.
[Insert Figure 1 about here.]
2. Related Literatures
We draw on concepts from four literatures: learning dynamics, cognitive simplicity, de-
scriptive models based on optimal solutions to reduced problems, and bandit problems.
2.1. Learning Dynamics
Many influential papers study consumer learning dynamics and apply learning models to
explain or forecast consumer choices. Using data from automotive consumers Roberts and Urban
(1988) estimate a model in which consumers use Bayesian learning to integrate information from
a variety of sources to resolve uncertainty about “brand quality.” They, like subsequent authors,
define brand quality generally either by a multi-attributed utility function or by a match of a
brand’s features to the consumer’s needs. Erdem and Keane (1996) build upon the concept of
Bayesian learning to include forward-looking consumers who trade off exploitation and exploration. For frequently-purchased goods, their model fits data better than a purely myopic model
(reduced form of Guadagni and Little 1983) and as well as the Roberts-Urban myopic-learning
model. These papers stimulated a line of research that estimates the dynamics of consumer learn-
ing – for a recent review see Ching et al. (2011). Some models retain myopic consumers with
Bayesian learning (e.g., Narayanan et al. 2005; Mehta, et al. 2008; Chintagunta et al. 2009; Na-
rayanan and Manchanda 2009; Ching and Ishihara 2010), while others explicitly model forward-
looking consumers (e.g., Ackerberg 2003; Crawford and Shum 2005; Erdem et al. 2005, 2008;
Kim et al. 2010).1 Because forward-looking learning problems are computationally intractable,
many applications estimate models based on myopic learning (Narayanan and Manchanda 2009;
Ching and Ishihara 2010).
In general, forward-looking models fit empirical data well but have not improved predic-
tion much relative to myopic learning models. However, when the theory is accurately descrip-
tive, the more-complex models should improve policy simulations. Because the forward-looking
assumption requires consumers to solve computationally hard dynamic problems, some authors
have suggested that “the future development of structural models in marketing will focus on the
interface between economics and psychology” (Chintagunta et al. 2006, p. 614).
2.2. Cognitive Simplicity
Parallel literatures in marketing, psychology, and economics provide evidence that con-
sumers use decision rules that are cognitively simple. In marketing Bettman et al. (1998) and
1 Kim et al. (2010) model consumers’ sequential search of products. The search problem can be seen as a forward-looking learning problem in which a consumer’s product “experience” completely reveals product value.
Payne et al. (1988, 1993) present evidence that consumers use simple heuristic decision rules to
evaluate products. For example, under time pressure, consumers often use conjunctive rules (re-
quire a few “must have” features) rather than linear-utility-based rules. Using simulated thinking
costs with “elementary information processes,” Johnson and Payne (1985) illustrate how heuris-
tic decision rules can be rational when balancing utility and thinking costs. Methods to estimate
the parameters of cognitively-simple decision rules vary, but, in general, such rules predict as
well as or better than linear utility (e.g., Bröder 2000; Gilbride and Allenby 2004; Kohli and
Jedidi 2007; Yee et al. 2007; Hauser et al. 2010).
Building on Simon’s (1955, 1956) theory of bounded rationality, researchers in psychol-
ogy argue that human beings use cognitively simple rules that are “fast and frugal” (e.g.,
Gigerenzer and Goldstein 1996; Martignon and Hoffrage 2002). Fast and frugal rules evolve
when consumers learn decision rules from experience. Consumers continue to use the decision
rules because they lead to good outcomes in familiar environments (Goldstein and Gigerenzer
2002). For example, when judging the size of cities, “take the best” often leads to better judg-
ments than a linear rule.2 Indeed, in 2010-2011 two issues of Judgment and Decision Making
were devoted to the recognition heuristic alone (e.g., Marewski et al. 2010). Related concepts in-
clude accessibility (e.g., Bruner 1957), fluency (e.g., Jacoby and Dallas 1981), and availability
(e.g., Tversky and Kahneman 1973).
The costly nature of cognition has also received attention in economics (see Camerer
2003 for a review). A line of research looks to extend or revise standard dynamic decision-
making models with the explicit recognition that cognition is costly. For example, Gabaix and
Laibson (2000) empirically test a behavioral solution to decision-tree problems – decision-
2 The take-the-best rule is, simply, if you recognize one city and not the other it is likely larger; if you recognize both use the most diagnostic feature to make the choice.
makers actively eliminate low-probability branches to simplify the task. Gabaix et al. (2006) de-
velop a “directed cognition model,” in which a decision-maker acts as if he/she has only one
more opportunity to search. In the laboratory, the directed cognition model explains subjects’ so-
lutions to a simple dynamic problem better than a standard search model that assumes costless
cognition.
Cognitive process mechanisms are still being debated in the marketing, psychology, and
economics literatures. Our hypothesis that consumers use index or index-like strategies needs on-
ly the observation that consumers favor decision rules that are cognitively simple and that such
rules often lead to very good outcomes. The cognitive-simplicity hypothesis assumes that con-
sumers trade off utility gains against thinking costs, but does not require explicit measurement of
thinking costs.
2.3. Descriptive Models Based on Optimal Solutions to Reduced Problems
If a ball player wants to catch a ball that is already high in the air and traveling directly
toward the player, then all the player need do is gaze upon the ball, start running, and adjust
his/her speed to maintain a constant gaze angle with the ball (Hutchinson and Gigerenzer 2005, p.
102).3 The gaze heuristic is an example where a cognitively simple rule accomplishes a task that
might otherwise involve solving difficult differential equations. But the principle is more gen-
eral: complex optimization problems often have simple solutions.
Suppose a consumer knows the utilities and prices of a set of durable goods and wishes to
choose the maximum utility set subject to a budget constraint. If the brands were infinitely di-
visible, then simple “greedy” solutions lead to the optimal allocation: choose brands in the order
of either utility/price or utility − λ×price (where λ is the shadow price of the budget constraint).
3 Professional athletes use more-complicated heuristics that give them greater range, for example, in baseball, pre-positioning based on prior tendencies and the expected pitch, and the sound as the bat hits the ball.
Do so until the budget is exhausted. When brands are not infinitely divisible neither heuristic is
optimal, but both greedy heuristics provide excellent (and empirically indistinguishable) descrip-
tions of consumer behavior (Hauser and Urban 1986). Greedy heuristics are one justification for
a utility specification that is linear in price. Another heuristic, choosing the information source
with the maximum gain in value per unit time while looking only one-step ahead, explains well
how consumers search for automobiles (Hauser et al. 1993). There are many examples in psy-
chology and marketing where seemingly simple decision rules solve more-complex problems.
We argue in §4 and §5 that some index strategies are simple decision rules that solve complex
dynamic optimization problems.
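The greedy budget heuristic described above can be sketched in a few lines. This is a minimal illustration – the brands, utilities, prices, and budget are made up for the example, not taken from Hauser and Urban (1986):

```python
def greedy_budget_choice(brands, budget):
    """Greedy heuristic: buy brands in decreasing utility-per-price order,
    skipping any brand that no longer fits, until the budget is exhausted.
    Optimal if goods were infinitely divisible; a good approximation otherwise."""
    chosen = []
    for name, utility, price in sorted(brands, key=lambda b: b[1] / b[2], reverse=True):
        if price <= budget:
            chosen.append(name)
            budget -= price
    return chosen

# Utility-per-price ranks B (3.0) ahead of A (2.0) ahead of C (1.0).
brands = [("A", 10.0, 5.0), ("B", 9.0, 3.0), ("C", 4.0, 4.0)]
print(greedy_budget_choice(brands, budget=8.0))  # → ['B', 'A']
```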
2.4. Bandit Problems
The multi-armed bandit problem is a prototypical problem that illustrates the fundamen-
tal tradeoff between exploration and exploitation in sequential decision making under uncertain-
ty. In a bandit problem the consumer faces a finite number of choices, each of which has an un-
certain value. The consumer must make choices, observe outcomes, and update beliefs with a se-
quential decision rule to maximize expected discounted values. The problem was first formulated by the British in World War II, and for over thirty years no simple solution was known. Then Gittins and Jones (1974)
demonstrated a simple index solution – develop an index for each “arm” (each choice alterna-
tive) by solving a sub-problem that involves only that arm, then choose the arm with the largest
index. Gittins and Jones (1974) proved the surprising result that the index solution is the optimal
solution to the classic bandit problem whenever the non-chosen choice alternatives do not
change over time.4
4 The Gittins index has been successfully applied in a variety of fields. For example, Hauser et al. (2009) apply the Gittins index to derive optimal “website morphing” strategies that match website design with customers’ cognitive styles. Recently morphing was applied to AT&T’s banner advertising on CNET.
When non-chosen choice alternatives change over time, say due to random shocks,
Gittins’ index is no longer guaranteed to be optimal. Such problems are known as “restless ban-
dits” (Whittle 1988) and, in general, are computationally intractable (Papadimitriou and
Tsitsiklis 1999). In his seminal paper, Whittle (1988) proposed a tractable heuristic solution
grounded on the optimal solution to a relaxed problem. His solution generalizes Gittins’ index
such that the problem can be solved optimally or near optimally by associating an index (referred
to as Whittle’s index) separately with each alternative and choosing the alternative with the larg-
est index. This index solution reduces an exponentially complex, intractable problem to a set of
one-dimensional problems.
The existence of well-defined index solutions relies on a structural property called
indexability, which is not guaranteed for all restless bandit problems. We show that the canonical
forward-looking learning problem is indexable. We also show that the index of an alternative is a
simple function of key parameters pertaining to this brand, including means and variances of
brand quality, quality beliefs, and utility shocks. We then argue that it is reasonable for the con-
sumer to intuit how an index varies as a function of these parameters.
3. Canonical Forward-Looking Learning Problem
We consider the following canonical forward-looking learning problem. A consumer sequentially chooses from a set containing $N$ brands. Let $j$ index brands and $t$ index time. The consumer’s utility, $u_{jt}$, from choosing $j$ at $t$ has three components. “Quality” (enjoyment, fit with needs, weighted sum of brand features, etc.), $q_{jt}$, is drawn from a distribution $F(q_{jt}; \theta_j)$ with parameters $\theta_j$ uncertain to the consumer. The distribution functions are independent over $j$. This independence assumption rules out learning about brand $j$ by choosing another brand. Quality is realized after each consumption occasion. Conditional on $\theta_j$, quality draws are independent over time; however, quality draws may be correlated over time through $\theta_j$. The
interpretation is as follows. Because quality is a general concept that includes enjoyment and fit
with needs that might vary among consumption occasions, a single brand experience is not suffi-
cient to resolve all uncertainty. However, other things being equal, a good brand experience sug-
gests that future experiences with the same brand are likely to be good. This gives forward-
looking consumers the “exploration” incentive – by trying out a brand, consumers make better-informed decisions about this brand in the future.
The second component of utility is a set of “observable shocks”, $x_{jt}$, such as advertising, price, promotion, and other control variables that are observable to the researcher and consumer.5 For simplicity, we assume the $x_{jt}$ affect utility directly, although the model is extendable to indirect effects as in Ackerberg (2003), Erdem and Keane (1996), and Narayanan et al. (2005). The third component of utility is an “unobservable shock”, $\epsilon_{jt}$, which represents random preference fluctuations observed by the consumer but not by the researcher.
We refer to the weighted sum of observable and unobservable shocks, $w'x_{jt} + \epsilon_{jt}$, as “utility shocks,” where $w$ is a vector of weight parameters. Utility shocks are important because, without them, the consumer would learn to choose a single brand (if the exogenous variables stabilized to a known value, say constant price and advertising), an outcome that is often violated in real-world observations. We let utility shocks be drawn from a joint distribution, $G(x_{jt}, \epsilon_{jt}; \phi_j)$, independently over time with parameters $\phi_j$.6 The shocks are independent over $j$. The consumer knows the distribution $G$ and the value of $w$, observes the current utility shocks prior to his/her decision at $t$, but does not know future realizations of the shocks. We make the con-
5 Our consumer learning model treats observable utility shocks as exogenous. However, the same insight applies to endogenous observable utility shocks as long as (1) each “atomic” consumer’s learning does not affect these shocks (e.g., a brand’s advertising expenditure), and (2) these shocks do not directly convey quality information.
6 Observable shocks can be independently distributed over time for a number of reasons. For example, firms may intentionally randomize price promotions in response to competition. Such “mixed strategies” can generate observed prices that appear to be freshly drawn in each period from a known distribution (Narasimhan 1988).
servative assumption that utility shocks are independent of $q_{jt}$ and thus do not help consumers learn quality directly. However, utility shocks do shape learning indirectly by varying consumers’ utility from exploitation, which in turn affects their incentive for exploration.
In summary, we write the consumer’s utility from choosing $j$ at $t$ as follows:
(1) $u_{jt} = q_{jt} + w'x_{jt} + \epsilon_{jt}$.
The consumer uses Bayes Theorem to update his or her beliefs about the parameters, $\theta_j$, after each consumption experience (assumed to occur after choice but before the next choice). Let $I_{jt}$ be a set of parameters that summarize the consumer’s beliefs about $\theta_j$ at time $t$. At $t = 0$, beliefs about $\theta_j$ are summarized by a prior distribution, $B(\theta_j; I_{j0})$, where $I_{j0}$ is based on all relevant prior experience. After the $t^{th}$ consumption experience the consumer’s posterior beliefs are summarized by $B(\theta_j; I_{jt})$. For example, when both $F$ and prior beliefs are normal, Bayesian updating is naturally conjugate. We obtain $I_{j,t+1}$ using standard updating formulae.
The parameters of posterior beliefs, $I_{jt} \in \Omega$, and the realized utility shocks, $x_{jt} \in X$ and $\epsilon_{jt} \in E$, summarize the state of information about brand $j$. The collection of brand-specific states, $S_t = \{(I_{1t}, x_{1t}, \epsilon_{1t}), (I_{2t}, x_{2t}, \epsilon_{2t}), \ldots, (I_{Nt}, x_{Nt}, \epsilon_{Nt})\}$, represents the set of states relevant to the decision problem at $t$.
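For the normal–normal conjugate case, the standard updating formulae take one line each. A minimal sketch – the prior, the quality draws, and the quality variance below are illustrative numbers, not estimates from the paper:

```python
def update_normal_beliefs(mu, var, q_obs, var_q):
    """One step of conjugate normal updating of beliefs about mean quality.
    (mu, var): prior mean and variance of beliefs about true mean quality;
    var_q: known variance of quality draws around the true mean;
    q_obs: realized quality on this consumption occasion."""
    post_var = 1.0 / (1.0 / var + 1.0 / var_q)       # precisions add
    post_mu = post_var * (mu / var + q_obs / var_q)  # precision-weighted mean
    return post_mu, post_var

mu, var = 0.0, 4.0             # diffuse prior beliefs
for q in [1.2, 0.8, 1.1]:      # three consumption experiences
    mu, var = update_normal_beliefs(mu, var, q, var_q=1.0)
print(mu, var)                 # beliefs tighten toward the observed qualities
```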
We seek to model a decision strategy, $\Pi: \Omega^N \times X^N \times E^N \rightarrow \{1, \ldots, N\}$, that maps the state space to the choice set. Without further assumptions, the consumer must choose a decision strategy to maximize expected discounted utility:
(2) $V(S_0) = \max_{\Pi} E\left[\sum_{t=0}^{\infty} \delta^t \left(q_{\Pi(S_t)t} + w'x_{\Pi(S_t)t} + \epsilon_{\Pi(S_t)t}\right)\right]$,
where $\delta$ is the discount factor and the expectation is taken over the stochastic process generat-
ed by the decision strategy (in particular, the transition between states that may depend on the consumer’s brand choice), its implication for Bayesian updating, and the $F$, $G$, and $B$ distribution functions. The infinite horizon can be justified either by repeat consumption over a long
horizon or by the consumer’s subjective belief that he/she will terminate the decision problem
randomly.
The optimal solution to the consumer’s decision problem can be characterized as the so-
lution to Bellman’s equation.
(3) $V(S_t) = \max_{j \in \{1, \ldots, N\}} \left\{ E[q_{jt} \mid I_{jt}] + w'x_{jt} + \epsilon_{jt} + \delta E[V(S_{t+1}) \mid S_t, j] \right\}$.
While Bellman’s equation is conceptually simple, the full solution is computationally intractable because, even after integrating out the utility shocks $x_{jt}$ and $\epsilon_{jt}$, it evolves on a state space of size $|\Omega|^N$, where $|\Omega|$ is the number of elements in $\Omega$. Not only is this exponential in the number of brands, but problem dimensionality gets extremely large if the state space is large. Even when the optimal solution is approximated by choosing discrete points to represent $\Omega$, $|\Omega|$ is large.
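A back-of-the-envelope calculation makes the contrast concrete. The 100-point discretization of $\Omega$ is illustrative, and the per-brand decomposition is the one delivered by the index strategies introduced earlier:

```python
def state_space_sizes(n_states, n_brands):
    """Joint dynamic program vs. a per-brand decomposition, for a belief
    space discretized to n_states points per brand."""
    full = n_states ** n_brands      # joint state space: exponential in N
    per_brand = n_brands * n_states  # N one-brand sub-problems: linear in N
    return full, per_brand

for n in (2, 5, 10):
    print(n, state_space_sizes(100, n))
# 2 brands:  10,000 joint states vs. 200
# 5 brands:  10^10 joint states vs. 500
# 10 brands: 10^20 joint states vs. 1,000
```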
4. An Index Strategy in the Absence of Utility Shocks
The learning problem we examine includes utility shocks, but it is easier to illustrate the
intuition of index strategies using a problem without utility shocks. Temporarily assume $x_{jt} = 0$ and $\epsilon_{jt} = 0$ for all $j$ and $t$, although the same result holds when there is no inter-temporal variation in the utility shocks. In this special case the consumer’s decision problem is a classic multi-armed bandit.
Gittins’ insight is as follows. To evaluate a brand $j$, the consumer thinks as if he/she is choosing between this brand and a fixed reward $r$, which, once chosen, is chosen for all future periods. In each period $t$, the consumer solves an independent sub-problem for each brand – he/she either consumes this brand to gain more information about it, or exploits the fixed reward $r$. In the latter case, the consumer’s beliefs about the brand cease to evolve, such that $I_{j,t+1} = I_{jt}$. The optimal solution to this sub-problem is determined by a greatly simplified version of Bellman’s equation:
(4) $V_j(I_{jt}, r) = \max\left\{ r + \delta V_j(I_{jt}, r),\; E[q_{jt} \mid I_{jt}] + \delta E[V_j(I_{j,t+1}, r) \mid I_{jt}] \right\}$.
Notice that each sub-problem only depends on the state evolution of a single brand, which is
much simpler than the full problem specified in Equation 3.
Gittins’ index, $G_{jt} = G(I_{jt})$, is defined as the smallest value of $r$ such that the consumer at time $t$ is just indifferent between experiencing brand $j$ and receiving the fixed reward. We obtain $G_{jt}$ by equating the two terms in brackets in Equation 4. Gittins proposed that $G_{jt}$ could be used as a measuring device for the value of exploring brand $j$ – if there is more uncertainty about a brand left to explore, the consumer will demand a higher fixed reward to be willing to stop exploration. Gittins’ index is updated when new information is realized.
Gittins’ surprising result is the Index Theorem. The optimal solution in each period is to
choose the brand with the highest index in that period. The consumer suffers no loss in expected
discounted utility by using an index strategy. A computationally difficult problem has thus been
decomposed into simpler sub-problems.
Index Theorem (Gittins and Jones 1974). The optimal decision strategy when there are no utility shocks is $\Pi(S_t) \in \arg\max_j \{G_{jt}\}$.
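To make the construction concrete, the sketch below computes a Gittins-style index for a single arm by bisection on the fixed reward $r$. It substitutes a Bernoulli-quality arm with Beta beliefs for the normal model in the text and truncates the horizon, so the numbers are approximations; the discount factor, truncation depth, and Beta parameterization are all illustrative assumptions:

```python
from functools import lru_cache

DELTA = 0.9   # discount factor (illustrative)
DEPTH = 60    # horizon truncation; deeper is more accurate

def one_armed_value(a, b, r):
    """Value of the sub-problem: keep playing a Bernoulli arm with
    Beta(a, b) beliefs, or retire to the fixed reward r every period."""
    @lru_cache(maxsize=None)
    def V(a, b, d):
        retire = r / (1.0 - DELTA)
        if d == 0:
            return retire
        p = a / (a + b)  # posterior mean of the success probability
        explore = p * (1.0 + DELTA * V(a + 1, b, d - 1)) \
            + (1.0 - p) * DELTA * V(a, b + 1, d - 1)
        return max(retire, explore)
    return V(a, b, DEPTH)

def gittins_index(a, b, tol=1e-4):
    """Smallest fixed reward at which retiring is (weakly) optimal,
    found by bisection: the Gittins index of the arm."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        r = (lo + hi) / 2.0
        if one_armed_value(a, b, r) > r / (1.0 - DELTA) + 1e-12:
            lo = r   # exploring still beats retiring: index exceeds r
        else:
            hi = r
    return (lo + hi) / 2.0

# Same posterior mean (0.5), but the more uncertain arm earns a larger
# exploration bonus, so its index is higher.
print(gittins_index(1, 1), gittins_index(10, 10))
```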
Figure 2 illustrates the Index Theorem. In Figure 2a we computed Gittins’ indices for two brands (for normal $F$ and normal priors). The true mean quality for Brand 1 is normalized to ze-
ro and the true mean quality of Brand 2 is negative. The consumers’ mean prior beliefs reflect
the true mean qualities, but the consumer is uncertain about these beliefs.
Prior to period 7 the index strategy causes the consumer to bounce between choosing
Brand 1 and Brand 2. In periods 7 to 12, he/she continues to sample Brand 2 until by period 13
he/she has learned enough about the brands. After period 13 the consumer switches to Brand 1
and remains with Brand 1 indefinitely. A brand’s index remains flat when it is not being con-
sumed because the simplified problem does not include utility shocks.
[Insert Figure 2 about here.]
Index strategies are much simpler than a brute force solution to Bellman’s equation, but
can the consumer intuit (perhaps approximately) how an index varies with experience and be-
liefs? We expect future laboratory experiments to address this issue explicitly. In this paper, we
argue that index strategies have intuitive properties and that it is not unreasonable for the con-
sumer to intuit those properties.
Figure 2b illustrates the intuitive properties of Gittins’ index. We plot the expected value
of Gittins’ index as it would evolve if the brand were chosen repeatedly (assuming the mean of
the prior distribution was accurate). The two lines represent two brands that differ in mean quali-
ty but not uncertainty. In each given period, Gittins’ index is larger for the higher-quality brand.
Naturally, with the same amount of remaining uncertainty, a higher-quality brand offers a greater
value of exploitation. Both index curves decline smoothly and converge with experience toward
true brand quality. The index converges because the value of exploration decreases as the con-
sumer learns more about the distribution of brand quality. The amount by which the index ex-
ceeds the mean quality beliefs is the value of learning (the value of reducing uncertainty).
When we plot Gittins’ index as a function of the consumer’s posterior quality uncertainty, $\sigma_{jt}$ (not shown), it is also intuitive – the index increases with $\sigma_{jt}$ because the value of exploration
increases with the remaining amount of quality uncertainty. We therefore posit that it is not un-
reasonable for the consumer to intuit the (approximate) shape of Gittins’ index as a function of
the parameters of the problem.
5. The Index Strategy When Utility Shocks are Present
We now allow utility shocks. Observable shocks ($x_{jt}$) include effects that researchers might observe and model, such as changes in advertising, promotion, or price. Unobservable shocks ($\epsilon_{jt}$) include effects that researchers do not observe, such as changes in the characteristics of the consumption occasion or changes in idiosyncratic tastes. Because shocks enter the utili-
ty function the consumer may, at any period, switch among brands. Without utility shocks the
consumer’s choice converges to a single brand as in Figure 2a.
When the model includes utility shocks, the Gittins-Jones Index Theorem no longer ap-
plies because non-chosen brands do not remain constant. With shocks, the consumer’s problem
belongs to the class of “restless-bandit problems” as introduced by Whittle (1988). In general
such optimization problems are PSPACE-hard (Papadimitriou and Tsitsiklis 1999) making the
problem extremely difficult (if not infeasible) to solve and making it implausible that the con-
sumer would be able to solve the problem with a solution strategy based on Equation 3.
Whittle (1988) proposed a solution that generalizes Gittins’ index. In each period, to evaluate a brand $j$, the consumer thinks as if he or she must choose between brand $j$ and a fixed reward, $r$. Bellman’s equation for the sub-problem (which now includes the utility shocks) is:
(5) $V_j(I_{jt}, x_{jt}, \epsilon_{jt}, r) = \max\left\{ r + \delta E[V_j(I_{jt}, x_{j,t+1}, \epsilon_{j,t+1}, r) \mid I_{jt}],\; E[q_{jt} \mid I_{jt}] + w'x_{jt} + \epsilon_{jt} + \delta E[V_j(I_{j,t+1}, x_{j,t+1}, \epsilon_{j,t+1}, r) \mid I_{jt}] \right\}$.
The index is defined as the smallest value of $r$ such that the consumer at time $t$ is just indifferent between experiencing brand $j$ and receiving the fixed reward. For such an index to be well-defined and meaningful, the “indexability” condition needs to be satisfied (Whittle 1988). Let $P_j(r) \subseteq \Omega \times X \times E$ be the set of states for which choosing the fixed reward $r$ at time $t$ is optimal:
(6) $P_j(r) = \left\{ (I_{jt}, x_{jt}, \epsilon_{jt}) \in \Omega \times X \times E :\; r + \delta E[V_j(I_{jt}, x_{j,t+1}, \epsilon_{j,t+1}, r) \mid I_{jt}] \geq E[q_{jt} \mid I_{jt}] + w'x_{jt} + \epsilon_{jt} + \delta E[V_j(I_{j,t+1}, x_{j,t+1}, \epsilon_{j,t+1}, r) \mid I_{jt}] \right\}$.
Indexability is defined as:
Definition: A brand $j$ is indexable if, for any $r$, $P_j(r') \supseteq P_j(r)$ for any $r' \geq r$.
Indexability says that as the fixed reward increases, the collection of states for which the
fixed reward is optimal does not decrease. Intuitively, indexability requires that, if under some
state it is optimal to choose the fixed reward, then it must also be optimal to choose a higher
fixed reward. Indexability implies a consistent ordering of brands for any state, so an index strat-
egy is meaningful. Because this condition does not hold for all restless-bandit problems, we must
establish indexability when the model includes utility shocks. In a companion online appendix
we prove the following proposition.
Proposition 1. The canonical forward-looking learning problem is indexable.
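The nesting in the definition can also be checked numerically in a toy model. The sketch below uses a Bernoulli-quality arm with Beta beliefs, no utility shocks, and a truncated horizon – illustrative assumptions standing in for the canonical model, for which the proposition (not this check) supplies the proof:

```python
from functools import lru_cache

DELTA = 0.9   # discount factor (illustrative)
DEPTH = 40    # horizon truncation

def retire_is_optimal(a, b, r):
    """True if, with Beta(a, b) beliefs about a Bernoulli arm, taking
    the fixed reward r forever is (weakly) optimal in the sub-problem."""
    @lru_cache(maxsize=None)
    def V(a, b, d):
        retire = r / (1.0 - DELTA)
        if d == 0:
            return retire
        p = a / (a + b)
        explore = p * (1.0 + DELTA * V(a + 1, b, d - 1)) \
            + (1.0 - p) * DELTA * V(a, b + 1, d - 1)
        return max(retire, explore)
    return V(a, b, DEPTH) <= r / (1.0 - DELTA) + 1e-9

# Indexability: as r rises, the set of belief states where retiring
# is optimal should only grow (nested passive sets).
states = [(a, b) for a in range(1, 6) for b in range(1, 6)]
for r_lo, r_hi in [(0.3, 0.5), (0.5, 0.7), (0.7, 0.9)]:
    passive_lo = {s for s in states if retire_is_optimal(*s, r_lo)}
    passive_hi = {s for s in states if retire_is_optimal(*s, r_hi)}
    assert passive_lo <= passive_hi
print("passive sets are nested as r increases")
```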
Once the indexability condition is established, a well-defined index strategy is to choose in each period the brand with the largest index. The index strategy breaks “the curse of dimensionality” by decomposing a problem whose complexity grows exponentially in the number of brands into much-simpler single-brand sub-problems, each on a state space of $|\Omega_j|$ after integrating out the utility shocks $x_{jt}$ and $\epsilon_{jt}$. By breaking this curse, it is more plausible that the consumer might use the decision strategy. As a
bonus, estimation is far simpler when a dynamic learning problem is nested. It remains to be
shown that the index strategy with utility shocks is invariant to scale, intuitive, and implies a rea-
sonable utility vs. thinking cost tradeoff.
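To make the index calibration concrete, the following sketch computes an index for the simplest textbook case – a Bernoulli “brand” with a Beta quality belief and no utility shocks – by bisecting on the fixed reward $\lambda$ until the consumer is indifferent between retiring on $\lambda$ and experimenting. This is our illustration, not the paper’s algorithm: the discount factor, horizon truncation, and Beta-Bernoulli setup are stand-in assumptions for the normal-quality model with shocks.

```python
# Illustrative Whittle/Gittins-style calibration for a Bernoulli "brand"
# with Beta(a, b) quality beliefs (no utility shocks); a stand-in for the
# paper's normal-quality model.

DELTA = 0.90    # discount factor
HORIZON = 100   # finite-horizon truncation of the infinite sub-problem

def whittle_index(a0, b0, tol=1e-4):
    """Smallest fixed reward lam at which retiring on lam forever is as
    good as experiencing the brand once more and re-optimizing."""

    def value(a, b, t, lam, memo):
        # Value-to-go of the two-action sub-problem at depth t.
        if t == HORIZON:
            return 0.0
        if (a, b) in memo:            # a + b pins down t, so (a, b) suffices
            return memo[(a, b)]
        # Option 1: take the fixed reward lam for the remaining horizon.
        retire = lam * (1.0 - DELTA ** (HORIZON - t)) / (1.0 - DELTA)
        # Option 2: experience the brand once, update beliefs, continue.
        p = a / (a + b)
        explore = (p * (1.0 + DELTA * value(a + 1, b, t + 1, lam, memo))
                   + (1.0 - p) * DELTA * value(a, b + 1, t + 1, lam, memo))
        memo[(a, b)] = max(retire, explore)
        return memo[(a, b)]

    lo, hi = 0.0, 1.0                 # Bernoulli rewards lie in [0, 1]
    while hi - lo > tol:
        lam, memo = 0.5 * (lo + hi), {}
        retire = lam * (1.0 - DELTA ** HORIZON) / (1.0 - DELTA)
        p = a0 / (a0 + b0)
        explore = (p * (1.0 + DELTA * value(a0 + 1, b0, 1, lam, memo))
                   + (1.0 - p) * DELTA * value(a0, b0 + 1, 1, lam, memo))
        if retire >= explore:
            hi = lam                  # indifference point is at or below lam
        else:
            lo = lam
    return 0.5 * (lo + hi)
```

The index exceeds the myopic mean $a/(a+b)$ when beliefs are uncertain and shrinks toward it as experience accumulates, mirroring the declining index curves discussed around Figure 3.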
5.1. The Index Strategy is Invariant to Scale and Behaves Intuitively
An index strategy would be difficult for the consumer to use if the strategy were not in-
variant to scale. If it is invariant the consumer can intuit (or learn) the basic shape of the index
function and use that intuited shape in many situations. Invariance facilitates ecological rationali-
ty. The following results hold for fairly general distribution of quality, ; , and joint dis-
tribution of utility shocks and , as long as they have scale and location parameters and the
quality belief ; is conjugate. To ease interpretation, we assume that and are nor-
mal distributions with parameters as defined earlier: and for quality; and for posteri-
or beliefs about quality; and , and , for utility shocks. In a companion online appendix we
prove the following proposition.
Proposition 2. Let $\Lambda_{jt}(0, \sigma_{jt}, 0, 1, \sigma_{x\epsilon,jt})$ be Whittle’s index for the canonical learning problem computed when the posterior mean quality ($\mu_{jt}$) is zero, the mean utility shock ($\mu_{x\epsilon,jt}$) is zero, and the inherent variation of quality ($\sigma_j$) is 1. For forward-looking consumers, Whittle’s index is scalable in these parameters. That is:

$\Lambda_{jt}(\mu_{jt}, \sigma_{jt}, \mu_{x\epsilon,jt}, \sigma_j, \sigma_{x\epsilon,jt}) = \mu_{jt} + \mu_{x\epsilon,jt} + \sigma_j\, \Lambda_{jt}\!\left(0,\, \frac{\sigma_{jt}}{\sigma_j},\, 0,\, 1,\, \frac{\sigma_{x\epsilon,jt}}{\sigma_j}\right).$
Proposition 2 implies that the consumer can simplify his or her mental evaluations by decomposing the index for each brand into (1) the mean utility gained from “myopic learning”, $\mu_{jt} + \mu_{x\epsilon,jt}$, which reflects the exploitation of posterior beliefs, and (2) the incremental benefit of looking forward, $\sigma_j\, \Lambda_{jt}(0, \sigma_{jt}/\sigma_j, 0, 1, \sigma_{x\epsilon,jt}/\sigma_j)$, which captures quality information gained through exploration. The consumer needs only intuit the shape of $\Lambda_{jt}$ for a limited range of parameter values and scale it by $\sigma_j$.
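This decomposition can be written as a two-line function. In the sketch below, `lambda_star` is a purely hypothetical standardized shape (Proposition 2 only asserts that some such shape exists and can be intuited); the scaling logic around it is what the proposition guarantees.

```python
def lambda_star(s, r):
    # Hypothetical standardized index shape Lambda(0, s, 0, 1, r):
    # increasing in relative posterior uncertainty s, decreasing in the
    # relative shock magnitude r.  Functional form made up for illustration.
    return 0.5 * s / (1.0 + r)

def brand_index(mu, x, sigma_t, sigma, sigma_shock):
    """Proposition 2's decomposition: myopic utility (mu + x) plus the
    forward-looking bonus -- the intuited shape scaled by sigma."""
    return mu + x + sigma * lambda_star(sigma_t / sigma, sigma_shock / sigma)
```

Because only ratios enter the bonus, rescaling all scale parameters by a constant $c$ multiplies the bonus by $c$ and changes nothing else – the invariance that lets a consumer intuit one shape and reuse it across brands and situations.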
To provide further intuition, we prove the following proposition in an online appendix.
The proposition shows that Whittle’s index behaves as expected when the parameters of the
problem vary. The consumer likes increases in quality and utility shocks, dislikes inherent uncer-
tainty in quality and utility shocks, but values uncertainty in beliefs about mean quality because
such uncertainty increases the value of learning about that alternative.
Proposition 3. Whittle’s index $\Lambda_{jt}$ for the canonical learning problem is (1) increasing in the posterior mean of quality ($\mu_{jt}$), the observable utility shocks ($x_{jt}$), and the unobservable (to the researcher) utility shock ($\epsilon_{jt}$), (2) weakly decreasing in the inherent uncertainty in quality ($\sigma_j$) and the uncertainty in the utility shocks ($\sigma_{x\epsilon,jt}$), and (3) weakly increasing in the consumer’s posterior uncertainty about quality ($\sigma_{jt}$).
Figure 3 illustrates Whittle’s index where we set the posterior mean of quality to zero.
(We observe similar shapes of Whittle’s index for other parameter values.) Like Gittins’ index,
Whittle’s index is a smooth decreasing function of experience (experience reduces posterior
quality uncertainty). With sufficient experience, Whittle’s index converges toward zero implying
that asymptotically the value of a brand is based on the posterior mean of quality (Proposition 2).
Unlike Gittins’ index, Whittle’s index is a function of the magnitude of the utility shocks ($\sigma_{x\epsilon,jt}$). As the magnitude of utility shocks gets larger, it is less important for the consumer to learn about quality and the value of the index decreases, as shown in Figure 3. These properties (and the shape
of the curve itself) are intuitive.
[Insert Figure 3 about here.]
Figure 3 and Proposition 3 suggest that, other things being equal, when the magnitude of
utility shocks is larger, then the realized utility shocks are more likely to be the deciding factor in
consumers’ brand choices. When $\sigma_{x\epsilon,jt} = 5$ in Figure 3, the Whittle-index curve is almost flat, implying an almost myopic strategy. To formalize this insight, we state the following Corollary to
Proposition 3:
Corollary. (1) When the consumer’s posterior quality uncertainty dominates the uncer-
tainty in utility shocks, the value to the consumer from looking forward is high. (2) When
the uncertainty in utility shocks dominates the consumer’s posterior quality uncertainty,
the value from looking forward is low. In this case, a myopic learning strategy (i.e., exploiting posterior beliefs) suffices, and could be the optimal strategy if it requires lower
thinking costs than a forward-looking learning strategy.
6. Examination of the Near Optimality of an Index Strategy (Synthetic Data)
We now examine whether an index strategy implies a reasonable tradeoff between opti-
mality and thinking costs. Thinking costs remain unobservable, but §4 and §5 suggest that think-
ing costs could be substantially smaller with an index strategy compared to the direct solution of
the PSPACE-hard version of Bellman’s equation. It remains to show that the loss in utility is
small. To examine this issue we switch from analytic derivations to synthetic data because the
loss in utility is an issue of magnitude rather than direction. Synthetic data establish existence
(rather than universality) of situations where index strategies are close to optimal.
We set the observable shocks ($x_{jt}$) to zero and examine near optimality for the special case when the quality distributions and quality beliefs ($F$ and $G$) are normal. We compare four decision strategies that the consumer might use.
1. No Learning. In this naïve strategy the consumer chooses the brand based only on his or
her prior beliefs of quality and the current utility shocks. This strategy provides a baseline
to evaluate the incremental value of learning.
2. Myopic Learning. In this strategy the consumer chooses the brand based only on his or
her posterior quality beliefs and the current utility shocks. This strategy exploits the con-
sumer’s posterior knowledge about brand quality. The Corollary predicts that this strate-
gy will suffice when the magnitude of the utility shock is relatively high.
3. Index Strategy. This strategy assumes the consumer can intuit the shape of Whittle’s in-
dex. As per Proposition 2, this strategy improves on the myopic-learning strategy by taking into account the option value of learning. Brand choices now reflect the consumer’s
tradeoff between exploitation and exploration. The Corollary predicts the index strategy
will outperform myopic learning especially when the magnitude of the utility shocks is
relatively low.
4. Approximate Optimality. The PSPACE-hard forward-looking learning problem cannot
be solved optimally, hence researchers use approximate solutions. Although approxima-
tion methods vary in the literature, discrete optimization is representative and should
converge to the optimal solution with finer grids (Rust 1996). We discretize the state
space, $\Omega_j$, into a set of grid points for each of the $J$ brands.
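A stripped-down version of this horse race can be simulated in a few lines. The sketch below compares only the first two strategies on a two-brand problem with normal quality draws and small zero-mean Gumbel shocks; all parameter values are illustrative stand-ins, not the paper’s design.

```python
import math
import random

random.seed(7)

J, T, R = 2, 50, 500            # brands, periods, replications (illustrative)
TRUE_MEAN = [0.0, 0.5]          # true mean qualities, unknown to the consumer
SIGMA_Q = 1.0                   # inherent quality variation (normalized)
PRIOR_MU, PRIOR_SD = 0.0, 1.0   # common prior beliefs for both brands
SHOCK_SCALE = 0.1               # small shocks: learning should pay off
EULER = 0.5772156649015329

def gumbel_shock():
    # Gumbel(0, scale) draw shifted so its unconditional mean is zero.
    return SHOCK_SCALE * (-math.log(-math.log(random.random())) - EULER)

def mean_utility(strategy):
    """Average per-period utility over R runs of T periods."""
    total = 0.0
    for _ in range(R):
        mu = [PRIOR_MU] * J                 # belief means
        prec = [1.0 / PRIOR_SD ** 2] * J    # belief precisions
        for _ in range(T):
            eps = [gumbel_shock() for _ in range(J)]
            j = max(range(J), key=lambda k: mu[k] + eps[k])
            q = random.gauss(TRUE_MEAN[j], SIGMA_Q)
            total += q + eps[j]
            if strategy == "myopic":
                # conjugate normal update of the chosen brand's belief
                prec[j] += 1.0 / SIGMA_Q ** 2
                mu[j] += (q - mu[j]) / (SIGMA_Q ** 2 * prec[j])
            # "no-learning": beliefs stay at the prior forever
    return total / (R * T)
```

With these parameters the myopic learner homes in on the better brand while the no-learning consumer keeps choosing on shocks alone, consistent with the ordering of the first two rows in the upper panel of Table 1.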
We choose parameters that illustrate the phenomena and we expect they are empirically
reasonable. The approximate optimal solution requires a finite time horizon; we select 50 periods. If the discount factor is $\delta = 0.90$, truncation to a finite horizon should be negligible and, in any case, is biased against index strategies. We choose 200 grid points per brand, which should be very close to optimal in the continuous problem. To simplify integration we draw the utility shocks from a Gumbel distribution with parameters $(\mu_\epsilon, \sigma_\epsilon)$ and normalize the location parameter such that the utility shocks have zero unconditional means. Setting $J = 2$ brands is sufficient to examine optimality and makes the approximately optimal solution feasible – it evolves on a state space of $200^2$ grid points. In this sense, the two-brand case provides a conservative test of the relative cognitive simplicity of index strategies. The ongoing uncertainties in quality for both brands, $\sigma_j$, are equal and normalized to 1.
We vary the parameter values to capture three possibilities: (1) the mean and uncertainty
both favor one brand, (2) the means are the same but uncertainty favors one brand, and (3) the
mean and uncertainty favor different brands. Because quality beliefs are relative and because we
can interpret Brand 1 and Brand 2 interchangeably, we need vary only the prior means in quality
beliefs for one brand. Therefore we fix the prior mean quality belief of Brand 1 to zero and vary
the prior mean quality beliefs for Brand 2 relative to Brand 1. We normalize the standard devia-
tion of Brand 2’s prior quality belief to one and let the standard deviation of Brand 1’s prior
quality belief be 1/2. While other relative variations are possible, these variations suffice to il-
lustrate the phenomena. Finally, to test the predictions in §5 we allow the uncertainty in shocks
to vary from relatively small to relatively large.7
We compute the consumer’s expected total utilities for 50 periods under different deci-
7 Specifically, we normalize $\mu_{10} = 0$ and $\sigma_{10} = 1/2$. We vary the mean of the brand with greater prior variance with $\mu_{20} \in \{-0.3, 0.0, 0.3\}$. We vary the relative uncertainty in utility shocks with $\sigma_{x\epsilon} \in \{0.1, 1.0\}$.
sion strategies. Details are provided in a companion online appendix. The results are summarized
in Table 1.
[Insert Table 1 about here.]
We first examine cumulative computation times which are surrogates for thinking costs.
As expected, the no-learning and myopic-learning strategies impose negligible thinking costs,
the index strategy has moderate thinking costs, and the approximate optimal solution requires
substantial thinking costs – 600 times the thinking costs under an index strategy even for this
simple problem (the number will be even greater with finer grid points).
We next examine the consumer’s expected utilities when the magnitude of utility shocks is relatively low (upper panel of Table 1). In all cases examined, the no-learning strategy
leads to the lowest utility, which suggests that learning is valuable. As predicted by the Corol-
lary, the index strategy and the approximately optimal strategy generate significantly higher utili-
ty than myopic learning for this problem. Furthermore, the index strategy is indistinguishable
from the approximately optimal strategy. As long as thinking costs matter even a little, an index
strategy will be better on utility minus thinking costs.
We next examine the case of relatively high magnitude of utility shocks (lower panel of
Table 1). As predicted by the Corollary, the myopic-learning model performs virtually the same
as either the index strategy or the approximately optimal strategy. The differences are not signif-
icant. In this case, the consumer might achieve the best utility minus thinking costs with a myop-
ic strategy (among the models tested).
Analysis of synthetic data never covers all cases. Table 1 is best interpreted as providing
evidence that (1) there exist reasonable situations where an index solution is better than the ap-
proximately optimal solution on utility minus thinking costs and (2) there exist domains where
myopic learning is best on utility minus thinking costs. We now examine empirical data.
7. Field Estimation of an Index Strategy (IRI Data on Diaper Purchases)
We seek to identify at least one situation where consumers are likely to be forward-
looking and, in that situation, examine whether an index solution fits (predicts) as well or better
than an approximately optimal solution. Even if an index solution does no better than an approx-
imately optimal solution, we consider the result promising because an index solution is cogni-
tively simpler. As a test of face validity, we also expect learning strategies to outperform no-
learning strategies and forward-looking strategies to outperform myopic-learning strategies when
the situation favors forward-looking behavior.
7.1. IRI Data on Diaper Purchases
We select the diaper category from the IRI Marketing Dataset that is maintained by the
SymphonyIRI Group and available to academic researchers (Bronnenberg et al. 2008). Diaper
consumers are likely to be learning and forward-looking. Parents typically begin purchasing dia-
pers based on a discrete birth event. Even if the birth is a second or subsequent child, “quality”
may have changed. Informal qualitative interviews suggest that parents learn about whether dia-
per brands match their needs through experience (often more than one purchase), that diapers are
sufficiently important that parents take learning seriously, and that parents often try multiple
brands before settling on a favorite brand. There are observable shocks due to price promotions
and shocks due to unobservable events. For example, a baby might go through a stage where a
different brand is best suited to the parent/child’s needs. Diapers also have the advantages of be-
ing regular purchases (the no-choice option is less of a concern) and consumers tend to be in the
market for many purchase occasions.
To isolate a situation favoring forward-looking behavior we focus on consumers who
purchase branded products and whose purchases are likely triggered by a birth event. To do so,
we eliminate private-label consumers and we select consumers whose first purchase occurs 30
weeks after the start of data collection. After applying these screening criteria the data contain
1,385 households who make 8,089 purchases.
The market is dominated by three brands, Pampers, Huggies, and Luvs. We aggregate all
other purchases as “Other Brands.” As a first-order view, Table 2 compares switching behavior
during the first eight purchases to switching behavior after the first eight purchases. Brand loyal-
ty is higher after eight periods than within the first eight periods suggesting that consumers may
learn about “quality” over time. Notice also the small (and decreasing) switching to “Other
Brands” among major-brand consumers. Finally, although the category was chosen as a likely
test-bed for consumer learning, high brand loyalty, even during the initial eight weeks, suggests
that there is no guarantee a forward-looking strategy will fit the data.
[Insert Table 2 about here.]
For this initial test of an index solution, we limit the observable shocks $x_{hjt}$ to the observed price for brands purchased and the average price across other panelists in the same period for brands not purchased. We randomly select 150 households for estimation and 200 households for validation.8
7.2. Empirical Specification
We index each of $N$ households by $h$ and denote by $T_h$ household $h$’s time horizon. We assume that the quality and quality-belief distributions, $F$ and $G$, are normal and that the unobservable shock distributions are Gumbel. The decision strategies are given below ($x_{hjt}$ and $\epsilon_{hjt}$ are now scalars).
8 The actual sample sizes for estimation and validation are 138 and 189 because we drop a few households whose order of brand purchases is unobserved.
No Learning: $\Pi_{hjt} = \mu_{hj0} + x_{hjt} + \epsilon_{hjt}$.
Myopic Learning: $\Pi_{hjt} = \mu_{hjt} + x_{hjt} + \epsilon_{hjt}$.
Index Strategy: $\Pi_{hjt} = \mu_{hjt} + x_{hjt} + \epsilon_{hjt} + \sigma_j\, \Lambda_{hjt}\!\left(0,\, \sigma_{hjt}/\sigma_j,\, 0,\, 1,\, \sigma_{x\epsilon,hjt}/\sigma_j\right)$.
Approximately Optimal: $\Pi_{hjt} = \mu_{hjt} + x_{hjt} + \epsilon_{hjt} + \delta\, E\!\left[V(\mu_{h,t+1}, \sigma_{h,t+1}, x_{h,t+1}, \epsilon_{h,t+1}) \mid I_{ht}, d_{hjt} = 1\right]$.
7.3. Issues of Identification
Although we would like to identify all parameters of the various models, we cannot do so
from choice data because utility is specified only to an affine transformation, and because the pa-
rameters that matter are relative parameters. For the no-learning model we can identify only the
relative means of prior beliefs. For the myopic-learning model we can identify only the relative
means of prior beliefs, the relative uncertainties of prior beliefs, and the means of quality. For the
no-learning and myopic-learning models time discounting does not matter.
For the index strategy and approximately optimal strategy we set the mean of prior beliefs of one brand ($\mu_{10}$) to zero and normalize the variance of quality ($\sigma_j^2$) to one to set the scale of quality (only ratios relative to $\sigma_j$ matter). We cannot simultaneously identify a brand-specific mean of quality and a brand-specific mean of the unobservable shock, so we set the latter to zero ($\mu_{\epsilon j} = 0$). We can identify $\bar q_j$ because we have fixed $\mu_{\epsilon j}$. The standard deviation of the observable shock $x_{hjt}$ is observed in the data and, hence, $\sigma_{\epsilon j}$ is computable from $\sigma_{x\epsilon,j}$ because the observable and unobservable shocks are independent. As in most dynamic discrete choice processes (Rust 1994), the discount factor is not identified; we set it to $\delta = 0.90$.9
Finally, as in Erdem and Keane (1996), we suppress “parameter heterogeneity” among
9 As expected, sensitivity analyses with other discount rates (e.g., 0.95 and 0.99) yield almost identical log-likelihood statistics and similar parameter estimates for the index-strategy model. Anticipating the results of § 7.5, we expect similar (lack of) sensitivity for the approximately optimal strategy. The ease with which such sensitivity checks can be run is a benefit of the computational tractability of the index-strategy model.
households. We continue to allow each household’s quality perceptions to evolve idiosyncrati-
cally, but we do not attempt to estimate heterogeneity in prior beliefs, in the long-term mean
quality, or in the standard deviations. We abstract away from parameter heterogeneity for the fol-
lowing reasons. First, there are, on average, only 8.39 purchases per household. We would overly
strain the model by attempting to estimate heterogeneity in all of the parameters.10 Second, we
wish to focus on behavioral heterogeneity that is generated endogenously by forward-looking
learning models. Even if households start with exogenously homogeneous prior beliefs, different
quality draws and utility shocks lead to different exploitation-versus-exploration tradeoffs and
different learning dynamics. We seek to evaluate heterogeneous learning dynamics against the
data, rather than using heterogeneous parameters to fit the data. For an initial test of an index
strategy, this simplification is conservative because it biases against a good model fit. Despite
this focus, it turns out the index strategy explains the data well and predicts well.
7.4. Maximum Simulated Likelihood Estimation
We estimate each model’s parameters with maximum simulated likelihood estimation. To
simplify notation let $\theta$ denote the vector of parameters to be estimated. Let $d_{ht} \in \{1, \dots, J\}$ denote household $h$’s decision at time $t$ and let $d_h^t = (d_{h1}, \dots, d_{ht})$ denote $h$’s decision sequence up to time $t$. The likelihood of observing the choice sequences as a function of $\theta$ is:

(7) $\mathcal{L}(\theta) = \prod_{h=1}^{N} \Pr\!\left(d_h^{T_h}; \theta\right).$
Learning strategies depend upon the evolution of the unobserved belief states, complicating the inference process. If we were to write the likelihood function as a function of each consumer’s unobserved belief states and shocks over time, we would need to sample from an extremely complicated joint density of belief states and shocks. Instead, we augment the data and sample directly from the more-fundamental unobservables – the quality experiences, $q_{hjt}$, that are drawn conditionally i.i.d. from normal distributions with means, $\bar q_j$, and standard deviations, $\sigma_j$. Given a set of quality experiences and a set of prior beliefs, we obtain the unobserved belief states, $(\mu_{hjt}, \sigma_{hjt})$, by conjugate updating formulae:

(8) $\mu_{hjt} = \dfrac{\sigma_{hj0}^{-2}\,\mu_{hj0} + n_{hjt}\,\sigma_j^{-2}\,\bar q_{hjt}}{\sigma_{hj0}^{-2} + n_{hjt}\,\sigma_j^{-2}}$ and $\sigma_{hjt}^{2} = \dfrac{1}{\sigma_{hj0}^{-2} + n_{hjt}\,\sigma_j^{-2}}$,

where $n_{hjt} = \sum_{\tau=1}^{t-1} 1(d_{h\tau} = j)$ is the cumulative number of purchases of brand $j$ by consumer $h$ prior to period $t$. (We use $1(\cdot)$ as an indicator function.) We use $\bar q_{hjt} = n_{hjt}^{-1}\sum_{\tau=1}^{t-1} 1(d_{h\tau} = j)\, q_{hj\tau}$ to denote the average quality experience observed by the consumer prior to period $t$.

10 Doing so is technically feasible, but would likely over-parameterize the model and exploit noise in the data. More importantly, our goal is to demonstrate that an index solution is a viable representation of cognitive simplicity and that cognitive simplicity (relative to the approximately optimal solution) is a phenomenon worth studying in structural models. We leave explicit modeling of parameter heterogeneity to future research using data with longer purchase histories.
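The conjugate updating formulae amount to a few lines of code. A minimal sketch (variable names are ours):

```python
def posterior(mu0, sd0, sigma_q, experiences):
    """Conjugate normal updating of quality beliefs (batch form of
    Equation 8): prior N(mu0, sd0^2), quality experiences drawn i.i.d.
    with known standard deviation sigma_q.  Returns posterior mean/std."""
    n = len(experiences)
    precision = 1.0 / sd0 ** 2 + n / sigma_q ** 2
    q_bar = sum(experiences) / n if n else 0.0
    mu = (mu0 / sd0 ** 2 + n * q_bar / sigma_q ** 2) / precision
    return mu, precision ** -0.5
```

Posterior uncertainty shrinks deterministically with the number of purchases, which is why sampling the quality experiences pins down the entire belief path in estimation.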
We introduce vector notation to simplify exposition. Let $\bar q$ be the vector (over $j$) of mean qualities, let $\sigma$ be the vector (over $j$) of the standard deviations of quality draws, and let $\sigma_\epsilon$ be the vector (over $j$) of the standard deviations of the unobservable shocks. Let the sequence of quality draws up to and including period $t$ be $q_h^t$. And let $p_{ht}$ and $\epsilon_{ht}$ be the vectors (over $j$) of prices and unobservable utility shocks. Let $f(\cdot)$ be a probability density function. Then the likelihood for household $h$ is given by:

(9) $\mathcal{L}_h(\theta) = \int \left[\prod_{t=1}^{T_h} \Pr\!\left(d_{ht} \mid q_h^{t-1}, p_{ht}, \epsilon_{ht}; \theta\right)\right] f\!\left(q_h^{T_h}; \bar q, \sigma\right) f\!\left(\epsilon_h^{T_h}; \sigma_\epsilon\right)\, dq_h^{T_h}\, d\epsilon_h^{T_h}.$
To compute the likelihood we integrate over quality draws and unobservable shocks. To
integrate numerically we sample sequences of quality draws (each sequence has $T_h$ draws for consumer $h$) from a multivariate normal distribution with parameters $\bar q$ and $\sigma$. We assume the
unobservable shocks follow a zero-mean Gumbel distribution with homogeneous variance for all
brands. This assumption allows us to use the well-known logit formula to substantially simplify
the computation of choice probabilities for all models, and the computation of value function
used in approximately optimal strategies (Rust 1994). To use the logit formula for the index
strategy, we linearize the index function as a linear function of the unobserved shocks $\epsilon_{hjt}$, while preserving monotonicity:

$\Pi_{hjt} = \mu_{hjt} + x_{hjt} + \epsilon_{hjt} + \sigma_j\, \Lambda_{hjt}\!\left(0,\, \sigma_{hjt}/\sigma_j,\, 0,\, 1,\, \sigma_{x\epsilon,hjt}/\sigma_j\right).$
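Given the linearized indices, the choice probabilities take the standard logit form. A small sketch (the `scale` argument is the common Gumbel scale of the unobservable shocks; names are ours):

```python
import math

def logit_probs(systematic, scale):
    """Choice probabilities when each brand's unobservable shock is
    i.i.d. zero-mean Gumbel with a common scale: the 'well-known logit
    formula'.  `systematic` holds the deterministic part of each index."""
    z = [v / scale for v in systematic]
    m = max(z)                          # subtract the max for stability
    expz = [math.exp(v - m) for v in z]
    total = sum(expz)
    return [v / total for v in expz]
```

A smaller `scale` makes choices more deterministic in the systematic indices; a larger `scale` pushes probabilities toward uniform, exactly the sense in which large shocks drown out the value of learning.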
7.5. Estimation Results
Table 3 summarizes the fit statistics for the 651 diaper purchases in the in-sample estima-
tion and the 676 purchases in the out-of-sample validation. $U^2$ is an information-theoretic measure that calculates the percent of uncertainty explained by the model (Hauser 1978); AIC and
BIC attempt to correct the likelihood function based on the number of parameters in the in-
sample estimation, BIC more so than AIC. (There are no free parameters in the out-of-sample
validation.)
[Insert Table 3 about here.]
First, on all measures there are sizable gains to learning – all learning models explain and
predict brand choices substantially better than the no-learning strategy. Second, the index strate-
gy improves in-sample fit and out-of-sample predictions relative to myopic learning. The likeli-
hood is significantly better ($\chi^2 = 22$, $p = 0.002$ in-sample; $\chi^2 = 13.52$, $p = 0.009$ out-of-sample), although the percent improvement in $U^2$ is not large. This result is consistent with our
expectation that diaper buyers are forward-looking. Third, the index strategy outperforms the ap-
proximately optimal solution for in-sample fit and out-of-sample predictions, although the differ-
ences are small. (The models are not nested so a $\chi^2$ test does not apply.) This empirical result is
consistent with the synthetic-data analysis. When two forward-looking strategies yield similar
expected utilities, we expect them to be statistically indistinguishable.
Table 4 summarizes the estimated parameter values. Across all learning models, all four
brands increase substantially in mean quality relative to prior beliefs, which implies that diaper
buyers learn by experience. Results are consistent with the switching patterns in Table 2.
[Insert Table 4 about here.]
Forward-looking models identify the uncertainty in utility shocks relative to ongoing
quality uncertainty (last panel of Table 4). Because the relative shock uncertainty varies across
brands, the index curve implies different behavior than myopic-learning for those brands. This
explains why forward-looking models fit and predict better. For example, Huggies has lower rel-
ative shock uncertainty than other brands providing greater incentives for consumers to explore
Huggies (as per the Corollary). Because the myopic-learning model ignores this difference, it
compensates by underestimating the relative mean prior beliefs. Managerially, Huggies has a
higher mean quality than Pampers or Luvs, but also higher ongoing relative uncertainty in quali-
ty across consumption. (The table reports the ratio of shock uncertainty to quality uncertainty – a
smaller number means higher relative quality uncertainty.)
Computational time in the embedded optimization problem is a rough surrogate for think-
ing costs. The last row of Table 3 reports the time per iteration of each model. Because the index
strategy is substantially faster than the approximately optimal strategy, it is reasonable to posit
that consumers view the index strategy as having lower thinking costs. (The myopic-learning
strategy is even faster and is a reasonable model for categories in which forward-looking is less
likely as was predicted by the Corollary.)
Both the index-strategy and the approximately-optimal strategy lead to similar parameter
estimates. Parameter estimates of either model are usually within confidence regions of the alter-
native model. This result is consistent with the synthetic-data analyses that suggest that both
strategies lead to near-optimal utility. If we accept cognitive simplicity as a paradigm, then the
index strategy is preferred as a more plausible description of consumer behavior.
In summary, using IRI data on diaper purchases we find that (1) learning models fit and
predict substantially better than the no-learning model; (2) forward-looking learning models fit
and predict significantly better than the myopic-learning model; (3) the index strategy and the
approximately optimal solution achieve similar in-sample fit and out-of-sample forecasts, as well
as reasonably close parameter estimates; and (4) computational (and thinking) costs favor the in-
dex-strategy model relative to the approximately optimal model.
8. Summary, Conclusions, and Future Research
Models of forward-looking consumer learning are important to marketing. These theory-
driven models examine how consumers make tradeoffs between exploiting and exploring brand
information. Managerially, these models enable researchers to investigate effects due to ongoing
quality variation, Bayesian learning, and the variation in utility shocks. However, the optimal so-
lution (assuming no thinking costs) requires an assumption that consumers solve in their heads
an optimization problem that is PSPACE-hard. Cognitive simplicity is a plausible theory that is
rooted in consumer behavior, psychology, and economics. Cognitive simplicity assumes that
consumers solve the meta problem that maximizes utility minus thinking costs.
In this paper we propose and evaluate an alternative theory of learning – that consumers
use an index strategy when looking forward. We prove analytically that an index strategy exists
for canonical consumer learning models and that the index function has simple properties that
consumers can intuit. Using synthetic data we demonstrate that an index solution achieves near
optimal expected utility, and is fast to compute. Using IRI data on diaper purchases, we show
that at least one index solution fits the data and predicts out-of-sample significantly better than
either a no-learning model or a myopic-learning model. The index strategy produces estimation
results (and hence managerial implications) that are quite similar to an approximately optimal so-
lution. The index solution has significantly lower computational costs and, we believe, is more
likely to describe consumer behavior.
We address many issues, but many issues remain. We abstract away from risk aversion,
advertising as a quality signal (the IRI dataset for the diaper category does not track advertising),
and inventory problems. Theoretically, risk aversion does not affect the indexability result. The
consequence of incorporating advertising signals depends on how consumers learn. Although
previous work has not found significant inventory effects (Ching et al. 2011), it is theoretically
interesting to extend index strategies to capture inventory concerns.
Finally, diaper buyers are likely forward-looking, but consumers in other product catego-
ries may not be. Our theory suggests that consumers are most likely to be forward-looking when
shock uncertainty is small compared to quality uncertainty; we expect myopic-learning models to
do well when shock uncertainty is large. An index solution appears to be a reasonable tradeoff
for diaper consumers, but other cognitively-simple solutions might do even better. Future re-
search can explore these solutions using either field data or laboratory experiments.
References
Ackerberg, D. A. 2003. Advertising, learning, and consumer choice in experience good markets:
an empirical examination. International Economic Review, 44 (3), 1007–1040.
Bettman, J. R., M. F. Luce, and J.W. Payne. 1998. Constructive consumer choice processes.
Journal of Consumer Research, 25 (3), 187–217.
Brezzi, M. and T. L. Lai. 2002. Optimal learning and experimentation in bandit problems. Jour-
nal of Economic Dynamics and Control, 27 (1), 87-108.
Bröder, A. 2000. Assessing the empirical validity of the “take the best” heuristic as a model of
human probabilistic inference. Journal of Experimental Psychology: Learning, Memory,
and Cognition, 26, 5, 1332-1346.
Bronnenberg, B. J., M. W. Kruger, and C. F. Mela. 2008. The IRI marketing data set. Marketing
Science, 27 (4), 745 – 748.
Bruner, J. S. 1957. On perceptual readiness. Psychological Review, 64, 123–152.
Camerer, C. F. 2003. Behavioral Game Theory: Experiments in Strategic Interaction
(Roundtable Series in Behavioral Economics). Princeton University Press.
Chang, F. and T. L. Lai. 1987. Optimal stopping and dynamic allocation. Advances in Applied
Probability, 19 (4), 829-853.
Ching, A., T. Erdem, and M. P. Keane. 2011. Learning models: an assessment of progress, chal-
lenges and new developments. SSRN eLibrary.
Ching, A. and M. Ishihara. 2010. The effects of detailing on prescribing decisions under quality
uncertainty. Quantitative Marketing and Economics, 8, 123–165.
Chintagunta, P., T. Erdem, P. E. Rossi and M. Wedel. 2006. Structural modeling in marketing:
review and assessment, Marketing Science, 25 (6), 604-616.
Chintagunta, P., R. Jiang and G. Z. Jin. 2009. Information, learning, and drug diffusion: the case
of Cox-2 inhibitors. Quantitative Marketing and Economics, 7 (4), 399–443.
Gilbride, T. J. and G. M. Allenby. 2004. A choice model with conjunctive, disjunctive, and com-
pensatory screening rules. Marketing Science, 23 (3), 391-406.
Crawford, G. S. and M. Shum. 2005. Uncertainty and Learning in pharmaceutical demand.
Econometrica, 73 (4), 1137–1173.
Erdem, T. and M. P. Keane. 1996. Decision-making under uncertainty: capturing dynamic brand
choice processes in turbulent consumer goods markets. Marketing Science, 15 (1), 1–20.
Erdem, T., M. P. Keane, and B. Sun. 2008. A dynamic model of brand choice when price and
advertising signal product quality. Marketing Science, 27 (6), 1111 – 1125.
Erdem, T., M. P. Keane, T. S. Öncü, and J. Strebel. 2005. Learning about computers: an analysis of in-
formation search and technology choice. Quantitative Marketing and Economics, 3, 207–
247.
Gabaix, X., and D. Laibson. 2000. A boundedly rational decision algorithm. American Economic
Review, 90 (2), Papers and Proceedings, 433-438.
Gabaix, X., D. Laibson, G. Moloche, and S. Weinberg. 2006. Costly information acquisition: ex-
perimental analysis of a boundedly rational model. American Economic Review, 96 (4),
1043–1068.
Gigerenzer, G. and D. G. Goldstein. 1996. Reasoning the fast and frugal way: models of bound-
ed rationality. Psychological Review, 103 (4), 650–669.
Gittins, J. 1989. Bandit Processes and Dynamic Allocation Indices. John Wiley & Sons, Inc.:
New York, NY.
Gittins, J. and D. Jones. 1974. A dynamic allocation index for the sequential design of experi-
ments. In J. Gani (Ed.), Progress in Statistics, North-Holland. Amsterdam, NL, 241–266.
Goldstein, D. G. and G. Gigerenzer. 2002. Models of ecological rationality: the recognition heuristic. Psychological Review, 109 (1), 75–90.
Guadagni, P. M. and J. D. C. Little. 1983. A logit model of brand choice calibrated on scanner data. Marketing Science, 2 (3), 203–238.
Hauser, J. R. 1978. Testing the accuracy, usefulness and significance of probabilistic models: an information theoretic approach. Operations Research, 26 (3), 406–421.
Hauser, J. R., O. Toubia, T. Evgeniou, D. Dzyabura, and R. Befurt. 2010. Cognitive simplicity and consideration sets. Journal of Marketing Research, 47 (June), 485–496.
Hauser, J. R. and G. L. Urban. 1986. The value priority hypotheses for consumer budget plans.
Journal of Consumer Research, 12 (4), 446–462.
Hauser, J. R., G. L. Urban, G. Liberali, and M. Braun. 2009. Website morphing. Marketing Sci-
ence, 28 (2), 202–223.
Hauser, J. R., G. L. Urban, and B. Weinberg. 1993. How consumers allocate their time when
searching for information. Journal of Marketing Research, 30 (4), 452-466.
Hutchinson, J. M. C. and G. Gigerenzer. 2005. Simple heuristics and rules of thumb: where psy-
chologists and behavioural biologists might meet. Behavioural Processes, 69, 97-124.
Jacoby, L. L., and M. Dallas. 1981. On the Relationship Between Autobiographical Memory and
Perceptual Learning. Journal of Experimental Psychology: General, 110, 306–340.
Johnson, E. J. and J. W. Payne. 1985. Effort and accuracy in choice. Management Science, 31,
395-414.
Kim, J. B., P. Albuquerque, and B. J. Bronnenberg. 2010. Online demand under limited consum-
er search. Marketing Science, 29 (6), 1001–1023.
Kohli, R., and K. Jedidi. 2007. Representation and inference of lexicographic preference models
and their variants. Marketing Science, 26 (3), 380-399.
Marewski, J. N., R. F. Pohl, and O. Vitouch. 2010. Recognition-based judgments and decisions:
introduction to the special issue (Vol. 1). Judgment and Decision Making, 5 (4), 207–215
Martignon, L. and U. Hoffrage. 2002. Fast, frugal, and fit: simple heuristics for paired compari-
sons. Theory and Decision, 52, 29-71.
Mehta, N., X. J. Chen, and O. Narasimhan. 2008. Informing, transforming, and persuading: disentangling the multiple effects of advertising on brand choice decisions. Marketing Science, 27 (3), 334–355.
Narasimhan, C. 1988. Competitive promotional strategies. Journal of Business, 61 (4), 427–449.
Narayanan, S. and P. Manchanda. 2009. Heterogeneous learning and the targeting of marketing communication for new products. Marketing Science, 28 (3), 424–441.
Narayanan, S., P. Manchanda, and P. K. Chintagunta. 2005. Temporal differences in the role of
marketing communication in new product categories. Journal of Marketing Research, 42
(3), 278–290.
Papadimitriou, C. H. and J. N. Tsitsiklis. 1999. The complexity of optimal queuing network con-
trol. Mathematics of Operations Research, 24 (2), 293–305.
Payne J.W., J. R. Bettman and E. J. Johnson. 1988. Adaptive strategy selection in decision mak-
ing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14(3), 534-
552.
Payne J.W., J. R. Bettman and E. J. Johnson. 1993. The Adaptive Decision Maker. Cambridge
University Press: Cambridge UK.
Roberts, J. H. and G. L. Urban. 1988. Modeling multiattribute utility, risk, and belief dynamics
for new consumer durable brand choice. Management Science, 34 (2), 167–185.
Rust, J. 1994. Structural estimation of Markov decision processes. Handbook of Econometrics,
Chapter 51. Elsevier: Maryland Heights, MO.
Rust, J. 1996. Numerical dynamic programming in economics, Handbook of Computational
Economics, Chapter 14. Elsevier: Maryland Heights, MO.
Shugan, S. M. 1980. The cost of thinking. Journal of Consumer Research, 7 (2), 99–111.
Simon, H. A. 1955. A behavioral model of rational choice. Quarterly Journal of Economics, 69
(1), 99–118.
Simon, H. A. 1956. Rational choice and the structure of the environment. Psychological Review,
63 (2), 129–38.
Tversky, A. and D. Kahneman. 1973. Availability: a heuristic for judging frequency and proba-
bility. Cognitive Psychology, 5, 207–232.
Whittle, P. 1988. Restless bandits: activity allocation in a changing world. Journal of Applied Probability, 25, 287–298.
Yee, M., E. Dahan, J. R. Hauser and J. Orlin. 2007. Greedoid-based noncompensatory inference.
Marketing Science, 26 (4), (July-August), 532-549.
Table 1. Comparing Decision Strategies on Expected Utility and Thinking Costs
Average consumer utility [subject to affine transformation]; standard errors in parentheses.

                                        No Learning     Myopic Learning   Index Strategy   Approximately Optimal
Number of grid points                   n/a             n/a               200 x 50         (200 x 50)^2
Computation time
(surrogate for thinking costs)          negligible      negligible        10^2 seconds     6 x 10^4 seconds

Relatively low uncertainty in utility shocks; mean of prior quality beliefs (Brand 1, Brand 2):
(0.0, -0.3)                             0.041 (0.003)   1.801 (0.043)     1.992 (0.045)    1.996 (0.045)
(0.0, 0.0)                              0.618 (0.003)   3.352 (0.049)     3.544 (0.052)    3.547 (0.052)
(0.0, +0.3)                             3.036 (0.003)   5.298 (0.056)     5.323 (0.056)    5.327 (0.056)

Relatively high uncertainty in utility shocks; mean of prior quality beliefs (Brand 1, Brand 2):
(0.0, -0.3)                             4.919 (0.026)   5.762 (0.047)     5.767 (0.047)    5.768 (0.047)
(0.0, 0.0)                              6.182 (0.027)   7.150 (0.050)     7.190 (0.052)    7.190 (0.052)
(0.0, +0.3)                             7.946 (0.026)   8.912 (0.054)     8.911 (0.054)    8.912 (0.054)
Table 2. Transition Percentages among Diaper Brands
Percent of households that purchase the column brand in period t if they purchased the row brand in period t - 1.

                 Pampers   Huggies   Luvs     Other Brands
Within the first eight purchases
  Pampers         66.7%     19.7%    10.9%     2.7%
  Huggies         23.1%     64.3%    11.3%     1.2%
  Luvs            18.1%     19.7%    58.8%     3.4%
  Other Brands    21.2%     21.2%    20.0%    37.6%
After the first eight purchases
  Pampers         76.5%     12.6%     9.7%     1.1%
  Huggies         18.7%     75.9%     4.9%     0.5%
  Luvs            20.7%      9.5%    67.5%     2.3%
  Other Brands    14.1%      9.0%    23.1%    53.8%
Table 3. In-Sample and Out-of-Sample Fit Statistics for Diaper Data Estimation

                               No Learning   Myopic Learning   Index Strategy   Approximately Optimal
In-sample estimation statistics
  Log likelihood                 -777.17       -480.81           -469.81          -472.16
  U2 (percent information)        78.50%        86.68%            86.99%           86.92%
  AIC                            1562.34        985.63            971.63           976.33
  BIC                            1580.25       1039.37           1043.29          1047.99
  Number of parameters                 4            12                16               16
Out-of-sample validation statistics
  Log likelihood                -1058.15       -745.55           -738.79          -741.63
  U2 (percent information)        78.77%        85.04%            85.18%           85.15%
  Computation time (sec)            0.15          0.29              40.5            857.8
Table 4. Maximum Likelihood Estimates for Prior Beliefs, Mean Quality, and the Relative Magnitude of the Variation in Utility Shocks
(standard errors in parentheses)

                      No Learning   Myopic Learning   Index Strategy   Approximately Optimal
Relative mean of prior beliefs
  Pampers               0.00          0.00              0.00             0.00
                        --            --                --               --
  Huggies              -0.29         -0.08             -0.87            -0.48
                       (0.10)        (0.18)            (0.21)           (0.12)
  Luvs                 -0.66         -0.90             -1.40            -1.19
                       (0.12)        (0.22)            (0.19)           (0.81)
  Other Brands         -2.80         -2.49             -2.03            -2.97
                       (0.25)        (0.36)            (0.57)           (1.64)
Uncertainty of prior beliefs, relative to ongoing quality uncertainty (†)
  Pampers               --            0.56              0.56             0.62
                                     (0.12)            (0.07)           (0.16)
  Huggies               --            0.44              0.43             0.47
                                     (0.09)            (0.01)           (0.05)
  Luvs                  --            1.50              1.54             2.01
                                     (0.51)            (0.07)           (1.59)
  Other Brands (‡)      --           11.68             17.67            13.88
                                    (41.08)          (109.44)          (15.27)
Mean quality (long-term)
  Pampers               --            4.84              3.13             2.69
                                     (0.74)            (0.71)           (0.56)
  Huggies               --            6.07              6.36             5.60
                                     (1.09)            (0.58)           (0.11)
  Luvs                  --            3.21              2.26             1.95
                                     (0.45)            (0.50)           (0.67)
  Other Brands          --            0.71              0.54             0.45
                                     (0.55)            (0.37)           (0.32)
Uncertainty in utility shocks, relative to ongoing quality uncertainty (†)
  Pampers               --            --                0.70             0.65
                                                       (0.25)           (0.27)
  Huggies               --            --                0.11             0.11
                                                       (0.04)           (0.04)
  Luvs                  --            --                0.57             0.56
                                                       (0.12)           (0.23)
  Other Brands          --            --                2.65             2.02
                                                       (0.53)           (0.86)
† Standard errors reported relative to the ongoing quality uncertainty. ‡ The likelihood is particularly flat in this parameter estimate.
Figure 1. Index Strategies Balance Utility and Cognitive Simplicity (Conceptual Diagram)
[Diagram residue removed. Axes: utility (vertical) versus cognitive simplicity (horizontal), with a line marking optimal utility. Plotted strategies: no learning, myopic learning, index strategy, and approximately optimal; the index strategy achieves close to optimal utility while retaining cognitive simplicity.]
Figure 2: Gittins' Index (Evolution and Variation)
(a) Realized Indices for Two Brands (Consumer chooses brand with largest index value.)
[Plot residue removed: realized Gittins' index (from -0.8 to 1.0) for Brand 1 and Brand 2 over periods 0 to 50.]
(b) Index Values (Normalized) Vary with the Consumer's Experience
[Plot residue removed: Gittins' index (from 0.0 to 1.0) over periods 0 to 50, for mean of posterior belief = 0 and mean of posterior belief = 0.2.]
Figure 3. Whittle's Index as Utility Shock Magnitude and Experience Vary
[Plot residue removed: Whittle's index (from 0 to 1) over periods 0 to 50, for shock magnitudes 0.01, 0.1, 1.0, and 5.0.]
Online Appendix A. Proof of Indexability

Without loss of generality, we set the observable shock to zero and use $\varepsilon_t$ to represent all utility shocks. The focus is on the sub-problem in which the consumer chooses an uncertain brand against a certain reward $\lambda$. To simplify notation, we drop the brand identifier $j$. The Bellman equation for this problem is:

(A1) $V(S_t, \varepsilon_t, \lambda) = \max\big\{\lambda + \delta E[V(S_t, \varepsilon_{t+1}, \lambda) \mid S_t],\ \tilde{x}_t + \varepsilon_t + \delta E[V(S_{t+1}, \varepsilon_{t+1}, \lambda) \mid S_t]\big\}$,

where $S_t = (\tilde{x}_t, \tilde{\sigma}_t)$ summarizes the consumer's belief about the brand at time $t$. The definition of indexability is that, for any state $(S_t, \varepsilon_t)$, if it is optimal to choose the fixed reward $\lambda$, then it must also be optimal to choose the fixed reward $\lambda'$ for any $\lambda' \geq \lambda$. This is equivalent to the following:

(A2) $\lambda + \delta E[V(S_t, \varepsilon_{t+1}, \lambda) \mid S_t] - \tilde{x}_t - \varepsilon_t - \delta E[V(S_{t+1}, \varepsilon_{t+1}, \lambda) \mid S_t] \geq 0$
$\Rightarrow\ \dfrac{\partial}{\partial\lambda}\big\{\delta E[V(S_{t+1}, \varepsilon_{t+1}, \lambda) \mid S_t]\big\} \leq 1 + \dfrac{\partial}{\partial\lambda}\big\{\delta E[V(S_t, \varepsilon_{t+1}, \lambda) \mid S_t]\big\}$.

Intuitively, this says that the expected future value of choosing the uncertain brand should not grow too much, compared with that of choosing the fixed reward $\lambda$, as $\lambda$ increases. It turns out that the assumptions of the canonical problem of consumer learning are sufficient, though not necessary.

We first define the expected value function $W(S_t, \lambda)$ by integrating out $\varepsilon_t$:

(A3) $W(S_t, \lambda) = \int V(S_t, \varepsilon_t, \lambda)\, dF(\varepsilon_t)$.

The sub-problem can then be reduced to the fixed point:

(A4) $W(S_t, \lambda) = E_\varepsilon \max\big\{\lambda + \delta W(S_t, \lambda),\ \tilde{x}_t + \varepsilon_t + \delta E[W(S_{t+1}, \lambda) \mid S_t]\big\}$.

To see this, using the definition in Equation (A3):

$W(S_t, \lambda) = \int \max\big\{\lambda + \delta E[V(S_t, \varepsilon_{t+1}, \lambda) \mid S_t],\ \tilde{x}_t + \varepsilon_t + \delta E[V(S_{t+1}, \varepsilon_{t+1}, \lambda) \mid S_t]\big\}\, dF(\varepsilon_t)$
$= E_\varepsilon \max\big\{\lambda + \delta E[W(S_t, \lambda) \mid S_t],\ \tilde{x}_t + \varepsilon_t + \delta E[W(S_{t+1}, \lambda) \mid S_t]\big\}$
$= E_\varepsilon \max\big\{\lambda + \delta W(S_t, \lambda),\ \tilde{x}_t + \varepsilon_t + \delta E[W(S_{t+1}, \lambda) \mid S_t]\big\}$,

where in the second equality we have used the assumption that $\varepsilon_t$ and $\varepsilon_{t+1}$ are i.i.d., so that $E[V(S_{t+1}, \varepsilon_{t+1}, \lambda) \mid S_t, \varepsilon_t] = E[V(S_{t+1}, \varepsilon_{t+1}, \lambda) \mid S_t]$. In the third equality, we have used the assumption that $\varepsilon_{t+1}$ is independent of $S_t$ and $S_{t+1}$.

Denote by 0 the option of the certain reward $\lambda$, and by 1 the uncertain brand. We define the following quantities:

(A5) $v_0(S_t, \lambda) = \lambda + \delta W(S_t, \lambda)$, and $v_1(S_t, \lambda) = \tilde{x}_t + \delta E[W(S_{t+1}, \lambda) \mid S_t]$,

and let $G(v_0, v_1) = \int \max\{v_0,\ v_1 + \varepsilon_t\}\, dF(\varepsilon_t)$, so that Equation (A4) reads $W(S_t, \lambda) = G(v_0, v_1)$. First observe that the conditional probability of choosing 1 is given by:

(A6) $P(1 \mid S_t, \lambda) = \Pr(v_1 + \varepsilon_t \geq v_0) = \partial G(v_0, v_1)/\partial v_1$.

The last equality is by interchanging the integration and differentiation and uses the definition of the $G$ function. Similarly we have

(A7) $P(0 \mid S_t, \lambda) = \partial G(v_0, v_1)/\partial v_0$.

Differentiate both sides of Equation (A4) with respect to $\lambda$ and use the chain rule:

(A8) $\dfrac{\partial W(S_t, \lambda)}{\partial\lambda} = \dfrac{\partial G}{\partial v_0}\dfrac{\partial v_0}{\partial\lambda} + \dfrac{\partial G}{\partial v_1}\dfrac{\partial v_1}{\partial\lambda} = P(0 \mid S_t, \lambda)\Big[1 + \delta \dfrac{\partial W(S_t, \lambda)}{\partial\lambda}\Big] + P(1 \mid S_t, \lambda)\, \delta E\Big[\dfrac{\partial W(S_{t+1}, \lambda)}{\partial\lambda}\,\Big|\, S_t\Big]$,

where the last equality uses Equations (A5), (A6), and (A7). The following lemma is useful to establish indexability.

Lemma 1. For all $S_t$, $\lambda$, we have

(A9) $0 \leq \dfrac{\partial W(S_t, \lambda)}{\partial\lambda} \leq \dfrac{1}{1 - \delta}$.

Proof. Fix any $S_t$, $\varepsilon_t$, and $\lambda$. Suppose $\pi^*$ is the optimal policy that solves $V(S_t, \varepsilon_t, \lambda)$. First, if a positive constant $c$ is added only to the fixed reward $\lambda$ in every period but the uncertain brand remains unchanged, then following $\pi^*$ yields an expected total utility at least as large as $V(S_t, \varepsilon_t, \lambda)$. Therefore, $V(S_t, \varepsilon_t, \lambda + c) \geq V(S_t, \varepsilon_t, \lambda)$. Second, if a positive constant $c$ is added to both the fixed reward and the uncertain brand in every period, then $\pi^*$ remains optimal and yields an expected total utility of $V(S_t, \varepsilon_t, \lambda) + c/(1 - \delta)$. By construction, adding a positive constant to both options yields expected utility at least as high as adding the constant only to the fixed reward: $V(S_t, \varepsilon_t, \lambda) + c/(1 - \delta) \geq V(S_t, \varepsilon_t, \lambda + c)$. Integrating out $\varepsilon_t$ we have:

(A10) $W(S_t, \lambda) \leq W(S_t, \lambda + c) \leq W(S_t, \lambda) + \dfrac{c}{1 - \delta}$.

It follows that:

(A11) $0 \leq \dfrac{W(S_t, \lambda + c) - W(S_t, \lambda)}{c} \leq \dfrac{1}{1 - \delta}$.

Taking the limit on both sides as $c \to 0$ establishes the lemma.

Lemma 1 implies that $0 \leq \delta E[\partial W(S_{t+1}, \lambda)/\partial\lambda \mid S_t] \leq \delta/(1 - \delta)$. This result, together with Equation (A8), implies that $1 + \delta\, \partial W(S_t, \lambda)/\partial\lambda \geq \delta E[\partial W(S_{t+1}, \lambda)/\partial\lambda \mid S_t]$. (If this inequality failed, Equation (A8) would imply $\partial W(S_t, \lambda)/\partial\lambda \geq 1 + \delta\, \partial W(S_t, \lambda)/\partial\lambda$, hence $\partial W(S_t, \lambda)/\partial\lambda \geq 1/(1 - \delta)$ and $\delta E[\partial W(S_{t+1}, \lambda)/\partial\lambda \mid S_t] > 1/(1 - \delta)$, contradicting Lemma 1.) It then follows that:

(A12) $\dfrac{\partial v_0(S_t, \lambda)}{\partial\lambda} - \dfrac{\partial v_1(S_t, \lambda)}{\partial\lambda} = 1 + \delta\,\dfrac{\partial W(S_t, \lambda)}{\partial\lambda} - \delta E\Big[\dfrac{\partial W(S_{t+1}, \lambda)}{\partial\lambda}\,\Big|\, S_t\Big] \geq 0$,

which establishes the indexability condition in Equation (A2).
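The bounds in Lemma 1 can be illustrated numerically. The sketch below (our own illustration with assumed parameter values, not the paper's code) builds a two-period version of the sub-problem in Equation (A4) with normal beliefs and normal utility shocks, and checks by finite differences that the slope of the expected value function in the fixed reward lies between 0 and the two-period analogue of $1/(1-\delta)$, namely $(1-\delta^2)/(1-\delta)$:

```python
# Two-period numerical check of Lemma 1 (illustrative; parameters are assumed).
import numpy as np

rng = np.random.default_rng(0)
delta, sigma, sigma_eps = 0.9, 1.0, 0.5     # discount, quality noise, shock scale
x_tilde, s_tilde = 0.2, 1.0                 # prior mean and sd of quality beliefs
N = 200_000

eps_now = rng.normal(0, sigma_eps, N)       # period-0 brand shock
eps_next = rng.normal(0, sigma_eps, N)      # period-1 brand shock
s1_sq = s_tilde**2 * sigma**2 / (s_tilde**2 + sigma**2)      # posterior variance
x1 = rng.normal(x_tilde, np.sqrt(s_tilde**2 - s1_sq), N)     # posterior mean after one draw

def W0(lam):
    """Two-period version of Equation (A4): W = E max{lam + dW, x + eps + dE[W']}."""
    w_keep = np.maximum(lam, x_tilde + eps_next).mean()      # period 1, beliefs unchanged
    w_learn = np.maximum(lam, x1 + eps_next).mean()          # period 1, after updating
    v0 = lam + delta * w_keep
    v1 = x_tilde + delta * w_learn
    return np.maximum(v0, v1 + eps_now).mean()

h = 0.01
slope = (W0(0.5 + h) - W0(0.5 - h)) / (2 * h)
bound = (1 - delta**2) / (1 - delta)        # finite-horizon analogue of 1/(1 - delta)
print(round(slope, 3), round(bound, 3))
```

Because the same shock draws are reused at both values of $\lambda$ (common random numbers), the finite-difference slope respects the bounds exactly, not just in expectation.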
Online Appendix B. Proof of Proposition 2

We first prove two useful lemmas. The focus is again on the sub-problem of a single brand, and thus we drop the brand identifier $j$. Throughout, $m(\tilde{x}_t, \tilde{\sigma}_t; \sigma_\varepsilon, \sigma)$ denotes the Whittle index when the utility shocks have scale $\sigma_\varepsilon$ and the quality draws have standard deviation $\sigma$, and $\Delta(S_t, \lambda) = v_1(S_t, \lambda) - v_0(S_t, \lambda)$.

Lemma 2. Fix a prior $(\tilde{x}_0, \tilde{\sigma}_0)$ and a quality sample $\{x_t : t \geq 0\}$. Consider a modified version of the original sub-problem where the utility shocks become $\varepsilon_t + c$ for all $t$, and the fixed reward becomes $\lambda + c$. Denote $\widetilde{W}$ and $\widetilde{m}$ as the expected value and index value for the modified problem. Then for any belief state $S_t$, we have:

(A13) $\widetilde{W}(S_t, \lambda + c) = W(S_t, \lambda) + \dfrac{c}{1 - \delta}$,

(A14) $\widetilde{m}(\tilde{x}_t, \tilde{\sigma}_t; \sigma_\varepsilon, \sigma) = m(\tilde{x}_t, \tilde{\sigma}_t; \sigma_\varepsilon, \sigma) + c$.

Proof. We first show that $\widetilde{W}$ satisfies the fixed point defined by Equation (A4) for the modified problem. Suppose Equation (A13) holds; then

(A15) $\tilde{v}_0(S_t, \lambda + c) = (\lambda + c) + \delta \widetilde{W}(S_t, \lambda + c) = v_0(S_t, \lambda) + \dfrac{c}{1 - \delta}$,

(A16) $\tilde{v}_1(S_t, \lambda + c) = \tilde{x}_t + \delta E[\widetilde{W}(S_{t+1}, \lambda + c) \mid S_t] = v_1(S_t, \lambda) + \dfrac{\delta c}{1 - \delta}$,

where the definitions of $\tilde{v}_0$ and $\tilde{v}_1$ follow Equation (A5) for the modified problem. Then $\widetilde{\Delta}(S_t, \lambda + c) = \Delta(S_t, \lambda) - c$ for the modified problem. The assumption that the distribution of $\varepsilon_t$ has scale and location parameters implies

(A17) $\Pr\big(\widetilde{\Delta} + (\varepsilon_t + c) \geq 0\big) = \Pr\big(\Delta - c + \varepsilon_t + c \geq 0\big) = \Pr(\Delta + \varepsilon_t \geq 0)$,

so the choice probabilities are unchanged. The right-hand side of Equation (A4) for the modified problem becomes

(A18) $E \max\big\{v_0 + \tfrac{c}{1-\delta},\ v_1 + \tfrac{\delta c}{1-\delta} + \varepsilon_t + c\big\} = E \max\{v_0,\ v_1 + \varepsilon_t\} + \dfrac{c}{1-\delta} = W(S_t, \lambda) + \dfrac{c}{1-\delta} = \widetilde{W}(S_t, \lambda + c)$,

which is the left-hand side of Equation (A4). The first equality follows from Equations (A15) to (A17), and the second uses the fact that $W(S_t, \lambda)$ is the fixed point of Equation (A4). Therefore, $\widetilde{W}$ also satisfies the fixed point of Equation (A4) for the modified problem.

For the second part of the lemma, we use the definition of the Whittle index, which is the breakeven value of $\lambda$ such that the two terms inside the curly brackets of Equation (A4) are equal in expectation:

(A19) $v_0(S_t, m(S_t)) = v_1(S_t, m(S_t)) + E[\varepsilon_t]$.

It suffices to show that the proposed relation in Equation (A14) solves the above equality. Note that

$\tilde{v}_0(S_t, m + c) = v_0(S_t, m) + \dfrac{c}{1-\delta} = v_1(S_t, m) + E[\varepsilon_t] + \dfrac{c}{1-\delta} = \tilde{v}_1(S_t, m + c) + E[\varepsilon_t + c]$,

where the first and last equalities follow from Equations (A15) and (A16), and the middle equality from the definition of the Whittle index for the original problem (setting $\lambda = m(S_t)$). Then by the definition of the Whittle index, $\widetilde{m}(\tilde{x}_t, \tilde{\sigma}_t; \sigma_\varepsilon, \sigma) = m(\tilde{x}_t, \tilde{\sigma}_t; \sigma_\varepsilon, \sigma) + c$.

Lemma 3. Fix the original sub-problem. Consider a modified problem where the quality sample becomes $\{a x_t + b : t \geq 0\}$ for some $a > 0$, the utility shocks become $a \varepsilon_t$ for all $t$, the prior belief becomes $(\tilde{x}'_0, \tilde{\sigma}'_0) = (a \tilde{x}_0 + b,\ a \tilde{\sigma}_0)$, and the fixed reward becomes $a\lambda + b$. Then $S'_t = (a \tilde{x}_t + b,\ a \tilde{\sigma}_t)$ for all $t$. Denote $\overline{W}$ and $\overline{m}$ as the expected value and index value for the modified problem. Then for any belief state $S_t$, we have:

(A20) $\overline{W}(S'_t, a\lambda + b) = a\, W(S_t, \lambda) + \dfrac{b}{1 - \delta}$,

(A21) $\overline{m}(a\tilde{x}_t + b, a\tilde{\sigma}_t; a\sigma_\varepsilon, a\sigma) = a\, m(\tilde{x}_t, \tilde{\sigma}_t; \sigma_\varepsilon, \sigma) + b$.

Proof. The proof is similar to that of Lemma 2. Note that Bayesian updating implies that for all $t$ the relative precision of the modified problem remains the same as that of the original problem:

(A22) $\dfrac{\tilde{\sigma}'^2_t}{\sigma'^2} = \dfrac{(a\tilde{\sigma}_t)^2}{(a\sigma)^2} = \dfrac{\tilde{\sigma}^2_t}{\sigma^2}$.

It follows that the updated posterior mean and variance have the following relationships:

(A23) $\tilde{x}'_{t+1} = \dfrac{\sigma'^2 \tilde{x}'_t + \tilde{\sigma}'^2_t x'_{t+1}}{\sigma'^2 + \tilde{\sigma}'^2_t} = a\, \tilde{x}_{t+1} + b$,

(A24) $\tilde{\sigma}'^2_{t+1} = \dfrac{\tilde{\sigma}'^2_t \sigma'^2}{\tilde{\sigma}'^2_t + \sigma'^2} = a^2\, \tilde{\sigma}^2_{t+1}$.

Therefore, the belief state in the next period preserves the relationship:

(A25) $S'_{t+1} = (a\, \tilde{x}_{t+1} + b,\ a\, \tilde{\sigma}_{t+1})$.

We then check whether $\overline{W}$ satisfies the fixed point defined by Equation (A4) for the modified problem. Suppose Equation (A20) holds; then

(A26) $\bar{v}_0(S'_t, a\lambda + b) = (a\lambda + b) + \delta \overline{W}(S'_t, a\lambda + b) = a\, v_0(S_t, \lambda) + \dfrac{b}{1 - \delta}$,

(A27) $\bar{v}_1(S'_t, a\lambda + b) = (a\tilde{x}_t + b) + \delta E[\overline{W}(S'_{t+1}, a\lambda + b) \mid S'_t] = a\, v_1(S_t, \lambda) + \dfrac{b}{1 - \delta}$,

where the second equality in (A27) follows from Equation (A25) and the normality and conjugate-prior assumptions for the distribution of quality and beliefs. Then $\overline{\Delta}(S'_t, a\lambda + b) = a\, \Delta(S_t, \lambda)$ for the modified problem. The assumption that the distribution of $\varepsilon_t$ has scale and location parameters implies

(A28) $\Pr\big(\overline{\Delta} + a\varepsilon_t \geq 0\big) = \Pr\big(a[\Delta + \varepsilon_t] \geq 0\big) = \Pr(\Delta + \varepsilon_t \geq 0)$,

so the choice probabilities are unchanged. The right-hand side of Equation (A4) for the modified problem becomes

(A29) $E \max\big\{a v_0 + \tfrac{b}{1-\delta},\ a(v_1 + \varepsilon_t) + \tfrac{b}{1-\delta}\big\} = a\, E\max\{v_0,\ v_1 + \varepsilon_t\} + \dfrac{b}{1-\delta} = a\, W(S_t, \lambda) + \dfrac{b}{1-\delta} = \overline{W}(S'_t, a\lambda + b)$,

which is the left-hand side of Equation (A4). The first equality follows from Equations (A26) to (A28), and the second uses the fact that $W(S_t, \lambda)$ is the fixed point of Equation (A4). Therefore, $\overline{W}$ also satisfies the fixed point of Equation (A4) for the modified problem.

For the second part of the lemma, we again use the definition of the Whittle index in Equation (A19). It suffices to show the proposed relation in Equation (A21) solves the above equality. Note that

$\bar{v}_0(S'_t, am + b) = a\, v_0(S_t, m) + \dfrac{b}{1-\delta} = a\, v_1(S_t, m) + a E[\varepsilon_t] + \dfrac{b}{1-\delta} = \bar{v}_1(S'_t, am + b) + E[a\varepsilon_t]$,

where the first and last equalities follow from Equations (A26) and (A27), and the middle equality from the definition of the Whittle index for the original problem (setting $\lambda = m(S_t)$). Then by the definition of the Whittle index, $\overline{m}(a\tilde{x}_t + b, a\tilde{\sigma}_t; a\sigma_\varepsilon, a\sigma) = a\, m(\tilde{x}_t, \tilde{\sigma}_t; \sigma_\varepsilon, \sigma) + b$.

To complete the proof of the proposition, note that by Lemma 3 we have

$\overline{m}(a\tilde{x}_t + b, a\tilde{\sigma}_t; a\sigma_\varepsilon, a\sigma) = a\, m(\tilde{x}_t, \tilde{\sigma}_t; \sigma_\varepsilon, \sigma) + b$.

Setting $a = 1/\sigma$ and $b = -\tilde{x}_t/\sigma$ yields

$m(\tilde{x}_t, \tilde{\sigma}_t; \sigma_\varepsilon, \sigma) = \tilde{x}_t + \sigma\, m(0,\ \tilde{\sigma}_t/\sigma;\ \sigma_\varepsilon/\sigma,\ 1)$,

where Lemma 2 ensures that any location shift of the utility shocks can likewise be absorbed into the index as an additive constant.
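The affine invariance in Lemma 3 can be verified numerically. The sketch below (our own illustration with assumed parameter values, not the paper's code) evaluates a two-period version of the sub-problem on a fixed set of draws, applies the transformation of Lemma 3, and checks that the value scales by $a$ and shifts by $b(1 + \delta)$, the two-period analogue of $b/(1-\delta)$:

```python
# Monte Carlo check of Lemma 3's affine invariance (illustrative; parameters assumed).
import numpy as np

rng = np.random.default_rng(1)
delta, sigma, lam = 0.9, 1.0, 0.4
x0, s0 = 0.3, 1.2                      # prior mean and sd of quality beliefs
a, b = 2.0, -0.7                       # affine transformation of Lemma 3
N = 100_000
eps_now = rng.normal(0, 0.5, N)        # period-0 brand shock
eps_next = rng.normal(0, 0.5, N)       # period-1 brand shock
s1_sq = s0**2 * sigma**2 / (s0**2 + sigma**2)
x1 = rng.normal(x0, np.sqrt(s0**2 - s1_sq), N)   # posterior mean after one draw

def value(x0_, x1_, e_now, e_next, lam_):
    """Two-period value of the sub-problem (Equation (A4), truncated at T = 2)."""
    w_keep = np.maximum(lam_, x0_ + e_next).mean()   # terminal value, no learning
    w_learn = np.maximum(lam_, x1_ + e_next).mean()  # terminal value after updating
    v0 = lam_ + delta * w_keep
    v1 = x0_ + delta * w_learn
    return np.maximum(v0, v1 + e_now).mean()

orig = value(x0, x1, eps_now, eps_next, lam)
trans = value(a * x0 + b, a * x1 + b, a * eps_now, a * eps_next, a * lam + b)
gap = abs(trans - (a * orig + b * (1 + delta)))
print(gap)   # numerically zero
```

Because the transformation is applied to the same draws, the identity holds sample by sample, so the gap is zero up to floating-point error.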
Online Appendix C. Proof of Proposition 3

The focus is again on the sub-problem of a single brand, and thus we drop the brand identifier $j$. Recall that $v_0$ and $v_1$ are defined in Equation (A5), and $\Delta(S_t, \lambda) = v_1(S_t, \lambda) - v_0(S_t, \lambda)$.

Proof of Proposition 3(1). The first part, that the Whittle index is increasing in the posterior mean $\tilde{x}_t$, is evident from Proposition 2. For the second part, fix some belief state $S_t$ and consider any $\lambda' \geq \lambda$. By the definition of the index, $\Delta(S_t, m(S_t)) = 0$. Note that

(A30) $\Delta(S_t, \lambda') - \Delta(S_t, \lambda) = -\displaystyle\int_{\lambda}^{\lambda'}\Big[\dfrac{\partial v_0(S_t, s)}{\partial s} - \dfrac{\partial v_1(S_t, s)}{\partial s}\Big]\, ds \leq 0$,

where the inequality is implied by (A12). It then follows that $\Delta(S_t, \cdot)$ is nonincreasing in $\lambda$, so it crosses zero at most once: the breakeven value $m(S_t)$ is unique, and the fixed reward remains optimal for any $\lambda' \geq m(S_t)$.

Proof of Proposition 3(2). We will prove the first part, and omit the second part, which uses a similar argument. Fix some $\sigma'_\varepsilon \geq \sigma_\varepsilon$. Let $\eta_1, \eta_2, \ldots$ be a sequence of random variables conditionally i.i.d. from distribution $F(\cdot\,; \sigma_\varepsilon)$. Let $\zeta_1, \zeta_2, \ldots$ be a sequence of mean-zero random variables, conditionally i.i.d. and independent of the first sequence, such that the constructed sequence $\varepsilon'_t = \eta_t + \zeta_t$ for all $t$ is conditionally i.i.d. from distribution $F(\cdot\,; \sigma'_\varepsilon)$. Fix some policy $\pi$ applied to solve the problem under $\{\eta_t\}$. Denote $\pi_t = 0$ if the fixed reward is chosen, and $\pi_t = 1$ otherwise. Then the value function under state $(S_t, \varepsilon_t)$ when $\pi$ is applied becomes

$V'(S_t, \varepsilon_t; \pi) = E\Big[\textstyle\sum_{s \geq t}\delta^{s-t}\big\{(1-\pi_s)\lambda + \pi_s(\tilde{x}_s + \eta_s + \zeta_s)\big\}\Big]$
$= E\Big[\textstyle\sum_{s \geq t}\delta^{s-t}\big\{(1-\pi_s)\lambda + \pi_s(\tilde{x}_s + \eta_s)\big\}\Big] + E\Big[\textstyle\sum_{s \geq t}\delta^{s-t}\pi_s\zeta_s\Big] = V(S_t, \varepsilon_t; \pi)$,

where the last equality uses the fact that the second term is equal to zero ($\pi_s$ does not depend on $\zeta_s$, which has mean zero). Note that $V'(S_t, \varepsilon_t; \pi) \leq V'(S_t, \varepsilon_t)$ because the latter is the optimal value function. Therefore $V(S_t, \varepsilon_t; \pi) \leq V'(S_t, \varepsilon_t)$ for all $\pi$, and taking the maximum on the left-hand side gives $V(S_t, \varepsilon_t) \leq V'(S_t, \varepsilon_t)$. Integrating out $\varepsilon_t$ then yields $W(S_t, \lambda; \sigma_\varepsilon) \leq W(S_t, \lambda; \sigma'_\varepsilon)$. Then we have $\partial W(S_t, \lambda; \sigma_\varepsilon)/\partial\sigma_\varepsilon \geq 0$ for all $S_t$, $\lambda$, $\sigma_\varepsilon$.

Writing $\varepsilon_t = \sigma_\varepsilon \eta_t$ with $\eta_t$ standardized, differentiating both sides of Equation (A4) with respect to $\sigma_\varepsilon$ and using the chain rule give:

$\dfrac{\partial W(S_t, \lambda; \sigma_\varepsilon)}{\partial\sigma_\varepsilon} = P(0 \mid S_t, \lambda; \sigma_\varepsilon)\, \delta\, \dfrac{\partial W(S_t, \lambda; \sigma_\varepsilon)}{\partial\sigma_\varepsilon} + P(1 \mid S_t, \lambda; \sigma_\varepsilon)\Big\{\delta E\Big[\dfrac{\partial W(S_{t+1}, \lambda; \sigma_\varepsilon)}{\partial\sigma_\varepsilon}\,\Big|\, S_t\Big] + E[\eta_t \mid \pi_t = 1]\Big\}$.

The last equality implies that

$\dfrac{\partial W(S_t, \lambda; \sigma_\varepsilon)}{\partial\sigma_\varepsilon} = \dfrac{P(1 \mid S_t, \lambda; \sigma_\varepsilon)\big\{\delta E[\partial W(S_{t+1}, \lambda; \sigma_\varepsilon)/\partial\sigma_\varepsilon \mid S_t] + E[\eta_t \mid \pi_t = 1]\big\}}{1 - \delta P(0 \mid S_t, \lambda; \sigma_\varepsilon)}$.

It then follows that

$\dfrac{\partial\Delta(S_t, \lambda; \sigma_\varepsilon)}{\partial\sigma_\varepsilon} = \delta E\Big[\dfrac{\partial W(S_{t+1}, \lambda; \sigma_\varepsilon)}{\partial\sigma_\varepsilon}\,\Big|\, S_t\Big] - \delta\,\dfrac{\partial W(S_t, \lambda; \sigma_\varepsilon)}{\partial\sigma_\varepsilon} \geq 0$,

where the last inequality uses $E[\partial W(S_{t+1}, \lambda; \sigma_\varepsilon)/\partial\sigma_\varepsilon \mid S_t] \geq 0$. Let $m$ and $m'$ be the Whittle indices corresponding to $\sigma_\varepsilon$ and $\sigma'_\varepsilon$. This inequality implies $\Delta(S_t, \lambda; \sigma'_\varepsilon) \geq \Delta(S_t, \lambda; \sigma_\varepsilon)$. Since by definition $\Delta(S_t, m; \sigma_\varepsilon) = 0 = \Delta(S_t, m'; \sigma'_\varepsilon)$, we have $\Delta(S_t, m; \sigma'_\varepsilon) \geq \Delta(S_t, m'; \sigma'_\varepsilon)$. It then follows that $m' \geq m$ by Equation (A30).

Proof of Proposition 3(3). Consider any $\alpha > 0$. By the invariance property we have

$m(\alpha\tilde{x}_t, \alpha\tilde{\sigma}_t; \alpha\sigma_\varepsilon, \alpha\sigma) = \alpha\tilde{x}_t + \alpha\sigma\, m(0, \tilde{\sigma}_t/\sigma; \sigma_\varepsilon/\sigma, 1) = \alpha\big[\tilde{x}_t + \sigma\, m(0, \tilde{\sigma}_t/\sigma; \sigma_\varepsilon/\sigma, 1)\big] = \alpha\, m(\tilde{x}_t, \tilde{\sigma}_t; \sigma_\varepsilon, \sigma)$,

where the first and last equalities follow from Proposition 2.
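The first step of the proof of Proposition 3(2), that the value of the sub-problem is nondecreasing in the scale of the utility shocks, is easy to see at the level of a single expected maximum. The sketch below (our own illustration, not the paper's code) uses the standard closed form for $E\max\{v_0, v_1 + \varepsilon\}$ under a normal shock and evaluates it on an increasing grid of shock scales:

```python
# E max{v0, v1 + eps} with eps ~ N(0, s^2) is nondecreasing in s (illustrative).
import math

def emax(v0, v1, s):
    """Closed form of E max{v0, v1 + eps}, eps ~ N(0, s^2)."""
    if s == 0:
        return max(v0, v1)
    d = v1 - v0
    z = d / s
    Phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))      # standard normal cdf
    phi = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    return v0 + d * Phi + s * phi

values = [emax(0.5, 0.2, s) for s in (0.0, 0.25, 0.5, 1.0, 2.0)]
print([round(v, 4) for v in values])    # nondecreasing in s
```

The derivative of this expression with respect to the scale $s$ equals the normal density at the standardized gap, which is strictly positive, so the monotonicity holds for any $v_0$, $v_1$.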
Online Appendix D. Computation of Index Function

We can use the invariance property to simplify the computation of the Whittle index. The computation is based on the fixed-point problem in Equation (A4) and the definition of the Whittle index. The brand identifier $j$ is dropped to simplify notation. Note that the $W$ function evaluated at $\tilde{x}_t = 0$ is

$W(0, \tilde{\sigma}_t, \lambda) = E\max\big\{\lambda + \delta W(0, \tilde{\sigma}_t, \lambda),\ \varepsilon_t + \delta E[W(\tilde{x}_{t+1}, \tilde{\sigma}_{t+1}, \lambda) \mid 0, \tilde{\sigma}_t]\big\}$
$= E\max\big\{\lambda + \delta W(0, \tilde{\sigma}_t, \lambda),\ \varepsilon_t + \delta E\big[W(0, \tilde{\sigma}_{t+1}, \lambda - \tilde{x}_{t+1}) + \tfrac{\tilde{x}_{t+1}}{1-\delta} \,\big|\, 0, \tilde{\sigma}_t\big]\big\}$
$= E\max\big\{\lambda + \delta W(0, \tilde{\sigma}_t, \lambda),\ \varepsilon_t + \delta E[W(0, \tilde{\sigma}_{t+1}, \lambda - \tilde{x}_{t+1}) \mid 0, \tilde{\sigma}_t]\big\}$,

where $\tilde{\sigma}^2_{t+1} = \tilde{\sigma}^2_t \sigma^2/(\tilde{\sigma}^2_t + \sigma^2)$, as implied by the Bayesian updating formula for the normal distribution. The second equality uses Equation (A20) from Lemma 3 (with $a = 1$ and $b = -\tilde{x}_{t+1}$), and the last equality uses the fact that the expectation of $\tilde{x}_{t+1}$ conditional on $\tilde{x}_t = 0$ is zero.

We now treat $\lambda$ as a state variable. Let $\lambda_{t+1} = \lambda_t - \tilde{x}_{t+1}$. Note that the distribution of $\tilde{x}_{t+1}$ conditional on belief $(0, \tilde{\sigma}_t)$ is normal with mean 0 and standard deviation $s_t = \tilde{\sigma}^2_t/\sqrt{\tilde{\sigma}^2_t + \sigma^2}$. Therefore $\lambda_{t+1} \mid \lambda_t \sim N(\lambda_t, s^2_t)$. The fixed-point problem now only involves the $W$ function fixed at $\tilde{x} = 0$, and it evolves on the state space $(\lambda, \tilde{\sigma})$:

$W(0, \tilde{\sigma}_t, \lambda_t) = E\max\big\{\lambda_t + \delta W(0, \tilde{\sigma}_t, \lambda_t),\ \varepsilon_t + \delta E[W(0, \tilde{\sigma}_{t+1}, \lambda_{t+1}) \mid 0, \tilde{\sigma}_t, \lambda_t]\big\}$.

Standard dynamic programming algorithms can be used to solve the above fixed point. Given the solution of $W(0, \tilde{\sigma}, \lambda)$, we can compute the Whittle index under $\tilde{x} = 0$: $m(0, \tilde{\sigma}; \sigma_\varepsilon, \sigma)$. The index evaluated at any value of the posterior mean is then computed by the linear summation implied by the invariance property: $m(\tilde{x}, \tilde{\sigma}; \sigma_\varepsilon, \sigma) = \tilde{x} + m(0, \tilde{\sigma}; \sigma_\varepsilon, \sigma)$.
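The computation above can be sketched end to end. The code below (our own illustration with assumed parameter values, a finite number of learning stages, and normal shocks; not the authors' implementation) solves the reduced fixed point for $W(0, \tilde{\sigma}, \lambda)$ on a $\lambda$-grid, stage by stage as the belief uncertainty shrinks, and reads off the index at $\tilde{x} = 0$ as the breakeven $\lambda$; the invariance property then gives the index at any posterior mean by addition:

```python
# Sketch of the Appendix D computation (illustrative; parameters assumed).
import math
import numpy as np

delta, sigma, sigma_eps = 0.8, 1.0, 0.5
K = 12                                         # learning stages retained
lam = np.linspace(-6.0, 6.0, 241)              # grid for the fixed reward (state)
erf_v = np.vectorize(math.erf)

def emax(v0, v1, s):
    """E max{v0, v1 + eps} with eps ~ N(0, s^2), in closed form."""
    d = v1 - v0
    z = d / s
    Phi = 0.5 * (1.0 + erf_v(z / math.sqrt(2)))
    phi = np.exp(-z**2 / 2) / math.sqrt(2 * math.pi)
    return v0 + d * Phi + s * phi

# Deterministic path of belief uncertainty under Bayesian updating.
s_tilde = [1.0]
for _ in range(K):
    s_tilde.append(math.sqrt(s_tilde[-1]**2 * sigma**2 / (s_tilde[-1]**2 + sigma**2)))

# Terminal stage: beliefs nearly settled at x_tilde = 0, so W = Emax{lam, eps}/(1 - delta).
W_next = emax(lam, 0.0, sigma_eps) / (1 - delta)

nodes, wts = np.polynomial.hermite_e.hermegauss(15)
wts = wts / wts.sum()                          # quadrature weights for a standard normal

index = []
for k in range(K - 1, -1, -1):
    sd = s_tilde[k]**2 / math.sqrt(s_tilde[k]**2 + sigma**2)   # sd of the mean update
    cont = np.zeros_like(lam)                  # E[W(0, s', lam - x')] over x' ~ N(0, sd^2)
    for x, w in zip(nodes, wts):
        cont += w * np.interp(lam - sd * x, lam, W_next)
    v1 = delta * cont                          # active branch value at x_tilde = 0
    W = np.zeros_like(lam)
    for _ in range(300):                       # contraction: passive branch fixed point
        W = emax(lam + delta * W, v1, sigma_eps)
    diff = (lam + delta * W) - v1              # v0 - v1, nondecreasing by indexability
    i = int(np.argmax(diff >= 0))              # first grid point with v0 >= v1
    x0_, x1_, y0_, y1_ = lam[i - 1], lam[i], diff[i - 1], diff[i]
    index.append(float(x0_ - y0_ * (x1_ - x0_) / (y1_ - y0_)))  # breakeven lambda
    W_next = W

index = index[::-1]      # index[k] = m(0, s_tilde[k]); shrinks as experience accumulates
m_at = 1.5 + index[0]    # invariance: m(x_tilde, s_tilde) = x_tilde + m(0, s_tilde)
print([round(v, 3) for v in index[:4]], round(m_at, 3))
```

The key computational point is that the fixed point lives on the two-dimensional $(\lambda, \tilde{\sigma})$ grid rather than the full belief space, and the index for every posterior mean follows by a single addition.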
Online Appendix E. Simulation of Value Function

Given a decision rule $\Pi$, we compute the value function (expected total utility) by forward-simulating utilities over a sufficiently long horizon. Starting at a given state $(\tilde{x}_{j0}, \tilde{\sigma}_{j0}, \varepsilon_{j0})$, we sample a large number $D$ of Markov chains for each brand $j$: $\{(\tilde{x}^d_{jn}, \tilde{\sigma}^d_{jn}, \varepsilon^d_{jt}) : n, t = 1, 2, \ldots, \bar{T}\}$, where $\bar{T}$ is greater than the truncated time horizon $T$. Bayesian updating of the normal distribution leads to the following state transitions:

$\tilde{x}_{j,n+1} \mid (\tilde{x}_{j,n}, \tilde{\sigma}_{j,n}) \sim N\Big(\tilde{x}_{j,n},\ \dfrac{\tilde{\sigma}^4_{j,n}}{\tilde{\sigma}^2_{j,n} + \sigma^2_j}\Big), \qquad \tilde{\sigma}^2_{j,n+1} = \dfrac{\tilde{\sigma}^2_{j,n}\sigma^2_j}{\tilde{\sigma}^2_{j,n} + \sigma^2_j}, \qquad \varepsilon_{jt} \sim F(\cdot\,; \sigma_{\varepsilon j})$,

where $n$ indexes the cumulative number of times brand $j$ has been tried. These sequences of belief states are then fixed in advance and reused for each decision rule. Under a decision rule $\Pi$, the empirical estimate of its expected total utility for a truncated time horizon $T$ is given by:

$\hat{V}(\Pi) = \dfrac{1}{D}\sum_{d=1}^{D}\sum_{t=1}^{T}\delta^{t-1}\, u\big(\Pi^d_t,\ \tilde{x}^d_{j, n_{jt}},\ \tilde{\sigma}^d_{j, n_{jt}},\ \varepsilon^d_{jt}\big)$,

where $n_{jt}$ is the cumulative number of trials for brand $j$ up to time $t$. Note that the realized state values are chosen from the pre-drawn sample paths, with $n_{jt}$ indicating which state in the sample path is chosen.
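The forward simulation can be sketched as follows. The code below (our own illustration with an assumed utility form and parameter values, not the authors' code) pre-draws, for each chain and brand, the path of posterior means indexed by the cumulative number of trials, then evaluates any decision rule on the same draws, so that different rules are compared under common random numbers:

```python
# Forward simulation of a decision rule's value (illustrative; parameters assumed).
import numpy as np

delta, T, D, J = 0.9, 40, 300, 2
sigma = np.array([1.0, 1.0])             # quality noise per brand
sig_eps = 0.3                            # utility shock scale
rng = np.random.default_rng(7)

# Pre-drawn belief sample paths: x_path[d, j, n] is brand j's posterior mean in
# chain d after n trials; var_path tracks the deterministic variance path.
x_path = np.zeros((D, J, T + 1))
x_path[:, :, 0] = np.array([0.2, 0.0])   # prior means
var_path = np.zeros((J, T + 1))
var_path[:, 0] = 1.0                     # prior variances
for n in range(T):
    step_var = var_path[:, n]**2 / (var_path[:, n] + sigma**2)   # var of mean update
    x_path[:, :, n + 1] = x_path[:, :, n] + rng.normal(0, np.sqrt(step_var), (D, J))
    var_path[:, n + 1] = var_path[:, n] * sigma**2 / (var_path[:, n] + sigma**2)
eps = rng.normal(0, sig_eps, (D, J, T))  # per-period utility shocks

def estimate(rule):
    """Empirical discounted utility of rule(posterior_means, shocks) -> brand."""
    total = 0.0
    for d in range(D):
        n = np.zeros(J, dtype=int)       # cumulative trials per brand
        for t in range(T):
            xt = x_path[d, np.arange(J), n]
            j = rule(xt, eps[d, :, t])
            total += delta**t * (xt[j] + eps[d, j, t])
            n[j] += 1                    # advance along brand j's sample path
    return total / D

myopic = estimate(lambda x, e: int(np.argmax(x + e)))
again = estimate(lambda x, e: int(np.argmax(x + e)))
print(round(myopic, 4), myopic == again)   # paths are reused, so estimates match
```

Fixing the sample paths in advance is what makes the comparison across decision rules in Table 1 a matter of policy quality rather than simulation noise.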