Learning from Experience, Simply
by
Song Lin
Juanjuan Zhang
and
John R. Hauser
June 1, 2012

Song Lin is a PhD Candidate, MIT Sloan School of Management, Massachusetts Institute of
Technology, E62-580, 77 Massachusetts Avenue, Cambridge, MA 02139, (617) 225-1639,
Juanjuan Zhang is an Associate Professor of Marketing, MIT Sloan School of Management,
Massachusetts Institute of Technology, E62-537, 77 Massachusetts Avenue, Cambridge, MA
02139, (617) 452-2790, [email protected].
John R. Hauser is the Kirin Professor of Marketing, MIT Sloan School of Management, Massa-
chusetts Institute of Technology, E62-538, 77 Massachusetts Avenue, Cambridge, MA 02139,
(617) 253-2929, [email protected].
Learning From Experience, Simply
Abstract
There is substantial interest in the marketing literature in modeling and estimating con-
sumer-learning dynamics. However, (approximately) optimal solutions to forward-looking learn-
ing problems are computationally complex, limiting their empirical applicability and behavioral
plausibility. Drawing on theories of cognitive simplicity from marketing, psychology, and eco-
nomics, we propose a behaviorally intuitive (and tractable) solution – index strategies. We argue
that index strategies balance thinking costs and discounted utility. Index strategies also avoid ex-
ponential growth in computational complexity as the size of the decision problem increases, ena-
bling researchers to study learning models in more complex situations.
The existence of index strategies depends upon a structural property called indexability,
which is hard to establish in general. We prove the indexability of canonical consumer learning
models in which both brand quality and future utility shocks are uncertain. We establish invari-
ance properties which make index strategies feasible for consumers to intuit. Using synthetic da-
ta, we demonstrate that index strategies achieve nearly optimal utility at low computational costs.
Using IRI data for a product category where we expect forward-looking learning, we find that an
index-strategy model fits behavior well, provides plausible parameter estimates, predicts out-of-
sample as well as or better than alternative models, and requires substantially lower computa-
tional costs.
Keywords: dynamic consumer learning, structural models, cognitive simplicity, index strate-
gies, heuristics, multi-armed bandit problems, restless bandits, indexability.
1. Introduction and Motivation
Considerable effort in marketing is devoted to modeling and estimating the dynamics by
which consumers learn from their consumption experience (e.g., Roberts and Urban 1988;
Erdem and Keane 1996; Erdem et al. 2005; Narayanan and Manchanda 2009; Ching and Ishihara
2010; Ching et al. 2011). Researchers have developed theory-rich models of optimizing forward-
looking consumers who balance exploitation (choosing the brand that yields the highest current
reward) with exploration (trying brands to gather information so that future consumption experi-
ences might improve). Pillars of these models are an explicitly specified description of consumer
utility and an explicitly specified process by which consumers learn. Most models assume con-
sumers choose brands by solving a dynamic program which maximizes expected total utility tak-
ing learning into account. Researchers argue that theory-based models are more likely to uncover
insight and be invariant for new-domain policy simulations (Chintagunta et al. 2006, p. 604).
However, these advantages often come at the expense of difficult and time-consuming solution
methods.
The dynamic programs for forward-looking learning models are, themselves, extremely
difficult to solve optimally. We cite evidence below that the problems are PSPACE-hard – at least as hard as any problem that can be solved with polynomial space (i.e., computation memory). PSPACE-hard implies the
more-familiar notion of NP-hard. This intractability presents both practical and theoretical chal-
lenges. Practically, problem difficulty requires researchers to rely on approximate solutions.
Without explicit comparisons to the optimal solution, we do not know the impact of the approx-
imations on estimation results. Moreover, the well-known “curse of dimensionality” prevents re-
searchers from investigating problems with moderate or large numbers of brands or marketing
variables, for which even approximate solutions are not feasible. Theoretically, it is reasonable to
posit that a consumer cannot solve in his or her head a dynamic problem optimally that the best
computers cannot solve. In contrast, well-developed theories in marketing, psychology and eco-
nomics suggest that observed consumer decision rules are cognitively simple (e.g., Payne et al.
1988, 1993; Gigerenzer and Goldstein 1996).
We propose and evaluate an alternative theory of forward-looking consumers’ solutions
to the dynamic learning problems – index strategies. We retain the basic pillars of structural
modeling: an explicit description of consumer utility, an explicit Bayesian learning process, and
an assumption that consumers seek to optimize expected discounted utility. We posit in addition
a cost to thinking (e.g., Shugan 1980; Johnson and Payne 1985). We assume the consumer
chooses a strategy that is likely to optimize expected total utility minus thinking costs. While
thinking costs might be observable in the laboratory, say through response latency, they are in-
herently unobservable in vivo. To posit index strategies we identify domains where index strate-
gies are nearly optimal in the reduced problem of maximizing expected discounted utility. If, in
such domains, index strategies are substantially simpler for the consumer to implement, then it is
likely that savings in thinking costs exceed the slight reduction in optimality in the reduced prob-
lem. In the special cases where index strategies are optimal in the reduced problem, we argue
they are superior as a description of forward-looking consumers. Following the same logic, we
also establish conditions where myopic learning strategies suffice.
To prove the viability of index strategies as a descriptive model of consumers we must
(1) establish when well-defined index strategies exist, (2) provide intuition that they are cogni-
tively simple and behaviorally intuitive (and hence might be used by consumers), (3) investigate
when index strategies are optimal or near optimal solutions to the reduced problem of utility
maximization even if there were no thinking costs, and (4) test whether index strategies explain
observed consumer behavior at least as well as alternative models. We address #1 analytically
by proving the “indexability” property of canonical forward-looking learning models.
(Indexability is hard to establish in general.) We address #2 by examining the form and proper-
ties of index strategies and arguing they are cognitively simple and behaviorally intuitive relative
to current solution methods. We address #3 with both analytical arguments and synthetic data.
We address #4 by estimating alternative models using IRI data on the purchase of diapers, a
product category where we expect to see forward-looking learning. We begin with a review of
concepts upon which we build.
Our basic hypothesis is that consumers use a cognitively-simple index strategy. We
demonstrate viability by showing that the “Whittle index” (to be defined later) is one candidate
that satisfies the four criteria. Figure 1 is a conceptual summary of our hypothesis. The concept
of an index solution could apply to other indices (or approximations to the Whittle index) if such
indices are shown to be better descriptions of observed consumer behavior. Future papers might
evaluate such indices on the four criteria and compare their performance to the Whittle index.
[Insert Figure 1 about here.]
2. Related Literatures
We draw on concepts from four literatures: learning dynamics, cognitive simplicity, de-
scriptive models based on optimal solutions to reduced problems, and bandit problems.
2.1. Learning Dynamics
Many influential papers study consumer learning dynamics and apply learning models to
explain or forecast consumer choices. Using data from automotive consumers Roberts and Urban
(1988) estimate a model in which consumers use Bayesian learning to integrate information from
a variety of sources to resolve uncertainty about “brand quality.” They, like subsequent authors,
define brand quality generally either by a multi-attributed utility function or by a match of a
brand’s features to the consumer’s needs. Erdem and Keane (1996) build upon the concept of
Bayesian learning to include forward-looking consumers who trade off exploitation and exploration. For frequently-purchased goods, their model fits data better than a purely myopic model
(reduced form of Guadagni and Little 1983) and as well as the Roberts-Urban myopic-learning
model. These papers stimulated a line of research that estimates the dynamics of consumer learn-
ing – for a recent review see Ching et al. (2011). Some models retain myopic consumers with
Bayesian learning (e.g., Narayanan et al. 2005; Mehta, et al. 2008; Chintagunta et al. 2009; Na-
rayanan and Manchanda 2009; Ching and Ishihara 2010), while others explicitly model forward-
looking consumers (e.g., Ackerberg 2003; Crawford and Shum 2005; Erdem et al. 2005, 2008;
Kim et al. 2010).1 Because forward-looking learning problems are computationally intractable,
many applications estimate models based on myopic learning (Narayanan and Manchanda 2009;
Ching and Ishihara 2010).
In general, forward-looking models fit empirical data well but have not improved predic-
tion much relative to myopic learning models. However, when the theory is accurately descrip-
tive, the more-complex models should improve policy simulations. Because the forward-looking
assumption requires consumers to solve computationally hard dynamic problems, some authors
have suggested that “the future development of structural models in marketing will focus on the
interface between economics and psychology” (Chintagunta et al. 2006, p. 614).
2.2. Cognitive Simplicity
Parallel literatures in marketing, psychology, and economics provide evidence that con-
sumers use decision rules that are cognitively simple. In marketing Bettman et al. (1998) and
1 Kim et al. (2010) model consumers’ sequential search of products. The search problem can be seen as a forward-looking learning problem in which a consumer’s product “experience” completely reveals product value.
Payne et al. (1988, 1993) present evidence that consumers use simple heuristic decision rules to
evaluate products. For example, under time pressure, consumers often use conjunctive rules (re-
quire a few “must have” features) rather than linear-utility-based rules. Using simulated thinking
costs with “elementary information processes,” Johnson and Payne (1985) illustrate how heuris-
tic decision rules can be rational when balancing utility and thinking costs. Methods to estimate
the parameters of cognitively-simple decision rules vary, but, in general, such rules predict as
well as or better than linear utility (e.g., Bröder 2000; Gilbride and Allenby 2004; Kohli and
Jedidi 2007; Yee et al. 2007; Hauser et al. 2010).
Building on Simon’s (1955, 1956) theory of bounded rationality, researchers in psychol-
ogy argue that human beings use cognitively simple rules that are “fast and frugal” (e.g.,
Gigerenzer and Goldstein 1996; Martignon and Hoffrage 2002). Fast and frugal rules evolve
when consumers learn decision rules from experience. Consumers continue to use the decision
rules because they lead to good outcomes in familiar environments (Goldstein and Gigerenzer
2002). For example, when judging the size of cities, “take the best” often leads to better judg-
ments than a linear rule.2 Indeed, in 2010-2011 two issues of Judgment and Decision Making
were devoted to the recognition heuristic alone (e.g., Marewski et al. 2010). Related concepts in-
clude accessibility (e.g., Bruner 1957), fluency (e.g., Jacoby and Dallas 1981), and availability
(e.g., Tversky and Kahneman 1973).
The costly nature of cognition has also received attention in economics (see Camerer
2003 for a review). A line of research looks to extend or revise standard dynamic decision-
making models with the explicit recognition that cognition is costly. For example, Gabaix and
Laibson (2000) empirically test a behavioral solution to decision-tree problems – decision-
2 The take-the-best rule is, simply, if you recognize one city and not the other it is likely larger; if you recognize both use the most diagnostic feature to make the choice.
makers actively eliminate low-probability branches to simplify the task. Gabaix et al. (2006) de-
velop a “directed cognition model,” in which a decision-maker acts as if he/she has only one
more opportunity to search. In the laboratory, the directed cognition model explains subjects’ so-
lutions to a simple dynamic problem better than a standard search model that assumes costless
cognition.
Cognitive process mechanisms are still being debated in the marketing, psychology, and
economics literatures. Our hypothesis that consumers use index or index-like strategies needs on-
ly the observation that consumers favor decision rules that are cognitively simple and that such
rules often lead to very good outcomes. The cognitive-simplicity hypothesis assumes that con-
sumers trade off utility gains against thinking costs, but does not require explicit measurement of
thinking costs.
2.3. Descriptive Models Based on Optimal Solutions to Reduced Problems
If a ball player wants to catch a ball that is already high in the air and traveling directly
toward the player, then all the player need do is gaze upon the ball, start running, and adjust
his/her speed to maintain a constant gaze angle with the ball (Hutchinson and Gigerenzer 2005, p.
102).3 The gaze heuristic is an example where a cognitively simple rule accomplishes a task that
might otherwise involve solving difficult differential equations. But the principle is more gen-
eral: complex optimization problems often have simple solutions.
Suppose a consumer knows the utilities and prices of a set of durable goods and wishes to
choose the maximum utility set subject to a budget constraint. If the brands were infinitely di-
visible, then simple “greedy” solutions lead to the optimal allocation: choose brands in the order
of either utility/price or utility − λ×price (where λ is the shadow price of the budget constraint).
3 Professional athletes use more-complicated heuristics that give them greater range, for example, in baseball, pre-positioning based on prior tendencies and the expected pitch, and the sound as the bat hits the ball.
Do so until the budget is exhausted. When brands are not infinitely divisible neither heuristic is
optimal, but both greedy heuristics provide excellent (and empirically indistinguishable) descrip-
tions of consumer behavior (Hauser and Urban 1986). Greedy heuristics are one justification for
a utility specification that is linear in price. Another heuristic, choosing the information source
with the maximum gain in value per unit time while looking only one-step ahead, explains well
how consumers search for automobiles (Hauser et al. 1993). There are many examples in psy-
chology and marketing where seemingly simple decision rules solve more-complex problems.
We argue in §4 and §5 that some index strategies are simple decision rules that solve complex
dynamic optimization problems.
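The greedy budget heuristic described above can be sketched in a few lines. This is a minimal illustration – the brands, utilities, prices, and budget are made up for the example, not taken from Hauser and Urban (1986):

```python
def greedy_budget_choice(brands, budget):
    """Greedy heuristic: buy brands in decreasing utility-per-price order,
    skipping any brand that no longer fits, until the budget is exhausted.
    Optimal if goods were infinitely divisible; a good approximation otherwise."""
    chosen = []
    for name, utility, price in sorted(brands, key=lambda b: b[1] / b[2], reverse=True):
        if price <= budget:
            chosen.append(name)
            budget -= price
    return chosen

# Utility-per-price ranks B (3.0) ahead of A (2.0) ahead of C (1.0).
brands = [("A", 10.0, 5.0), ("B", 9.0, 3.0), ("C", 4.0, 4.0)]
print(greedy_budget_choice(brands, budget=8.0))  # → ['B', 'A']
```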
2.4. Bandit Problems
The multi-armed bandit problem is a prototypical problem that illustrates the fundamen-
tal tradeoff between exploration and exploitation in sequential decision making under uncertain-
ty. In a bandit problem the consumer faces a finite number of choices, each of which has an un-
certain value. The consumer must make choices, observe outcomes, and update beliefs with a se-
quential decision rule to maximize expected discounted values. The problem was first formulated by the British in World War II, and for over thirty years no simple solution was known. Then Gittins and Jones (1974)
demonstrated a simple index solution – develop an index for each “arm” (each choice alterna-
tive) by solving a sub-problem that involves only that arm, then choose the arm with the largest
index. Gittins and Jones (1974) proved the surprising result that the index solution is the optimal
solution to the classic bandit problem whenever the non-chosen choice alternatives do not
change over time.4
4 The Gittins index has been successfully applied in a variety of fields. For example, Hauser et al. (2009) apply the Gittins index to derive optimal “website morphing” strategies that match website design with customers’ cognitive styles. Recently morphing was applied to AT&T’s banner advertising on CNET.
When non-chosen choice alternatives change over time, say due to random shocks,
Gittins’ index is no longer guaranteed to be optimal. Such problems are known as “restless ban-
dits” (Whittle 1988) and, in general, are computationally intractable (Papadimitriou and
Tsitsiklis 1999). In his seminal paper, Whittle (1988) proposed a tractable heuristic solution
grounded on the optimal solution to a relaxed problem. His solution generalizes Gittins’ index
such that the problem can be solved optimally or near optimally by associating an index (referred
to as Whittle’s index) separately with each alternative and choosing the alternative with the larg-
est index. This index solution reduces an exponentially complex, intractable problem to a set of
one-dimensional problems.
The existence of well-defined index solutions relies on a structural property called
indexability, which is not guaranteed for all restless bandit problems. We show that the canonical
forward-looking learning problem is indexable. We also show that the index of an alternative is a
simple function of key parameters pertaining to this brand, including means and variances of
brand quality, quality beliefs, and utility shocks. We then argue that it is reasonable for the con-
sumer to intuit how an index varies as a function of these parameters.
3. Canonical Forward-Looking Learning Problem
We consider the following canonical forward-looking learning problem. A consumer sequentially chooses from a set containing $N$ brands. Let $j$ index brands and $t$ index time. The consumer’s utility, $u_{jt}$, from choosing $j$ at $t$ has three components. “Quality” (enjoyment, fit with needs, weighted sum of brand features, etc.), $q_{jt}$, is drawn from a distribution $F(q_{jt}; \theta_j)$ with parameters $\theta_j$ uncertain to the consumer. The distribution functions are independent over $j$. This independence assumption rules out learning about brand $j$ by choosing another brand. Quality is realized after each consumption occasion. Conditional on $\theta_j$, quality draws are independent over time; however, quality draws may be correlated over time through $\theta_j$. The
interpretation is as follows. Because quality is a general concept that includes enjoyment and fit
with needs that might vary among consumption occasions, a single brand experience is not suffi-
cient to resolve all uncertainty. However, other things being equal, a good brand experience sug-
gests that future experiences with the same brand are likely to be good. This gives forward-
looking consumers the “exploration” incentive – by trying out a brand, consumers make better-informed decisions about this brand in the future.
The second component of utility is a set of “observable shocks”, $x_{jt}$, such as advertising, price, promotion, and other control variables that are observable to the researcher and consumer.5 For simplicity, we assume the $x_{jt}$ affect utility directly, although the model is extendable to indirect effects as in Ackerberg (2003), Erdem and Keane (1996), and Narayanan et al. (2005). The third component of utility is an “unobservable shock”, $\epsilon_{jt}$, which represents random preference fluctuations observed by the consumer but not by the researcher.
We refer to the weighted sum of observable and unobservable shocks, $w'x_{jt} + \epsilon_{jt}$, as “utility shocks,” where $w$ is a vector of weight parameters. Utility shocks are important because, without them, the consumer would learn to choose a single brand (if the exogenous variables stabilized to a known value, say constant price and advertising), an outcome that is often violated in real-world observations. We let utility shocks be drawn from a joint distribution, $G(x_{jt}, \epsilon_{jt}; \phi_j)$, independently over time with parameters $\phi_j$.6 The shocks are independent over $j$. The consumer knows the distribution $G$ and the value of $w$, observes the current utility shocks prior to his/her decision at $t$, but does not know future realizations of the shocks. We make the con-
5 Our consumer learning model treats observable utility shocks as exogenous. However, the same insight applies to endogenous observable utility shocks as long as (1) each “atomic” consumer’s learning does not affect these shocks (e.g., a brand’s advertising expenditure), and (2) these shocks do not directly convey quality information.
6 Observable shocks can be independently distributed over time for a number of reasons. For example, firms may intentionally randomize price promotions in response to competition. Such “mixed strategies” can generate observed prices that appear to be freshly drawn in each period from a known distribution (Narasimhan 1988).
servative assumption that utility shocks are independent of $q_{jt}$ and thus do not help consumers learn quality directly. However, utility shocks do shape learning indirectly by varying consumers’ utility from exploitation, which in turn affects their incentive for exploration.
In summary, we write the consumer’s utility from choosing $j$ at $t$ as follows:
(1) $u_{jt} = q_{jt} + w'x_{jt} + \epsilon_{jt}$.
The consumer uses Bayes Theorem to update his or her beliefs about the parameters, $\theta_j$, after each consumption experience (assumed to occur after choice but before the next choice). Let $I_{jt}$ be a set of parameters that summarize the consumer’s beliefs about $\theta_j$ at time $t$. At $t = 0$, beliefs about $\theta_j$ are summarized by a prior distribution, $B(\theta_j; I_{j0})$, where $I_{j0}$ is based on all relevant prior experience. After the $t^{th}$ consumption experience the consumer’s posterior beliefs are summarized by $B(\theta_j; I_{jt})$. For example, when both $F$ and prior beliefs are normal, Bayesian updating is naturally conjugate. We obtain $I_{j,t+1}$ using standard updating formulae.
The parameters of posterior beliefs, $I_{jt} \in \Omega$, and the realized utility shocks, $x_{jt} \in X$ and $\epsilon_{jt} \in E$, summarize the state of information about brand $j$. The collection of brand-specific states, $S_t = \{(I_{1t}, x_{1t}, \epsilon_{1t}), (I_{2t}, x_{2t}, \epsilon_{2t}), \ldots, (I_{Nt}, x_{Nt}, \epsilon_{Nt})\}$, represents the set of states relevant to the decision problem at $t$.
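For the normal–normal conjugate case, the standard updating formulae take one line each. A minimal sketch – the prior, the quality draws, and the quality variance below are illustrative numbers, not estimates from the paper:

```python
def update_normal_beliefs(mu, var, q_obs, var_q):
    """One step of conjugate normal updating of beliefs about mean quality.
    (mu, var): prior mean and variance of beliefs about true mean quality;
    var_q: known variance of quality draws around the true mean;
    q_obs: realized quality on this consumption occasion."""
    post_var = 1.0 / (1.0 / var + 1.0 / var_q)       # precisions add
    post_mu = post_var * (mu / var + q_obs / var_q)  # precision-weighted mean
    return post_mu, post_var

mu, var = 0.0, 4.0             # diffuse prior beliefs
for q in [1.2, 0.8, 1.1]:      # three consumption experiences
    mu, var = update_normal_beliefs(mu, var, q, var_q=1.0)
print(mu, var)                 # beliefs tighten toward the observed qualities
```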
We seek to model a decision strategy, $\Pi: \Omega^N \times X^N \times E^N \rightarrow \{1, \ldots, N\}$, that maps the state space to the choice set. Without further assumptions, the consumer must choose a decision strategy to maximize expected discounted utility:
(2) $V(S_0) = \max_{\Pi} E\left[\sum_{t=0}^{\infty} \delta^t \left(q_{\Pi(S_t)t} + w'x_{\Pi(S_t)t} + \epsilon_{\Pi(S_t)t}\right)\right]$,
where $\delta$ is the discount factor and the expectation is taken over the stochastic process generat-
ed by the decision strategy (in particular, the transition between states that may depend on the consumer’s brand choice), its implication for Bayesian updating, and the $F$, $G$, and $B$ distribution functions. The infinite horizon can be justified either by repeat consumption over a long
horizon or by the consumer’s subjective belief that he/she will terminate the decision problem
randomly.
The optimal solution to the consumer’s decision problem can be characterized as the so-
lution to Bellman’s equation.
(3) $V(S_t) = \max_{j \in \{1, \ldots, N\}} \left\{ E[q_{jt} \mid I_{jt}] + w'x_{jt} + \epsilon_{jt} + \delta E[V(S_{t+1}) \mid S_t, j] \right\}$.
While Bellman’s equation is conceptually simple, the full solution is computationally intractable because, even after integrating out the utility shocks $x_{jt}$ and $\epsilon_{jt}$, it evolves on a state space of size $|\Omega|^N$, where $|\Omega|$ is the number of elements in $\Omega$. Not only is this exponential in the number of brands, but problem dimensionality gets extremely large if the state space is large. Even when the optimal solution is approximated by choosing discrete points to represent $\Omega$, $|\Omega|$ is large.
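A back-of-the-envelope calculation makes the contrast concrete. The 100-point discretization of $\Omega$ is illustrative, and the per-brand decomposition is the one delivered by the index strategies introduced earlier:

```python
def state_space_sizes(n_states, n_brands):
    """Joint dynamic program vs. a per-brand decomposition, for a belief
    space discretized to n_states points per brand."""
    full = n_states ** n_brands      # joint state space: exponential in N
    per_brand = n_brands * n_states  # N one-brand sub-problems: linear in N
    return full, per_brand

for n in (2, 5, 10):
    print(n, state_space_sizes(100, n))
# 2 brands:  10,000 joint states vs. 200
# 5 brands:  10^10 joint states vs. 500
# 10 brands: 10^20 joint states vs. 1,000
```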
4. An Index Strategy in the Absence of Utility Shocks
The learning problem we examine includes utility shocks, but it is easier to illustrate the
intuition of index strategies using a problem without utility shocks. Temporarily assume $x_{jt} = 0$ and $\epsilon_{jt} = 0$ for all $j$ and $t$, although the same result holds when there is no inter-temporal variation in the utility shocks. In this special case the consumer’s decision problem is a classic multi-armed bandit.
Gittins’ insight is as follows. To evaluate a brand $j$, the consumer thinks as if he/she is choosing between this brand and a fixed reward $r$, which, once chosen, is chosen for all future periods. In each period $t$, the consumer solves an independent sub-problem for each brand – he/she either consumes this brand to gain more information about it, or exploits the fixed reward $r$. In the latter case, the consumer’s beliefs about the brand cease to evolve, such that $I_{j,t+1} = I_{jt}$. The optimal solution to this sub-problem is determined by a greatly simplified version of Bellman’s equation:
(4) $V_j(I_{jt}, r) = \max\left\{ r + \delta V_j(I_{jt}, r),\; E[q_{jt} \mid I_{jt}] + \delta E[V_j(I_{j,t+1}, r) \mid I_{jt}] \right\}$.
Notice that each sub-problem only depends on the state evolution of a single brand, which is
much simpler than the full problem specified in Equation 3.
Gittins’ index, $G_{jt} = G(I_{jt})$, is defined as the smallest value of $r$ such that the consumer at time $t$ is just indifferent between experiencing brand $j$ and receiving the fixed reward. We obtain $G_{jt}$ by equating the two terms in brackets in Equation 4. Gittins proposed that $G_{jt}$ could be used as a measuring device for the value of exploring brand $j$ – if there is more uncertainty about a brand left to explore, the consumer will demand a higher fixed reward to be willing to stop exploration. Gittins’ index is updated when new information is realized.
Gittins’ surprising result is the Index Theorem. The optimal solution in each period is to
choose the brand with the highest index in that period. The consumer suffers no loss in expected
discounted utility by using an index strategy. A computationally difficult problem has thus been
decomposed into simpler sub-problems.
Index Theorem (Gittins and Jones 1974). The optimal decision strategy when there are no utility shocks is $\Pi(S_t) \in \arg\max_j \{G_{jt}\}$.
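To make the construction concrete, the sketch below computes a Gittins-style index for a single arm by bisection on the fixed reward $r$. It substitutes a Bernoulli-quality arm with Beta beliefs for the normal model in the text and truncates the horizon, so the numbers are approximations; the discount factor, truncation depth, and Beta parameterization are all illustrative assumptions:

```python
from functools import lru_cache

DELTA = 0.9   # discount factor (illustrative)
DEPTH = 60    # horizon truncation; deeper is more accurate

def one_armed_value(a, b, r):
    """Value of the sub-problem: keep playing a Bernoulli arm with
    Beta(a, b) beliefs, or retire to the fixed reward r every period."""
    @lru_cache(maxsize=None)
    def V(a, b, d):
        retire = r / (1.0 - DELTA)
        if d == 0:
            return retire
        p = a / (a + b)  # posterior mean of the success probability
        explore = p * (1.0 + DELTA * V(a + 1, b, d - 1)) \
            + (1.0 - p) * DELTA * V(a, b + 1, d - 1)
        return max(retire, explore)
    return V(a, b, DEPTH)

def gittins_index(a, b, tol=1e-4):
    """Smallest fixed reward at which retiring is (weakly) optimal,
    found by bisection: the Gittins index of the arm."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        r = (lo + hi) / 2.0
        if one_armed_value(a, b, r) > r / (1.0 - DELTA) + 1e-12:
            lo = r   # exploring still beats retiring: index exceeds r
        else:
            hi = r
    return (lo + hi) / 2.0

# Same posterior mean (0.5), but the more uncertain arm earns a larger
# exploration bonus, so its index is higher.
print(gittins_index(1, 1), gittins_index(10, 10))
```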
Figure 2 illustrates the Index Theorem. In Figure 2a we computed Gittins’ indices for two brands (for normal $F$ and normal priors). The true mean quality for Brand 1 is normalized to ze-
ro and the true mean quality of Brand 2 is negative. The consumers’ mean prior beliefs reflect
the true mean qualities, but the consumer is uncertain about these beliefs.
Prior to period 7 the index strategy causes the consumer to bounce between choosing
Brand 1 and Brand 2. In periods 7 to 12, he/she continues to sample Brand 2 until by period 13
he/she has learned enough about the brands. After period 13 the consumer switches to Brand 1
and remains with Brand 1 indefinitely. A brand’s index remains flat when it is not being con-
sumed because the simplified problem does not include utility shocks.
[Insert Figure 2 about here.]
Index strategies are much simpler than a brute force solution to Bellman’s equation, but
can the consumer intuit (perhaps approximately) how an index varies with experience and be-
liefs? We expect future laboratory experiments to address this issue explicitly. In this paper, we
argue that index strategies have intuitive properties and that it is not unreasonable for the con-
sumer to intuit those properties.
Figure 2b illustrates the intuitive properties of Gittins’ index. We plot the expected value
of Gittins’ index as it would evolve if the brand were chosen repeatedly (assuming the mean of
the prior distribution was accurate). The two lines represent two brands that differ in mean quali-
ty but not uncertainty. In each given period, Gittins’ index is larger for the higher-quality brand.
Naturally, with the same amount of remaining uncertainty, a higher-quality brand offers a greater
value of exploitation. Both index curves decline smoothly and converge with experience toward
true brand quality. The index converges because the value of exploration decreases as the con-
sumer learns more about the distribution of brand quality. The amount by which the index ex-
ceeds the mean quality beliefs is the value of learning (the value of reducing uncertainty).
When we plot Gittins’ index as a function of the consumer’s posterior quality uncertainty, $\sigma_{jt}$ (not shown), it is also intuitive – the index increases with $\sigma_{jt}$ because the value of exploration
increases with the remaining amount of quality uncertainty. We therefore posit that it is not un-
reasonable for the consumer to intuit the (approximate) shape of Gittins’ index as a function of
the parameters of the problem.
5. The Index Strategy When Utility Shocks are Present
We now allow utility shocks. Observable shocks ($x_{jt}$) include effects that researchers might observe and model, such as changes in advertising, promotion, or price. Unobservable shocks ($\epsilon_{jt}$) include effects that researchers do not observe, such as changes in the characteristics of the consumption occasion or changes in idiosyncratic tastes. Because shocks enter the utili-
ty function the consumer may, at any period, switch among brands. Without utility shocks the
consumer’s choice converges to a single brand as in Figure 2a.
When the model includes utility shocks, the Gittins-Jones Index Theorem no longer ap-
plies because non-chosen brands do not remain constant. With shocks, the consumer’s problem
belongs to the class of “restless-bandit problems” as introduced by Whittle (1988). In general
such optimization problems are PSPACE-hard (Papadimitriou and Tsitsiklis 1999) making the
problem extremely difficult (if not infeasible) to solve and making it implausible that the con-
sumer would be able to solve the problem with a solution strategy based on Equation 3.
Whittle (1988) proposed a solution that generalizes Gittins’ index. In each period, to evaluate a brand $j$, the consumer thinks as if he or she must choose between brand $j$ and a fixed reward, $r$. Bellman’s equation for the sub-problem (which now includes the utility shocks) is:
(5) $V_j(I_{jt}, x_{jt}, \epsilon_{jt}, r) = \max\left\{ r + \delta E[V_j(I_{jt}, x_{j,t+1}, \epsilon_{j,t+1}, r) \mid I_{jt}],\; E[q_{jt} \mid I_{jt}] + w'x_{jt} + \epsilon_{jt} + \delta E[V_j(I_{j,t+1}, x_{j,t+1}, \epsilon_{j,t+1}, r) \mid I_{jt}] \right\}$.
The index is defined as the smallest value of $r$ such that the consumer at time $t$ is just indifferent between experiencing brand $j$ and receiving the fixed reward. For such an index to be well-defined and meaningful, the “indexability” condition needs to be satisfied (Whittle 1988). Let $P_j(r) \subseteq \Omega \times X \times E$ be the set of states for which choosing the fixed reward $r$ at time $t$ is optimal:
(6) $P_j(r) = \left\{ (I_{jt}, x_{jt}, \epsilon_{jt}) \in \Omega \times X \times E :\; r + \delta E[V_j(I_{jt}, x_{j,t+1}, \epsilon_{j,t+1}, r) \mid I_{jt}] \geq E[q_{jt} \mid I_{jt}] + w'x_{jt} + \epsilon_{jt} + \delta E[V_j(I_{j,t+1}, x_{j,t+1}, \epsilon_{j,t+1}, r) \mid I_{jt}] \right\}$.
Indexability is defined as:
Definition: A brand $j$ is indexable if, for any $r$, $P_j(r') \supseteq P_j(r)$ for any $r' \geq r$.
Indexability says that as the fixed reward increases, the collection of states for which the
fixed reward is optimal does not decrease. Intuitively, indexability requires that, if under some
state it is optimal to choose the fixed reward, then it must also be optimal to choose a higher
fixed reward. Indexability implies a consistent ordering of brands for any state, so an index strat-
egy is meaningful. Because this condition does not hold for all restless-bandit problems, we must
establish indexability when the model includes utility shocks. In a companion online appendix
we prove the following proposition.
Proposition 1. The canonical forward-looking learning problem is indexable.
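The nesting in the definition can also be checked numerically in a toy model. The sketch below uses a Bernoulli-quality arm with Beta beliefs, no utility shocks, and a truncated horizon – illustrative assumptions standing in for the canonical model, for which the proposition (not this check) supplies the proof:

```python
from functools import lru_cache

DELTA = 0.9   # discount factor (illustrative)
DEPTH = 40    # horizon truncation

def retire_is_optimal(a, b, r):
    """True if, with Beta(a, b) beliefs about a Bernoulli arm, taking
    the fixed reward r forever is (weakly) optimal in the sub-problem."""
    @lru_cache(maxsize=None)
    def V(a, b, d):
        retire = r / (1.0 - DELTA)
        if d == 0:
            return retire
        p = a / (a + b)
        explore = p * (1.0 + DELTA * V(a + 1, b, d - 1)) \
            + (1.0 - p) * DELTA * V(a, b + 1, d - 1)
        return max(retire, explore)
    return V(a, b, DEPTH) <= r / (1.0 - DELTA) + 1e-9

# Indexability: as r rises, the set of belief states where retiring
# is optimal should only grow (nested passive sets).
states = [(a, b) for a in range(1, 6) for b in range(1, 6)]
for r_lo, r_hi in [(0.3, 0.5), (0.5, 0.7), (0.7, 0.9)]:
    passive_lo = {s for s in states if retire_is_optimal(*s, r_lo)}
    passive_hi = {s for s in states if retire_is_optimal(*s, r_hi)}
    assert passive_lo <= passive_hi
print("passive sets are nested as r increases")
```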
Once the indexability condition is established, a well-defined index strategy is to choose in each period the brand with the largest index. The index strategy breaks “the curse of dimensionality” by decomposing a problem whose complexity grows exponentially in the number of brands into much-simpler single-brand sub-problems, each on a state space of $|\Omega_j|$ after integrating out the utility shocks $x_{jt}$ and $\epsilon_{jt}$. By breaking this curse, it is more plausible that the consumer might use the decision strategy. As a
bonus, estimation is far simpler when a dynamic learning problem is nested. It remains to be
shown that the index strategy with utility shocks is invariant to scale, intuitive, and implies a rea-
sonable utility vs. thinking cost tradeoff.
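To make the index calibration concrete, the following sketch computes an index for the simplest textbook case – a Bernoulli “brand” with a Beta quality belief and no utility shocks – by bisecting on the fixed reward $\lambda$ until the consumer is indifferent between retiring on $\lambda$ and experimenting. This is our illustration, not the paper’s algorithm: the discount factor, horizon truncation, and Beta-Bernoulli setup are stand-in assumptions for the normal-quality model with shocks.

```python
# Illustrative Whittle/Gittins-style calibration for a Bernoulli "brand"
# with Beta(a, b) quality beliefs (no utility shocks); a stand-in for the
# paper's normal-quality model.

DELTA = 0.90    # discount factor
HORIZON = 100   # finite-horizon truncation of the infinite sub-problem

def whittle_index(a0, b0, tol=1e-4):
    """Smallest fixed reward lam at which retiring on lam forever is as
    good as experiencing the brand once more and re-optimizing."""

    def value(a, b, t, lam, memo):
        # Value-to-go of the two-action sub-problem at depth t.
        if t == HORIZON:
            return 0.0
        if (a, b) in memo:            # a + b pins down t, so (a, b) suffices
            return memo[(a, b)]
        # Option 1: take the fixed reward lam for the remaining horizon.
        retire = lam * (1.0 - DELTA ** (HORIZON - t)) / (1.0 - DELTA)
        # Option 2: experience the brand once, update beliefs, continue.
        p = a / (a + b)
        explore = (p * (1.0 + DELTA * value(a + 1, b, t + 1, lam, memo))
                   + (1.0 - p) * DELTA * value(a, b + 1, t + 1, lam, memo))
        memo[(a, b)] = max(retire, explore)
        return memo[(a, b)]

    lo, hi = 0.0, 1.0                 # Bernoulli rewards lie in [0, 1]
    while hi - lo > tol:
        lam, memo = 0.5 * (lo + hi), {}
        retire = lam * (1.0 - DELTA ** HORIZON) / (1.0 - DELTA)
        p = a0 / (a0 + b0)
        explore = (p * (1.0 + DELTA * value(a0 + 1, b0, 1, lam, memo))
                   + (1.0 - p) * DELTA * value(a0, b0 + 1, 1, lam, memo))
        if retire >= explore:
            hi = lam                  # indifference point is at or below lam
        else:
            lo = lam
    return 0.5 * (lo + hi)
```

The index exceeds the myopic mean $a/(a+b)$ when beliefs are uncertain and shrinks toward it as experience accumulates, mirroring the declining index curves discussed around Figure 3.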
5.1. The Index Strategy is Invariant to Scale and Behaves Intuitively
An index strategy would be difficult for the consumer to use if the strategy were not in-
variant to scale. If it is invariant the consumer can intuit (or learn) the basic shape of the index
function and use that intuited shape in many situations. Invariance facilitates ecological rationali-
ty. The following results hold for fairly general distribution of quality, ; , and joint dis-
tribution of utility shocks and , as long as they have scale and location parameters and the
quality belief ; is conjugate. To ease interpretation, we assume that and are nor-
mal distributions with parameters as defined earlier: and for quality; and for posteri-
or beliefs about quality; and , and , for utility shocks. In a companion online appendix we
prove the following proposition.
Proposition 2. Let $\Lambda_{jt}(0, \sigma_{jt}, 0, 1, \sigma_{x\epsilon,jt})$ be Whittle’s index for the canonical learning problem computed when the posterior mean quality ($\mu_{jt}$) is zero, the mean utility shock ($\mu_{x\epsilon,jt}$) is zero, and the inherent variation of quality ($\sigma_j$) is 1. For forward-looking consumers, Whittle’s index is scalable in these parameters. That is:

$\Lambda_{jt}(\mu_{jt}, \sigma_{jt}, \mu_{x\epsilon,jt}, \sigma_j, \sigma_{x\epsilon,jt}) = \mu_{jt} + \mu_{x\epsilon,jt} + \sigma_j\, \Lambda_{jt}\!\left(0,\, \frac{\sigma_{jt}}{\sigma_j},\, 0,\, 1,\, \frac{\sigma_{x\epsilon,jt}}{\sigma_j}\right).$
Proposition 2 implies that the consumer can simplify his or her mental evaluations by decomposing the index for each brand into (1) the mean utility gained from “myopic learning”, $\mu_{jt} + \mu_{x\epsilon,jt}$, which reflects the exploitation of posterior beliefs, and (2) the incremental benefit of looking forward, $\sigma_j\, \Lambda_{jt}(0, \sigma_{jt}/\sigma_j, 0, 1, \sigma_{x\epsilon,jt}/\sigma_j)$, which captures quality information gained through exploration. The consumer needs only intuit the shape of $\Lambda_{jt}$ for a limited range of parameter values and scale it by $\sigma_j$.
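This decomposition can be written as a two-line function. In the sketch below, `lambda_star` is a purely hypothetical standardized shape (Proposition 2 only asserts that some such shape exists and can be intuited); the scaling logic around it is what the proposition guarantees.

```python
def lambda_star(s, r):
    # Hypothetical standardized index shape Lambda(0, s, 0, 1, r):
    # increasing in relative posterior uncertainty s, decreasing in the
    # relative shock magnitude r.  Functional form made up for illustration.
    return 0.5 * s / (1.0 + r)

def brand_index(mu, x, sigma_t, sigma, sigma_shock):
    """Proposition 2's decomposition: myopic utility (mu + x) plus the
    forward-looking bonus -- the intuited shape scaled by sigma."""
    return mu + x + sigma * lambda_star(sigma_t / sigma, sigma_shock / sigma)
```

Because only ratios enter the bonus, rescaling all scale parameters by a constant $c$ multiplies the bonus by $c$ and changes nothing else – the invariance that lets a consumer intuit one shape and reuse it across brands and situations.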
To provide further intuition, we prove the following proposition in an online appendix.
The proposition shows that Whittle’s index behaves as expected when the parameters of the
problem vary. The consumer likes increases in quality and utility shocks, dislikes inherent uncer-
tainty in quality and utility shocks, but values uncertainty in beliefs about mean quality because
such uncertainty increases the value of learning about that alternative.
Proposition 3. Whittle’s index $\Lambda_{jt}$ for the canonical learning problem is (1) increasing in the posterior mean of quality ($\mu_{jt}$), the observable utility shocks ($x_{jt}$), and the unobservable (to the researcher) utility shock ($\epsilon_{jt}$), (2) weakly decreasing in the inherent uncertainty in quality ($\sigma_j$) and the uncertainty in the utility shocks ($\sigma_{x\epsilon,jt}$), and (3) weakly increasing in the consumer’s posterior uncertainty about quality ($\sigma_{jt}$).
Figure 3 illustrates Whittle’s index where we set the posterior mean of quality to zero.
(We observe similar shapes of Whittle’s index for other parameter values.) Like Gittins’ index,
Whittle’s index is a smooth decreasing function of experience (experience reduces posterior
quality uncertainty). With sufficient experience, Whittle’s index converges toward zero implying
that asymptotically the value of a brand is based on the posterior mean of quality (Proposition 2).
Unlike Gittins’ index, Whittle’s index is a function of the magnitude of the utility shocks ($\sigma_{x\epsilon,jt}$). As the magnitude of utility shocks gets larger, it is less important for the consumer to learn about quality and the value of the index decreases, as shown in Figure 3. These properties (and the shape
of the curve itself) are intuitive.
[Insert Figure 3 about here.]
Figure 3 and Proposition 3 suggest that, other things being equal, when the magnitude of
utility shocks is larger, then the realized utility shocks are more likely to be the deciding factor in
consumers’ brand choices. When $\sigma_{x\epsilon,jt} = 5$ in Figure 3, the Whittle-index curve is almost flat, implying an almost myopic strategy. To formalize this insight, we state the following Corollary to
Proposition 3:
Corollary. (1) When the consumer’s posterior quality uncertainty dominates the uncer-
tainty in utility shocks, the value to the consumer from looking forward is high. (2) When
the uncertainty in utility shocks dominates the consumer’s posterior quality uncertainty,
the value from looking forward is low. In this case, a myopic learning strategy (i.e., exploiting posterior beliefs) suffices, and could be the optimal strategy if it requires lower
thinking costs than a forward-looking learning strategy.
6. Examination of the Near Optimality of an Index Strategy (Synthetic Data)
We now examine whether an index strategy implies a reasonable tradeoff between opti-
mality and thinking costs. Thinking costs remain unobservable, but §4 and §5 suggest that think-
ing costs could be substantially smaller with an index strategy compared to the direct solution of
the PSPACE-hard version of Bellman’s equation. It remains to show that the loss in utility is
small. To examine this issue we switch from analytic derivations to synthetic data because the
loss in utility is an issue of magnitude rather than direction. Synthetic data establish existence
(rather than universality) of situations where index strategies are close to optimal.
We set the observable shocks ($x_{jt}$) to zero and examine near optimality for the special case when the quality distributions and quality beliefs ($F$ and $G$) are normal. We compare four decision strategies that the consumer might use.
1. No Learning. In this naïve strategy the consumer chooses the brand based only on his or
her prior beliefs of quality and the current utility shocks. This strategy provides a baseline
to evaluate the incremental value of learning.
2. Myopic Learning. In this strategy the consumer chooses the brand based only on his or
her posterior quality beliefs and the current utility shocks. This strategy exploits the con-
sumer’s posterior knowledge about brand quality. The Corollary predicts that this strate-
gy will suffice when the magnitude of the utility shock is relatively high.
3. Index Strategy. This strategy assumes the consumer can intuit the shape of Whittle’s in-
dex. As per Proposition 2, this strategy improves on the myopic-learning strategy by taking into account the option value of learning. Brand choices now reflect the consumer’s
tradeoff between exploitation and exploration. The Corollary predicts the index strategy
will outperform myopic learning especially when the magnitude of the utility shocks is
relatively low.
4. Approximate Optimality. The PSPACE-hard forward-looking learning problem cannot
be solved optimally, hence researchers use approximate solutions. Although approxima-
tion methods vary in the literature, discrete optimization is representative and should
converge to the optimal solution with finer grids (Rust 1996). We discretize the state
space, $\Omega_j$, into a set of grid points for each of the $J$ brands.
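A stripped-down version of this horse race can be simulated in a few lines. The sketch below compares only the first two strategies on a two-brand problem with normal quality draws and small zero-mean Gumbel shocks; all parameter values are illustrative stand-ins, not the paper’s design.

```python
import math
import random

random.seed(7)

J, T, R = 2, 50, 500            # brands, periods, replications (illustrative)
TRUE_MEAN = [0.0, 0.5]          # true mean qualities, unknown to the consumer
SIGMA_Q = 1.0                   # inherent quality variation (normalized)
PRIOR_MU, PRIOR_SD = 0.0, 1.0   # common prior beliefs for both brands
SHOCK_SCALE = 0.1               # small shocks: learning should pay off
EULER = 0.5772156649015329

def gumbel_shock():
    # Gumbel(0, scale) draw shifted so its unconditional mean is zero.
    return SHOCK_SCALE * (-math.log(-math.log(random.random())) - EULER)

def mean_utility(strategy):
    """Average per-period utility over R runs of T periods."""
    total = 0.0
    for _ in range(R):
        mu = [PRIOR_MU] * J                 # belief means
        prec = [1.0 / PRIOR_SD ** 2] * J    # belief precisions
        for _ in range(T):
            eps = [gumbel_shock() for _ in range(J)]
            j = max(range(J), key=lambda k: mu[k] + eps[k])
            q = random.gauss(TRUE_MEAN[j], SIGMA_Q)
            total += q + eps[j]
            if strategy == "myopic":
                # conjugate normal update of the chosen brand's belief
                prec[j] += 1.0 / SIGMA_Q ** 2
                mu[j] += (q - mu[j]) / (SIGMA_Q ** 2 * prec[j])
            # "no-learning": beliefs stay at the prior forever
    return total / (R * T)
```

With these parameters the myopic learner homes in on the better brand while the no-learning consumer keeps choosing on shocks alone, consistent with the ordering of the first two rows in the upper panel of Table 1.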
We choose parameters that illustrate the phenomena and we expect they are empirically
reasonable. The approximate optimal solution requires a finite time horizon; we select 50 periods. If the discount factor is $\delta = 0.90$, truncation to a finite horizon should be negligible and, in any case, is biased against index strategies. We choose 200 grid points per brand, which should be very close to optimal in the continuous problem. To simplify integration we draw the utility shocks from a Gumbel distribution with parameters $(\mu_\epsilon, \sigma_\epsilon)$ and normalize the location parameter such that the utility shocks have zero unconditional means. Setting $J = 2$ brands is sufficient to examine optimality and makes the approximately optimal solution feasible – it evolves on a state space of $200^2$ grid points. In this sense, the two-brand case provides a conservative test of the relative cognitive simplicity of index strategies. The ongoing uncertainties in quality for both brands, $\sigma_j$, are equal and normalized to 1.
We vary the parameter values to capture three possibilities: (1) the mean and uncertainty
both favor one brand, (2) the means are the same but uncertainty favors one brand, and (3) the
mean and uncertainty favor different brands. Because quality beliefs are relative and because we
can interpret Brand 1 and Brand 2 interchangeably, we need vary only the prior means in quality
beliefs for one brand. Therefore we fix the prior mean quality belief of Brand 1 to zero and vary
the prior mean quality beliefs for Brand 2 relative to Brand 1. We normalize the standard devia-
tion of Brand 2’s prior quality belief to one and let the standard deviation of Brand 1’s prior
quality belief be 1/2. While other relative variations are possible, these variations suffice to il-
lustrate the phenomena. Finally, to test the predictions in §5 we allow the uncertainty in shocks
to vary from relatively small to relatively large.7
We compute the consumer’s expected total utilities for 50 periods under different deci-
7 Specifically, we normalize $\mu_{10} = 0$ and $\sigma_{10} = 1/2$. We vary the mean of the brand with greater prior variance with $\mu_{20} \in \{-0.3, 0.0, 0.3\}$. We vary the relative uncertainty in utility shocks with $\sigma_{x\epsilon} \in \{0.1, 1.0\}$.
sion strategies. Details are provided in a companion online appendix. The results are summarized
in Table 1.
[Insert Table 1 about here.]
We first examine cumulative computation times which are surrogates for thinking costs.
As expected, the no-learning and myopic-learning strategies impose negligible thinking costs,
the index strategy has moderate thinking costs, and the approximate optimal solution requires
substantial thinking costs – 600 times the thinking costs under an index strategy even for this
simple problem (the number will be even greater with finer grid points).
We next examine the consumer’s expected utilities when the magnitude of utility shocks is relatively low (upper panel of Table 1). In all cases examined, the no-learning strategy
leads to the lowest utility, which suggests that learning is valuable. As predicted by the Corol-
lary, the index strategy and the approximately optimal strategy generate significantly higher utili-
ty than myopic learning for this problem. Furthermore, the index strategy is indistinguishable
from the approximately optimal strategy. As long as thinking costs matter even a little, an index
strategy will be better on utility minus thinking costs.
We next examine the case of relatively high magnitude of utility shocks (lower panel of
Table 1). As predicted by the Corollary, the myopic-learning model performs virtually the same
as either the index strategy or the approximately optimal strategy. The differences are not signif-
icant. In this case, the consumer might achieve the best utility minus thinking costs with a myop-
ic strategy (among the models tested).
Analysis of synthetic data never covers all cases. Table 1 is best interpreted as providing
evidence that (1) there exist reasonable situations where an index solution is better than the ap-
proximately optimal solution on utility minus thinking costs and (2) there exist domains where
myopic learning is best on utility minus thinking costs. We now examine empirical data.
7. Field Estimation of an Index Strategy (IRI Data on Diaper Purchases)
We seek to identify at least one situation where consumers are likely to be forward-
looking and, in that situation, examine whether an index solution fits (predicts) as well or better
than an approximately optimal solution. Even if an index solution does no better than an approx-
imately optimal solution, we consider the result promising because an index solution is cogni-
tively simpler. As a test of face validity, we also expect learning strategies to outperform no-
learning strategies and forward-looking strategies to outperform myopic-learning strategies when
the situation favors forward-looking behavior.
7.1. IRI Data on Diaper Purchases
We select the diaper category from the IRI Marketing Dataset that is maintained by the
SymphonyIRI Group and available to academic researchers (Bronnenberg et al. 2008). Diaper
consumers are likely to be learning and forward-looking. Parents typically begin purchasing dia-
pers based on a discrete birth event. Even if the birth is a second or subsequent child, “quality”
may have changed. Informal qualitative interviews suggest that parents learn about whether dia-
per brands match their needs through experience (often more than one purchase), that diapers are
sufficiently important that parents take learning seriously, and that parents often try multiple
brands before settling on a favorite brand. There are observable shocks due to price promotions
and shocks due to unobservable events. For example, a baby might go through a stage where a
different brand is best suited to the parent/child’s needs. Diapers also have the advantages of be-
ing regular purchases (the no-choice option is less of a concern) and consumers tend to be in the
market for many purchase occasions.
To isolate a situation favoring forward-looking behavior we focus on consumers who
purchase branded products and whose purchases are likely triggered by a birth event. To do so,
we eliminate private-label consumers and we select consumers whose first purchase occurs 30
weeks after the start of data collection. After applying these screening criteria the data contain
1,385 households who make 8,089 purchases.
The market is dominated by three brands, Pampers, Huggies, and Luvs. We aggregate all
other purchases as “Other Brands.” As a first-order view, Table 2 compares switching behavior
during the first eight purchases to switching behavior after the first eight purchases. Brand loyal-
ty is higher after eight periods than within the first eight periods suggesting that consumers may
learn about “quality” over time. Notice also the small (and decreasing) switching to “Other
Brands” among major-brand consumers. Finally, although the category was chosen as a likely
test-bed for consumer learning, high brand loyalty, even during the initial eight weeks, suggests
that there is no guarantee a forward-looking strategy will fit the data.
[Insert Table 2 about here.]
For this initial test of an index solution, we limit the observable shocks $x_{hjt}$ to the observed price for brands purchased and the average price across other panelists in the same period for brands not purchased. We randomly select 150 households for estimation and 200 households for validation.8
7.2. Empirical Specification
We index each of $N$ households by $h$ and denote by $T_h$ household $h$’s time horizon. We assume that the quality and quality-belief distributions, $F$ and $G$, are normal and that the unobservable shock distributions are Gumbel. The decision strategies are given below ($x_{hjt}$ and $\epsilon_{hjt}$ are now scalars).
8 The actual sample sizes for estimation and validation are 138 and 189 because we drop a few households whose order of brand purchases is unobserved.
No Learning: $\Pi_{hjt} = \mu_{hj0} + x_{hjt} + \epsilon_{hjt}$.
Myopic Learning: $\Pi_{hjt} = \mu_{hjt} + x_{hjt} + \epsilon_{hjt}$.
Index Strategy: $\Pi_{hjt} = \mu_{hjt} + x_{hjt} + \epsilon_{hjt} + \sigma_j\, \Lambda_{hjt}\!\left(0,\, \sigma_{hjt}/\sigma_j,\, 0,\, 1,\, \sigma_{x\epsilon,hjt}/\sigma_j\right)$.
Approximately Optimal: $\Pi_{hjt} = \mu_{hjt} + x_{hjt} + \epsilon_{hjt} + \delta\, E\!\left[V(\mu_{h,t+1}, \sigma_{h,t+1}, x_{h,t+1}, \epsilon_{h,t+1}) \mid I_{ht}, d_{hjt} = 1\right]$.
7.3. Issues of Identification
Although we would like to identify all parameters of the various models, we cannot do so
from choice data because utility is specified only to an affine transformation, and because the pa-
rameters that matter are relative parameters. For the no-learning model we can identify only the
relative means of prior beliefs. For the myopic-learning model we can identify only the relative
means of prior beliefs, the relative uncertainties of prior beliefs, and the means of quality. For the
no-learning and myopic-learning models time discounting does not matter.
For the index strategy and approximately optimal strategy we set the mean of prior beliefs of one brand ($\mu_{10}$) to zero and normalize the variance of quality ($\sigma_j^2$) to one to set the scale of quality (only ratios relative to $\sigma_j$ matter). We cannot simultaneously identify a brand-specific mean of quality and a brand-specific mean of the unobservable shock, so we set the latter to zero ($\mu_{\epsilon j} = 0$). We can identify $\bar q_j$ because we have fixed $\mu_{\epsilon j}$. The standard deviation of the observable shock $x_{hjt}$ is observed in the data and, hence, $\sigma_{\epsilon j}$ is computable from $\sigma_{x\epsilon,j}$ because the observable and unobservable shocks are independent. As in most dynamic discrete choice processes (Rust 1994), the discount factor is not identified; we set it to $\delta = 0.90$.9
Finally, as in Erdem and Keane (1996), we suppress “parameter heterogeneity” among
9 As expected, sensitivity analyses with other discount rates (e.g., 0.95 and 0.99) yield almost identical log-likelihood statistics and similar parameter estimates for the index-strategy model. Anticipating the results of § 7.5, we expect similar (lack of) sensitivity for the approximately optimal strategy. The ease with which such sensitivity checks can be run is a benefit of the computational tractability of the index-strategy model.
households. We continue to allow each household’s quality perceptions to evolve idiosyncrati-
cally, but we do not attempt to estimate heterogeneity in prior beliefs, in the long-term mean
quality, or in the standard deviations. We abstract away from parameter heterogeneity for the fol-
lowing reasons. First, there are, on average, only 8.39 purchases per household. We would overly
strain the model by attempting to estimate heterogeneity in all of the parameters.10 Second, we
wish to focus on behavioral heterogeneity that is generated endogenously by forward-looking
learning models. Even if households start with exogenously homogeneous prior beliefs, different
quality draws and utility shocks lead to different exploitation-versus-exploration tradeoffs and
different learning dynamics. We seek to evaluate heterogeneous learning dynamics against the
data, rather than using heterogeneous parameters to fit the data. For an initial test of an index
strategy, this simplification is conservative because it biases against a good model fit. Despite
this focus, it turns out the index strategy explains the data well and predicts well.
7.4. Maximum Simulated Likelihood Estimation
We estimate each model’s parameters with maximum simulated likelihood estimation. To
simplify notation let $\theta$ denote the vector of parameters to be estimated. Let $d_{ht} \in \{1, \dots, J\}$ denote household $h$’s decision at time $t$ and let $d_h^t = (d_{h1}, \dots, d_{ht})$ denote $h$’s decision sequence up to time $t$. The likelihood of observing the choice sequences as a function of $\theta$ is:

(7) $\mathcal{L}(\theta) = \prod_{h=1}^{N} \Pr\!\left(d_h^{T_h}; \theta\right).$
Learning strategies depend upon the evolution of the unobserved belief states, complicating the inference process. If we were to write the likelihood function as a function of each consumer’s unobserved belief states and shocks over time, we would need to sample from an extremely complicated joint density of belief states and shocks. Instead, we augment the data and sample directly from the more-fundamental unobservables – the quality experiences, $q_{hjt}$, that are drawn conditionally i.i.d. from normal distributions with means, $\bar q_j$, and standard deviations, $\sigma_j$. Given a set of quality experiences and a set of prior beliefs, we obtain the unobserved belief states, $(\mu_{hjt}, \sigma_{hjt})$, by conjugate updating formulae:

(8) $\mu_{hjt} = \dfrac{\sigma_{hj0}^{-2}\,\mu_{hj0} + n_{hjt}\,\sigma_j^{-2}\,\bar q_{hjt}}{\sigma_{hj0}^{-2} + n_{hjt}\,\sigma_j^{-2}}$ and $\sigma_{hjt}^{2} = \dfrac{1}{\sigma_{hj0}^{-2} + n_{hjt}\,\sigma_j^{-2}}$,

where $n_{hjt} = \sum_{\tau=1}^{t-1} 1(d_{h\tau} = j)$ is the cumulative number of purchases of brand $j$ by consumer $h$ prior to period $t$. (We use $1(\cdot)$ as an indicator function.) We use $\bar q_{hjt} = n_{hjt}^{-1}\sum_{\tau=1}^{t-1} 1(d_{h\tau} = j)\, q_{hj\tau}$ to denote the average quality experience observed by the consumer prior to period $t$.

10 Doing so is technically feasible, but would likely over-parameterize the model and exploit noise in the data. More importantly, our goal is to demonstrate that an index solution is a viable representation of cognitive simplicity and that cognitive simplicity (relative to the approximately optimal solution) is a phenomenon worth studying in structural models. We leave explicit modeling of parameter heterogeneity to future research using data with longer purchase histories.
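The conjugate updating formulae amount to a few lines of code. A minimal sketch (variable names are ours):

```python
def posterior(mu0, sd0, sigma_q, experiences):
    """Conjugate normal updating of quality beliefs (batch form of
    Equation 8): prior N(mu0, sd0^2), quality experiences drawn i.i.d.
    with known standard deviation sigma_q.  Returns posterior mean/std."""
    n = len(experiences)
    precision = 1.0 / sd0 ** 2 + n / sigma_q ** 2
    q_bar = sum(experiences) / n if n else 0.0
    mu = (mu0 / sd0 ** 2 + n * q_bar / sigma_q ** 2) / precision
    return mu, precision ** -0.5
```

Posterior uncertainty shrinks deterministically with the number of purchases, which is why sampling the quality experiences pins down the entire belief path in estimation.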
We introduce vector notation to simplify exposition. Let $\bar q$ be the vector (over $j$) of mean qualities, let $\sigma$ be the vector (over $j$) of the standard deviations of quality draws, and let $\sigma_\epsilon$ be the vector (over $j$) of the standard deviations of the unobservable shocks. Let the sequence of quality draws up to and including period $t$ be $q_h^t$. And let $p_{ht}$ and $\epsilon_{ht}$ be the vectors (over $j$) of prices and unobservable utility shocks. Let $f(\cdot)$ be a probability density function. Then the likelihood for household $h$ is given by:

(9) $\mathcal{L}_h(\theta) = \int \left[\prod_{t=1}^{T_h} \Pr\!\left(d_{ht} \mid q_h^{t-1}, p_{ht}, \epsilon_{ht}; \theta\right)\right] f\!\left(q_h^{T_h}; \bar q, \sigma\right) f\!\left(\epsilon_h^{T_h}; \sigma_\epsilon\right)\, dq_h^{T_h}\, d\epsilon_h^{T_h}.$
To compute the likelihood we integrate over quality draws and unobservable shocks. To
integrate numerically we sample sequences of quality draws (each sequence has $T_h$ draws for consumer $h$) from a multivariate normal distribution with parameters $\bar q$ and $\sigma$. We assume the
unobservable shocks follow a zero-mean Gumbel distribution with homogeneous variance for all
brands. This assumption allows us to use the well-known logit formula to substantially simplify
the computation of choice probabilities for all models, and the computation of value function
used in approximately optimal strategies (Rust 1994). To use the logit formula for the index
strategy, we linearize the index function as a linear function of the unobserved shocks $\epsilon_{hjt}$, while preserving monotonicity:

$\Pi_{hjt} = \mu_{hjt} + x_{hjt} + \epsilon_{hjt} + \sigma_j\, \Lambda_{hjt}\!\left(0,\, \sigma_{hjt}/\sigma_j,\, 0,\, 1,\, \sigma_{x\epsilon,hjt}/\sigma_j\right).$
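Given the linearized indices, the choice probabilities take the standard logit form. A small sketch (the `scale` argument is the common Gumbel scale of the unobservable shocks; names are ours):

```python
import math

def logit_probs(systematic, scale):
    """Choice probabilities when each brand's unobservable shock is
    i.i.d. zero-mean Gumbel with a common scale: the 'well-known logit
    formula'.  `systematic` holds the deterministic part of each index."""
    z = [v / scale for v in systematic]
    m = max(z)                          # subtract the max for stability
    expz = [math.exp(v - m) for v in z]
    total = sum(expz)
    return [v / total for v in expz]
```

A smaller `scale` makes choices more deterministic in the systematic indices; a larger `scale` pushes probabilities toward uniform, exactly the sense in which large shocks drown out the value of learning.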
7.5. Estimation Results
Table 3 summarizes the fit statistics for the 651 diaper purchases in the in-sample estima-
tion and the 676 purchases in the out-of-sample validation. $U^2$ is an information-theoretic measure that calculates the percent of uncertainty explained by the model (Hauser 1978); AIC and
BIC attempt to correct the likelihood function based on the number of parameters in the in-
sample estimation, BIC more so than AIC. (There are no free parameters in the out-of-sample
validation.)
[Insert Table 3 about here.]
First, on all measures there are sizable gains to learning – all learning models explain and
predict brand choices substantially better than the no-learning strategy. Second, the index strate-
gy improves in-sample fit and out-of-sample predictions relative to myopic learning. The likeli-
hood is significantly better ($\chi^2 = 22$, $p = 0.002$ in-sample; $\chi^2 = 13.52$, $p = 0.009$ out-of-sample), although the percent improvement in $U^2$ is not large. This result is consistent with our
expectation that diaper buyers are forward-looking. Third, the index strategy outperforms the ap-
proximately optimal solution for in-sample fit and out-of-sample predictions, although the differ-
ences are small. (The models are not nested so a $\chi^2$ test does not apply.) This empirical result is
consistent with the synthetic-data analysis. When two forward-looking strategies yield similar
expected utilities, we expect them to be statistically indistinguishable.
Table 4 summarizes the estimated parameter values. Across all learning models, all four
brands increase substantially in mean quality relative to prior beliefs, which implies that diaper
buyers learn by experience. Results are consistent with the switching patterns in Table 2.
[Insert Table 4 about here.]
Forward-looking models identify the uncertainty in utility shocks relative to ongoing
quality uncertainty (last panel of Table 4). Because the relative shock uncertainty varies across
brands, the index curve implies different behavior than myopic-learning for those brands. This
explains why forward-looking models fit and predict better. For example, Huggies has lower rel-
ative shock uncertainty than other brands providing greater incentives for consumers to explore
Huggies (as per the Corollary). Because the myopic-learning model ignores this difference, it
compensates by underestimating the relative mean prior beliefs. Managerially, Huggies has a
higher mean quality than Pampers or Luvs, but also higher ongoing relative uncertainty in quali-
ty across consumption. (The table reports the ratio of shock uncertainty to quality uncertainty – a
smaller number means higher relative quality uncertainty.)
Computational time in the embedded optimization problem is a rough surrogate for think-
ing costs. The last row of Table 3 reports the time per iteration of each model. Because the index
strategy is substantially faster than the approximately optimal strategy, it is reasonable to posit
that consumers view the index strategy as having lower thinking costs. (The myopic-learning
strategy is even faster and is a reasonable model for categories in which forward-looking is less
likely as was predicted by the Corollary.)
Both the index-strategy and the approximately-optimal strategy lead to similar parameter
estimates. Parameter estimates of either model are usually within confidence regions of the alter-
native model. This result is consistent with the synthetic-data analyses that suggest that both
strategies lead to near-optimal utility. If we accept cognitive simplicity as a paradigm, then the
index strategy is preferred as a more plausible description of consumer behavior.
In summary, using IRI data on diaper purchases we find that (1) learning models fit and
predict substantially better than the no-learning model; (2) forward-looking learning models fit
and predict significantly better than the myopic-learning model; (3) the index strategy and the
approximately optimal solution achieve similar in-sample fit and out-of-sample forecasts, as well
as reasonably close parameter estimates; and (4) computational (and thinking) costs favor the in-
dex-strategy model relative to the approximately optimal model.
8. Summary, Conclusions, and Future Research
Models of forward-looking consumer learning are important to marketing. These theory-
driven models examine how consumers make tradeoffs between exploiting and exploring brand
information. Managerially, these models enable researchers to investigate effects due to ongoing
quality variation, Bayesian learning, and the variation in utility shocks. However, the optimal so-
lution (assuming no thinking costs) requires an assumption that consumers solve in their heads
an optimization problem that is PSPACE-hard. Cognitive simplicity is a plausible theory that is
rooted in consumer behavior, psychology, and economics. Cognitive simplicity assumes that
consumers solve the meta problem that maximizes utility minus thinking costs.
In this paper we propose and evaluate an alternative theory of learning – that consumers
use an index strategy when looking forward. We prove analytically that an index strategy exists
for canonical consumer learning models and that the index function has simple properties that
consumers can intuit. Using synthetic data we demonstrate that an index solution achieves near
optimal expected utility, and is fast to compute. Using IRI data on diaper purchases, we show
that at least one index solution fits the data and predicts out-of-sample significantly better than
either a no-learning model or a myopic-learning model. The index strategy produces estimation
results (and hence managerial implications) that are quite similar to an approximately optimal so-
lution. The index solution has significantly lower computational costs and, we believe, is more
likely to describe consumer behavior.
We address many issues, but many issues remain. We abstract away from risk aversion,
advertising as a quality signal (the IRI dataset for the diaper category does not track advertising),
and inventory problems. Theoretically, risk aversion does not affect the indexability result. The
consequence of incorporating advertising signals depends on how consumers learn. Although
previous work has not found significant inventory effects (Ching et al. 2011), it is theoretically
interesting to extend index strategies to capture inventory concerns.
Finally, diaper buyers are likely forward-looking, but consumers in other product catego-
ries may not be. Our theory suggests that consumers are most likely to be forward-looking when
shock uncertainty is small compared to quality uncertainty; we expect myopic-learning models to
do well when shock uncertainty is large. An index solution appears to be a reasonable tradeoff
for diaper consumers, but other cognitively-simple solutions might do even better. Future re-
search can explore these solutions using either field data or laboratory experiments.
References
Ackerberg, D. A. 2003. Advertising, learning, and consumer choice in experience good markets:
an empirical examination. International Economic Review, 44 (3), 1007–1040.
Bettman, J. R., M. F. Luce, and J.W. Payne. 1998. Constructive consumer choice processes.
Journal of Consumer Research, 25 (3), 187–217.
Brezzi, M. and T. L. Lai. 2002. Optimal learning and experimentation in bandit problems. Jour-
nal of Economic Dynamics and Control, 27 (1), 87-108.
Bröder, A. 2000. Assessing the empirical validity of the “take the best” heuristic as a model of
human probabilistic inference. Journal of Experimental Psychology: Learning, Memory,
and Cognition, 26, 5, 1332-1346.
Bronnenberg, B. J., M. W. Kruger, and C. F. Mela. 2008. The IRI marketing data set. Marketing
Science, 27 (4), 745 – 748.
Bruner, J. S. 1957. On perceptual readiness. Psychological Review, 64, 123–152.
Camerer, C. F. 2003. Behavioral Game Theory: Experiments in Strategic Interaction
(Roundtable Series in Behavioral Economics). Princeton University Press.
Chang, F. and T. L. Lai. 1987. Optimal stopping and dynamic allocation. Advances in Applied
Probability, 19 (4), 829-853.
Ching, A., T. Erdem, and M. P. Keane. 2011. Learning models: an assessment of progress, chal-
lenges and new developments. SSRN eLibrary.
Ching, A. and M. Ishihara. 2010. The effects of detailing on prescribing decisions under quality
uncertainty. Quantitative Marketing and Economics, 8, 123–165.
Chintagunta, P., T. Erdem, P. E. Rossi and M. Wedel. 2006. Structural modeling in marketing:
review and assessment, Marketing Science, 25 (6), 604-616.
Chintagunta, P., R. Jiang and G. Z. Jin. 2009. Information, learning, and drug diffusion: the case
of Cox-2 inhibitors. Quantitative Marketing and Economics, 7 (4), 399–443.
Gilbride, T. J. and G. M. Allenby. 2004. A choice model with conjunctive, disjunctive, and com-
pensatory screening rules. Marketing Science, 23 (3), 391-406.
Crawford, G. S. and M. Shum. 2005. Uncertainty and Learning in pharmaceutical demand.
Econometrica, 73 (4), 1137–1173.
Erdem, T. and M. P. Keane. 1996. Decision-making under uncertainty: capturing dynamic brand
choice processes in turbulent consumer goods markets. Marketing Science, 15 (1), 1–20.
Erdem, T., M. P. Keane, and B. Sun. 2008. A dynamic model of brand choice when price and
advertising signal product quality. Marketing Science, 27 (6), 1111 – 1125.
Erdem, T., M. P. Keane, T. S. Öncü, and J. Strebel. 2005. Learning about computers: an analysis of in-
formation search and technology choice. Quantitative Marketing and Economics, 3, 207–
247.
Gabaix, X., and D. Laibson. 2000. A boundedly rational decision algorithm. American Economic
Review, 90 (2), Papers and Proceedings, 433-438.
Gabaix, X., D. Laibson, G. Moloche, and S. Weinberg. 2006. Costly information acquisition: ex-
perimental analysis of a boundedly rational model. American Economic Review, 96 (4),
1043–1068.
Gigerenzer, G. and D. G. Goldstein. 1996. Reasoning the fast and frugal way: models of bound-
ed rationality. Psychological Review, 103 (4), 650–669.
Gittins, J. 1989. Bandit Processes and Dynamic Allocation Indices. John Wiley & Sons, Inc.:
New York, NY.
Gittins, J. and D. Jones. 1974. A dynamic allocation index for the sequential design of experi-
ments. In J. Gani (Ed.), Progress in Statistics, North-Holland. Amsterdam, NL, 241–266.
Goldstein, D. G. and G. Gigerenzer. 2002. Models of ecological rationality: the recognition heuristic. Psychological Review, 109 (1), 75–90.
Guadagni, P. M. and J. D. C. Little. 1983. A logit model of brand choice calibrated on scanner data. Marketing Science, 2 (3), 203–238.
Hauser, J. R. 1978. Testing the accuracy, usefulness and significance of probabilistic models: an information theoretic approach. Operations Research, 26 (3), 406–421.
Hauser, J. R., O. Toubia, T. Evgeniou, D. Dzyabura, and R. Befurt. 2010. Cognitive simplicity and consideration sets. Journal of Marketing Research, 47 (June), 485–496.
Hauser, J. R. and G. L. Urban. 1986. The value priority hypotheses for consumer budget plans.
Journal of Consumer Research, 12 (4), 446–462.
Hauser, J. R., G. L. Urban, G. Liberali, and M. Braun. 2009. Website morphing. Marketing Sci-
ence, 28 (2), 202–223.
Hauser, J. R., G. L. Urban, and B. Weinberg. 1993. How consumers allocate their time when
searching for information. Journal of Marketing Research, 30 (4), 452-466.
Hutchinson, J. M. C. and G. Gigerenzer. 2005. Simple heuristics and rules of thumb: where psy-
chologists and behavioural biologists might meet. Behavioural Processes, 69, 97-124.
Jacoby, L. L., and M. Dallas. 1981. On the Relationship Between Autobiographical Memory and
Perceptual Learning. Journal of Experimental Psychology: General, 110, 306–340.
Johnson, E. J. and J. W. Payne. 1985. Effort and accuracy in choice. Management Science, 31,
395-414.
Kim, J. B., P. Albuquerque, and B. J. Bronnenberg. 2010. Online demand under limited consum-
er search. Marketing Science, 29 (6), 1001–1023.
Kohli, R., and K. Jedidi. 2007. Representation and inference of lexicographic preference models
and their variants. Marketing Science, 26 (3), 380-399.
Marewski, J. N., R. F. Pohl, and O. Vitouch. 2010. Recognition-based judgments and decisions:
introduction to the special issue (Vol. 1). Judgment and Decision Making, 5 (4), 207–215
Martignon, L. and U. Hoffrage. 2002. Fast, frugal, and fit: simple heuristics for paired compari-
sons. Theory and Decision, 52, 29-71.
Mehta, N., X. J. Chen, and O. Narasimhan. 2008. Informing, transforming, and persuading: disentangling the multiple effects of advertising on brand choice decisions. Marketing Science, 27 (3), 334–355.
Narasimhan, C. 1988. Competitive promotional strategies. Journal of Business, 61 (4), 427–449.
Narayanan, S. and P. Manchanda. 2009. Heterogeneous learning and the targeting of marketing communication for new products. Marketing Science, 28 (3), 424–441.
Narayanan, S., P. Manchanda, and P. K. Chintagunta. 2005. Temporal differences in the role of
marketing communication in new product categories. Journal of Marketing Research, 42
(3), 278–290.
Papadimitriou, C. H. and J. N. Tsitsiklis. 1999. The complexity of optimal queuing network con-
trol. Mathematics of Operations Research, 24 (2), 293–305.
Payne J.W., J. R. Bettman and E. J. Johnson. 1988. Adaptive strategy selection in decision mak-
ing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14(3), 534-
552.
Payne J.W., J. R. Bettman and E. J. Johnson. 1993. The Adaptive Decision Maker. Cambridge
University Press: Cambridge UK.
Roberts, J. H. and G. L. Urban. 1988. Modeling multiattribute utility, risk, and belief dynamics
for new consumer durable brand choice. Management Science, 34 (2), 167–185.
Rust, J. 1994. Structural estimation of Markov decision processes. Handbook of Econometrics,
Chapter 51. Elsevier: Maryland Heights, MO.
Rust, J. 1996. Numerical dynamic programming in economics, Handbook of Computational
Economics, Chapter 14. Elsevier: Maryland Heights, MO.
Shugan, S. M. 1980. The cost of thinking. Journal of Consumer Research, 7 (2), 99–111.
Simon, H. A. 1955. A behavioral model of rational choice. Quarterly Journal of Economics, 69
(1), 99–118.
Simon, H. A. 1956. Rational choice and the structure of the environment. Psychological Review,
63 (2), 129–38.
Tversky, A. and D. Kahneman. 1973. Availability: a heuristic for judging frequency and proba-
bility. Cognitive Psychology, 5, 207–232.
Whittle, P. 1988. Restless bandits: activity allocation in a changing world. Journal of Applied Probability, 25, 287–298.
Yee, M., E. Dahan, J. R. Hauser and J. Orlin. 2007. Greedoid-based noncompensatory inference.
Marketing Science, 26 (4), (July-August), 532-549.
Table 1. Comparing Decision Strategies on Expected Utility and Thinking Costs
Average consumer utility [subject to affine transformation]; standard errors in parentheses.

                                        No Learning     Myopic Learning   Index Strategy   Approximately Optimal
Number of grid points                   n/a             n/a               200 x 50         (200 x 50)^2
Computation time
(surrogate for thinking costs)          negligible      negligible        10^2 seconds     6 x 10^4 seconds

Relatively low uncertainty in utility shocks; mean of prior quality beliefs (Brand 1, Brand 2):
(0.0, -0.3)                             0.041 (0.003)   1.801 (0.043)     1.992 (0.045)    1.996 (0.045)
(0.0, 0.0)                              0.618 (0.003)   3.352 (0.049)     3.544 (0.052)    3.547 (0.052)
(0.0, +0.3)                             3.036 (0.003)   5.298 (0.056)     5.323 (0.056)    5.327 (0.056)

Relatively high uncertainty in utility shocks; mean of prior quality beliefs (Brand 1, Brand 2):
(0.0, -0.3)                             4.919 (0.026)   5.762 (0.047)     5.767 (0.047)    5.768 (0.047)
(0.0, 0.0)                              6.182 (0.027)   7.150 (0.050)     7.190 (0.052)    7.190 (0.052)
(0.0, +0.3)                             7.946 (0.026)   8.912 (0.054)     8.911 (0.054)    8.912 (0.054)
Table 2. Transition Percentages among Diaper Brands
Percent of households that purchase the column brand in period t if they purchased the row brand in period t - 1.

                 Pampers   Huggies   Luvs     Other Brands
Within the first eight purchases
  Pampers         66.7%     19.7%    10.9%     2.7%
  Huggies         23.1%     64.3%    11.3%     1.2%
  Luvs            18.1%     19.7%    58.8%     3.4%
  Other Brands    21.2%     21.2%    20.0%    37.6%
After the first eight purchases
  Pampers         76.5%     12.6%     9.7%     1.1%
  Huggies         18.7%     75.9%     4.9%     0.5%
  Luvs            20.7%      9.5%    67.5%     2.3%
  Other Brands    14.1%      9.0%    23.1%    53.8%
Table 3. In-Sample and Out-of-Sample Fit Statistics for Diaper Data Estimation

                               No Learning   Myopic Learning   Index Strategy   Approximately Optimal
In-sample estimation statistics
  Log likelihood                 -777.17       -480.81           -469.81          -472.16
  U2 (percent information)        78.50%        86.68%            86.99%           86.92%
  AIC                            1562.34        985.63            971.63           976.33
  BIC                            1580.25       1039.37           1043.29          1047.99
  Number of parameters                 4            12                16               16
Out-of-sample validation statistics
  Log likelihood                -1058.15       -745.55           -738.79          -741.63
  U2 (percent information)        78.77%        85.04%            85.18%           85.15%
  Computation time (sec)            0.15          0.29              40.5            857.8
Table 4. Maximum Likelihood Estimates for Prior Beliefs, Mean Quality, and the Relative Magnitude of the Variation in Utility Shocks
(standard errors in parentheses)

                      No Learning   Myopic Learning   Index Strategy   Approximately Optimal
Relative mean of prior beliefs
  Pampers               0.00          0.00              0.00             0.00
                        --            --                --               --
  Huggies              -0.29         -0.08             -0.87            -0.48
                       (0.10)        (0.18)            (0.21)           (0.12)
  Luvs                 -0.66         -0.90             -1.40            -1.19
                       (0.12)        (0.22)            (0.19)           (0.81)
  Other Brands         -2.80         -2.49             -2.03            -2.97
                       (0.25)        (0.36)            (0.57)           (1.64)
Uncertainty of prior beliefs, relative to ongoing quality uncertainty (†)
  Pampers               --            0.56              0.56             0.62
                                     (0.12)            (0.07)           (0.16)
  Huggies               --            0.44              0.43             0.47
                                     (0.09)            (0.01)           (0.05)
  Luvs                  --            1.50              1.54             2.01
                                     (0.51)            (0.07)           (1.59)
  Other Brands (‡)      --           11.68             17.67            13.88
                                    (41.08)          (109.44)          (15.27)
Mean quality (long-term)
  Pampers               --            4.84              3.13             2.69
                                     (0.74)            (0.71)           (0.56)
  Huggies               --            6.07              6.36             5.60
                                     (1.09)            (0.58)           (0.11)
  Luvs                  --            3.21              2.26             1.95
                                     (0.45)            (0.50)           (0.67)
  Other Brands          --            0.71              0.54             0.45
                                     (0.55)            (0.37)           (0.32)
Uncertainty in utility shocks, relative to ongoing quality uncertainty (†)
  Pampers               --            --                0.70             0.65
                                                       (0.25)           (0.27)
  Huggies               --            --                0.11             0.11
                                                       (0.04)           (0.04)
  Luvs                  --            --                0.57             0.56
                                                       (0.12)           (0.23)
  Other Brands          --            --                2.65             2.02
                                                       (0.53)           (0.86)
† Standard errors reported relative to the ongoing quality uncertainty. ‡ The likelihood is particularly flat in this parameter estimate.
Figure 1. Index Strategies Balance Utility and Cognitive Simplicity (Conceptual Diagram)
[Diagram residue removed. Axes: utility (vertical) versus cognitive simplicity (horizontal), with a line marking optimal utility. Plotted strategies: no learning, myopic learning, index strategy, and approximately optimal; the index strategy achieves close to optimal utility while retaining cognitive simplicity.]
Figure 2: Gittins' Index (Evolution and Variation)
(a) Realized Indices for Two Brands (Consumer chooses brand with largest index value.)
[Plot residue removed: realized Gittins' index (from -0.8 to 1.0) for Brand 1 and Brand 2 over periods 0 to 50.]
(b) Index Values (Normalized) Vary with the Consumer's Experience
[Plot residue removed: Gittins' index (from 0.0 to 1.0) over periods 0 to 50, for mean of posterior belief = 0 and mean of posterior belief = 0.2.]
Figure 3. Whittle's Index as Utility Shock Magnitude and Experience Vary
[Plot residue removed: Whittle's index (from 0 to 1) over periods 0 to 50, for shock magnitudes 0.01, 0.1, 1.0, and 5.0.]
Online Appendix A. Proof of Indexability

Without loss of generality, we set the observable shock to zero and use $\varepsilon_t$ to represent all utility shocks. The focus is on the sub-problem in which the consumer chooses an uncertain brand against a certain reward $\lambda$. To simplify notation, we drop the brand identifier $j$. The Bellman equation for this problem is:

(A1) $V(S_t, \varepsilon_t, \lambda) = \max\big\{\lambda + \delta E[V(S_t, \varepsilon_{t+1}, \lambda) \mid S_t],\ \tilde{x}_t + \varepsilon_t + \delta E[V(S_{t+1}, \varepsilon_{t+1}, \lambda) \mid S_t]\big\}$,

where $S_t = (\tilde{x}_t, \tilde{\sigma}_t)$ summarizes the consumer's belief about the brand at time $t$. The definition of indexability is that, for any state $(S_t, \varepsilon_t)$, if it is optimal to choose the fixed reward $\lambda$, then it must also be optimal to choose the fixed reward $\lambda'$ for any $\lambda' \geq \lambda$. This is equivalent to the following:

(A2) $\lambda + \delta E[V(S_t, \varepsilon_{t+1}, \lambda) \mid S_t] - \tilde{x}_t - \varepsilon_t - \delta E[V(S_{t+1}, \varepsilon_{t+1}, \lambda) \mid S_t] \geq 0$
$\Rightarrow\ \dfrac{\partial}{\partial\lambda}\big\{\delta E[V(S_{t+1}, \varepsilon_{t+1}, \lambda) \mid S_t]\big\} \leq 1 + \dfrac{\partial}{\partial\lambda}\big\{\delta E[V(S_t, \varepsilon_{t+1}, \lambda) \mid S_t]\big\}$.

Intuitively, this says that the expected future value of choosing the uncertain brand should not grow too much, compared with that of choosing the fixed reward $\lambda$, as $\lambda$ increases. It turns out that the assumptions of the canonical problem of consumer learning are sufficient, though not necessary.

We first define the expected value function $W(S_t, \lambda)$ by integrating out $\varepsilon_t$:

(A3) $W(S_t, \lambda) = \int V(S_t, \varepsilon_t, \lambda)\, dF(\varepsilon_t)$.

The sub-problem can then be reduced to the fixed point:

(A4) $W(S_t, \lambda) = E_\varepsilon \max\big\{\lambda + \delta W(S_t, \lambda),\ \tilde{x}_t + \varepsilon_t + \delta E[W(S_{t+1}, \lambda) \mid S_t]\big\}$.

To see this, using the definition in Equation (A3):

$W(S_t, \lambda) = \int \max\big\{\lambda + \delta E[V(S_t, \varepsilon_{t+1}, \lambda) \mid S_t],\ \tilde{x}_t + \varepsilon_t + \delta E[V(S_{t+1}, \varepsilon_{t+1}, \lambda) \mid S_t]\big\}\, dF(\varepsilon_t)$
$= E_\varepsilon \max\big\{\lambda + \delta E[W(S_t, \lambda) \mid S_t],\ \tilde{x}_t + \varepsilon_t + \delta E[W(S_{t+1}, \lambda) \mid S_t]\big\}$
$= E_\varepsilon \max\big\{\lambda + \delta W(S_t, \lambda),\ \tilde{x}_t + \varepsilon_t + \delta E[W(S_{t+1}, \lambda) \mid S_t]\big\}$,

where in the second equality we have used the assumption that $\varepsilon_t$ and $\varepsilon_{t+1}$ are i.i.d., so that $E[V(S_{t+1}, \varepsilon_{t+1}, \lambda) \mid S_t, \varepsilon_t] = E[V(S_{t+1}, \varepsilon_{t+1}, \lambda) \mid S_t]$. In the third equality, we have used the assumption that $\varepsilon_{t+1}$ is independent of $S_t$ and $S_{t+1}$.

Denote by 0 the option of the certain reward $\lambda$, and by 1 the uncertain brand. We define the following quantities:

(A5) $v_0(S_t, \lambda) = \lambda + \delta W(S_t, \lambda)$, and $v_1(S_t, \lambda) = \tilde{x}_t + \delta E[W(S_{t+1}, \lambda) \mid S_t]$,

and let $G(v_0, v_1) = \int \max\{v_0,\ v_1 + \varepsilon_t\}\, dF(\varepsilon_t)$, so that Equation (A4) reads $W(S_t, \lambda) = G(v_0, v_1)$. First observe that the conditional probability of choosing 1 is given by:

(A6) $P(1 \mid S_t, \lambda) = \Pr(v_1 + \varepsilon_t \geq v_0) = \partial G(v_0, v_1)/\partial v_1$.

The last equality is by interchanging the integration and differentiation and uses the definition of the $G$ function. Similarly we have

(A7) $P(0 \mid S_t, \lambda) = \partial G(v_0, v_1)/\partial v_0$.

Differentiate both sides of Equation (A4) with respect to $\lambda$ and use the chain rule:

(A8) $\dfrac{\partial W(S_t, \lambda)}{\partial\lambda} = \dfrac{\partial G}{\partial v_0}\dfrac{\partial v_0}{\partial\lambda} + \dfrac{\partial G}{\partial v_1}\dfrac{\partial v_1}{\partial\lambda} = P(0 \mid S_t, \lambda)\Big[1 + \delta \dfrac{\partial W(S_t, \lambda)}{\partial\lambda}\Big] + P(1 \mid S_t, \lambda)\, \delta E\Big[\dfrac{\partial W(S_{t+1}, \lambda)}{\partial\lambda}\,\Big|\, S_t\Big]$,

where the last equality uses Equations (A5), (A6), and (A7). The following lemma is useful to establish indexability.

Lemma 1. For all $S_t$, $\lambda$, we have

(A9) $0 \leq \dfrac{\partial W(S_t, \lambda)}{\partial\lambda} \leq \dfrac{1}{1 - \delta}$.

Proof. Fix any $S_t$, $\varepsilon_t$, and $\lambda$. Suppose $\pi^*$ is the optimal policy that solves $V(S_t, \varepsilon_t, \lambda)$. First, if a positive constant $c$ is added only to the fixed reward $\lambda$ in every period but the uncertain brand remains unchanged, then following $\pi^*$ yields an expected total utility at least as large as $V(S_t, \varepsilon_t, \lambda)$. Therefore, $V(S_t, \varepsilon_t, \lambda + c) \geq V(S_t, \varepsilon_t, \lambda)$. Second, if a positive constant $c$ is added to both the fixed reward and the uncertain brand in every period, then $\pi^*$ remains optimal and yields an expected total utility of $V(S_t, \varepsilon_t, \lambda) + c/(1 - \delta)$. By construction, adding a positive constant to both options yields expected utility at least as high as adding the constant only to the fixed reward: $V(S_t, \varepsilon_t, \lambda) + c/(1 - \delta) \geq V(S_t, \varepsilon_t, \lambda + c)$. Integrating out $\varepsilon_t$ we have:

(A10) $W(S_t, \lambda) \leq W(S_t, \lambda + c) \leq W(S_t, \lambda) + \dfrac{c}{1 - \delta}$.

It follows that:

(A11) $0 \leq \dfrac{W(S_t, \lambda + c) - W(S_t, \lambda)}{c} \leq \dfrac{1}{1 - \delta}$.

Taking the limit on both sides as $c \to 0$ establishes the lemma.

Lemma 1 implies that $0 \leq \delta E[\partial W(S_{t+1}, \lambda)/\partial\lambda \mid S_t] \leq \delta/(1 - \delta)$. This result, together with Equation (A8), implies that $1 + \delta\, \partial W(S_t, \lambda)/\partial\lambda \geq \delta E[\partial W(S_{t+1}, \lambda)/\partial\lambda \mid S_t]$. (If this inequality failed, Equation (A8) would imply $\partial W(S_t, \lambda)/\partial\lambda \geq 1 + \delta\, \partial W(S_t, \lambda)/\partial\lambda$, hence $\partial W(S_t, \lambda)/\partial\lambda \geq 1/(1 - \delta)$ and $\delta E[\partial W(S_{t+1}, \lambda)/\partial\lambda \mid S_t] > 1/(1 - \delta)$, contradicting Lemma 1.) It then follows that:

(A12) $\dfrac{\partial v_0(S_t, \lambda)}{\partial\lambda} - \dfrac{\partial v_1(S_t, \lambda)}{\partial\lambda} = 1 + \delta\,\dfrac{\partial W(S_t, \lambda)}{\partial\lambda} - \delta E\Big[\dfrac{\partial W(S_{t+1}, \lambda)}{\partial\lambda}\,\Big|\, S_t\Big] \geq 0$,

which establishes the indexability condition in Equation (A2).
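The bounds in Lemma 1 can be illustrated numerically. The sketch below (our own illustration with assumed parameter values, not the paper's code) builds a two-period version of the sub-problem in Equation (A4) with normal beliefs and normal utility shocks, and checks by finite differences that the slope of the expected value function in the fixed reward lies between 0 and the two-period analogue of $1/(1-\delta)$, namely $(1-\delta^2)/(1-\delta)$:

```python
# Two-period numerical check of Lemma 1 (illustrative; parameters are assumed).
import numpy as np

rng = np.random.default_rng(0)
delta, sigma, sigma_eps = 0.9, 1.0, 0.5     # discount, quality noise, shock scale
x_tilde, s_tilde = 0.2, 1.0                 # prior mean and sd of quality beliefs
N = 200_000

eps_now = rng.normal(0, sigma_eps, N)       # period-0 brand shock
eps_next = rng.normal(0, sigma_eps, N)      # period-1 brand shock
s1_sq = s_tilde**2 * sigma**2 / (s_tilde**2 + sigma**2)      # posterior variance
x1 = rng.normal(x_tilde, np.sqrt(s_tilde**2 - s1_sq), N)     # posterior mean after one draw

def W0(lam):
    """Two-period version of Equation (A4): W = E max{lam + dW, x + eps + dE[W']}."""
    w_keep = np.maximum(lam, x_tilde + eps_next).mean()      # period 1, beliefs unchanged
    w_learn = np.maximum(lam, x1 + eps_next).mean()          # period 1, after updating
    v0 = lam + delta * w_keep
    v1 = x_tilde + delta * w_learn
    return np.maximum(v0, v1 + eps_now).mean()

h = 0.01
slope = (W0(0.5 + h) - W0(0.5 - h)) / (2 * h)
bound = (1 - delta**2) / (1 - delta)        # finite-horizon analogue of 1/(1 - delta)
print(round(slope, 3), round(bound, 3))
```

Because the same shock draws are reused at both values of $\lambda$ (common random numbers), the finite-difference slope respects the bounds exactly, not just in expectation.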
Online Appendix B. Proof of Proposition 2

We first prove two useful lemmas. The focus is again on the sub-problem of a single brand, and thus we drop the brand identifier $j$. Throughout, $m(\tilde{x}_t, \tilde{\sigma}_t; \sigma_\varepsilon, \sigma)$ denotes the Whittle index when the utility shocks have scale $\sigma_\varepsilon$ and the quality draws have standard deviation $\sigma$, and $\Delta(S_t, \lambda) = v_1(S_t, \lambda) - v_0(S_t, \lambda)$.

Lemma 2. Fix a prior $(\tilde{x}_0, \tilde{\sigma}_0)$ and a quality sample $\{x_t : t \geq 0\}$. Consider a modified version of the original sub-problem where the utility shocks become $\varepsilon_t + c$ for all $t$, and the fixed reward becomes $\lambda + c$. Denote $\widetilde{W}$ and $\widetilde{m}$ as the expected value and index value for the modified problem. Then for any belief state $S_t$, we have:

(A13) $\widetilde{W}(S_t, \lambda + c) = W(S_t, \lambda) + \dfrac{c}{1 - \delta}$,

(A14) $\widetilde{m}(\tilde{x}_t, \tilde{\sigma}_t; \sigma_\varepsilon, \sigma) = m(\tilde{x}_t, \tilde{\sigma}_t; \sigma_\varepsilon, \sigma) + c$.

Proof. We first show that $\widetilde{W}$ satisfies the fixed point defined by Equation (A4) for the modified problem. Suppose Equation (A13) holds; then

(A15) $\tilde{v}_0(S_t, \lambda + c) = (\lambda + c) + \delta \widetilde{W}(S_t, \lambda + c) = v_0(S_t, \lambda) + \dfrac{c}{1 - \delta}$,

(A16) $\tilde{v}_1(S_t, \lambda + c) = \tilde{x}_t + \delta E[\widetilde{W}(S_{t+1}, \lambda + c) \mid S_t] = v_1(S_t, \lambda) + \dfrac{\delta c}{1 - \delta}$,

where the definitions of $\tilde{v}_0$ and $\tilde{v}_1$ follow Equation (A5) for the modified problem. Then $\widetilde{\Delta}(S_t, \lambda + c) = \Delta(S_t, \lambda) - c$ for the modified problem. The assumption that the distribution of $\varepsilon_t$ has scale and location parameters implies

(A17) $\Pr\big(\widetilde{\Delta} + (\varepsilon_t + c) \geq 0\big) = \Pr\big(\Delta - c + \varepsilon_t + c \geq 0\big) = \Pr(\Delta + \varepsilon_t \geq 0)$,

so the choice probabilities are unchanged. The right-hand side of Equation (A4) for the modified problem becomes

(A18) $E \max\big\{v_0 + \tfrac{c}{1-\delta},\ v_1 + \tfrac{\delta c}{1-\delta} + \varepsilon_t + c\big\} = E \max\{v_0,\ v_1 + \varepsilon_t\} + \dfrac{c}{1-\delta} = W(S_t, \lambda) + \dfrac{c}{1-\delta} = \widetilde{W}(S_t, \lambda + c)$,

which is the left-hand side of Equation (A4). The first equality follows from Equations (A15) to (A17), and the second uses the fact that $W(S_t, \lambda)$ is the fixed point of Equation (A4). Therefore, $\widetilde{W}$ also satisfies the fixed point of Equation (A4) for the modified problem.

For the second part of the lemma, we use the definition of the Whittle index, which is the breakeven value of $\lambda$ such that the two terms inside the curly brackets of Equation (A4) are equal in expectation:

(A19) $v_0(S_t, m(S_t)) = v_1(S_t, m(S_t)) + E[\varepsilon_t]$.

It suffices to show that the proposed relation in Equation (A14) solves the above equality. Note that

$\tilde{v}_0(S_t, m + c) = v_0(S_t, m) + \dfrac{c}{1-\delta} = v_1(S_t, m) + E[\varepsilon_t] + \dfrac{c}{1-\delta} = \tilde{v}_1(S_t, m + c) + E[\varepsilon_t + c]$,

where the first and last equalities follow from Equations (A15) and (A16), and the middle equality from the definition of the Whittle index for the original problem (setting $\lambda = m(S_t)$). Then by the definition of the Whittle index, $\widetilde{m}(\tilde{x}_t, \tilde{\sigma}_t; \sigma_\varepsilon, \sigma) = m(\tilde{x}_t, \tilde{\sigma}_t; \sigma_\varepsilon, \sigma) + c$.

Lemma 3. Fix the original sub-problem. Consider a modified problem where the quality sample becomes $\{a x_t + b : t \geq 0\}$ for some $a > 0$, the utility shocks become $a \varepsilon_t$ for all $t$, the prior belief becomes $(\tilde{x}'_0, \tilde{\sigma}'_0) = (a \tilde{x}_0 + b,\ a \tilde{\sigma}_0)$, and the fixed reward becomes $a\lambda + b$. Then $S'_t = (a \tilde{x}_t + b,\ a \tilde{\sigma}_t)$ for all $t$. Denote $\overline{W}$ and $\overline{m}$ as the expected value and index value for the modified problem. Then for any belief state $S_t$, we have:

(A20) $\overline{W}(S'_t, a\lambda + b) = a\, W(S_t, \lambda) + \dfrac{b}{1 - \delta}$,

(A21) $\overline{m}(a\tilde{x}_t + b, a\tilde{\sigma}_t; a\sigma_\varepsilon, a\sigma) = a\, m(\tilde{x}_t, \tilde{\sigma}_t; \sigma_\varepsilon, \sigma) + b$.

Proof. The proof is similar to that of Lemma 2. Note that Bayesian updating implies that for all $t$ the relative precision of the modified problem remains the same as that of the original problem:

(A22) $\dfrac{\tilde{\sigma}'^2_t}{\sigma'^2} = \dfrac{(a\tilde{\sigma}_t)^2}{(a\sigma)^2} = \dfrac{\tilde{\sigma}^2_t}{\sigma^2}$.

It follows that the updated posterior mean and variance have the following relationships:

(A23) $\tilde{x}'_{t+1} = \dfrac{\sigma'^2 \tilde{x}'_t + \tilde{\sigma}'^2_t x'_{t+1}}{\sigma'^2 + \tilde{\sigma}'^2_t} = a\, \tilde{x}_{t+1} + b$,

(A24) $\tilde{\sigma}'^2_{t+1} = \dfrac{\tilde{\sigma}'^2_t \sigma'^2}{\tilde{\sigma}'^2_t + \sigma'^2} = a^2\, \tilde{\sigma}^2_{t+1}$.

Therefore, the belief state in the next period preserves the relationship:

(A25) $S'_{t+1} = (a\, \tilde{x}_{t+1} + b,\ a\, \tilde{\sigma}_{t+1})$.

We then check whether $\overline{W}$ satisfies the fixed point defined by Equation (A4) for the modified problem. Suppose Equation (A20) holds; then

(A26) $\bar{v}_0(S'_t, a\lambda + b) = (a\lambda + b) + \delta \overline{W}(S'_t, a\lambda + b) = a\, v_0(S_t, \lambda) + \dfrac{b}{1 - \delta}$,

(A27) $\bar{v}_1(S'_t, a\lambda + b) = (a\tilde{x}_t + b) + \delta E[\overline{W}(S'_{t+1}, a\lambda + b) \mid S'_t] = a\, v_1(S_t, \lambda) + \dfrac{b}{1 - \delta}$,

where the second equality in (A27) follows from Equation (A25) and the normality and conjugate-prior assumptions for the distribution of quality and beliefs. Then $\overline{\Delta}(S'_t, a\lambda + b) = a\, \Delta(S_t, \lambda)$ for the modified problem. The assumption that the distribution of $\varepsilon_t$ has scale and location parameters implies

(A28) $\Pr\big(\overline{\Delta} + a\varepsilon_t \geq 0\big) = \Pr\big(a[\Delta + \varepsilon_t] \geq 0\big) = \Pr(\Delta + \varepsilon_t \geq 0)$,

so the choice probabilities are unchanged. The right-hand side of Equation (A4) for the modified problem becomes

(A29) $E \max\big\{a v_0 + \tfrac{b}{1-\delta},\ a(v_1 + \varepsilon_t) + \tfrac{b}{1-\delta}\big\} = a\, E\max\{v_0,\ v_1 + \varepsilon_t\} + \dfrac{b}{1-\delta} = a\, W(S_t, \lambda) + \dfrac{b}{1-\delta} = \overline{W}(S'_t, a\lambda + b)$,

which is the left-hand side of Equation (A4). The first equality follows from Equations (A26) to (A28), and the second uses the fact that $W(S_t, \lambda)$ is the fixed point of Equation (A4). Therefore, $\overline{W}$ also satisfies the fixed point of Equation (A4) for the modified problem.

For the second part of the lemma, we again use the definition of the Whittle index in Equation (A19). It suffices to show the proposed relation in Equation (A21) solves the above equality. Note that

$\bar{v}_0(S'_t, am + b) = a\, v_0(S_t, m) + \dfrac{b}{1-\delta} = a\, v_1(S_t, m) + a E[\varepsilon_t] + \dfrac{b}{1-\delta} = \bar{v}_1(S'_t, am + b) + E[a\varepsilon_t]$,

where the first and last equalities follow from Equations (A26) and (A27), and the middle equality from the definition of the Whittle index for the original problem (setting $\lambda = m(S_t)$). Then by the definition of the Whittle index, $\overline{m}(a\tilde{x}_t + b, a\tilde{\sigma}_t; a\sigma_\varepsilon, a\sigma) = a\, m(\tilde{x}_t, \tilde{\sigma}_t; \sigma_\varepsilon, \sigma) + b$.

To complete the proof of the proposition, note that by Lemma 3 we have

$\overline{m}(a\tilde{x}_t + b, a\tilde{\sigma}_t; a\sigma_\varepsilon, a\sigma) = a\, m(\tilde{x}_t, \tilde{\sigma}_t; \sigma_\varepsilon, \sigma) + b$.

Setting $a = 1/\sigma$ and $b = -\tilde{x}_t/\sigma$ yields

$m(\tilde{x}_t, \tilde{\sigma}_t; \sigma_\varepsilon, \sigma) = \tilde{x}_t + \sigma\, m(0,\ \tilde{\sigma}_t/\sigma;\ \sigma_\varepsilon/\sigma,\ 1)$,

where Lemma 2 ensures that any location shift of the utility shocks can likewise be absorbed into the index as an additive constant.
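The affine invariance in Lemma 3 can be verified numerically. The sketch below (our own illustration with assumed parameter values, not the paper's code) evaluates a two-period version of the sub-problem on a fixed set of draws, applies the transformation of Lemma 3, and checks that the value scales by $a$ and shifts by $b(1 + \delta)$, the two-period analogue of $b/(1-\delta)$:

```python
# Monte Carlo check of Lemma 3's affine invariance (illustrative; parameters assumed).
import numpy as np

rng = np.random.default_rng(1)
delta, sigma, lam = 0.9, 1.0, 0.4
x0, s0 = 0.3, 1.2                      # prior mean and sd of quality beliefs
a, b = 2.0, -0.7                       # affine transformation of Lemma 3
N = 100_000
eps_now = rng.normal(0, 0.5, N)        # period-0 brand shock
eps_next = rng.normal(0, 0.5, N)       # period-1 brand shock
s1_sq = s0**2 * sigma**2 / (s0**2 + sigma**2)
x1 = rng.normal(x0, np.sqrt(s0**2 - s1_sq), N)   # posterior mean after one draw

def value(x0_, x1_, e_now, e_next, lam_):
    """Two-period value of the sub-problem (Equation (A4), truncated at T = 2)."""
    w_keep = np.maximum(lam_, x0_ + e_next).mean()   # terminal value, no learning
    w_learn = np.maximum(lam_, x1_ + e_next).mean()  # terminal value after updating
    v0 = lam_ + delta * w_keep
    v1 = x0_ + delta * w_learn
    return np.maximum(v0, v1 + e_now).mean()

orig = value(x0, x1, eps_now, eps_next, lam)
trans = value(a * x0 + b, a * x1 + b, a * eps_now, a * eps_next, a * lam + b)
gap = abs(trans - (a * orig + b * (1 + delta)))
print(gap)   # numerically zero
```

Because the transformation is applied to the same draws, the identity holds sample by sample, so the gap is zero up to floating-point error.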
Online Appendix C. Proof of Proposition 3

The focus is again on the sub-problem of a single brand, and thus we drop the brand identifier $j$. Recall that $v_0$ and $v_1$ are defined in Equation (A5), and $\Delta(S_t, \lambda) = v_1(S_t, \lambda) - v_0(S_t, \lambda)$.

Proof of Proposition 3(1). The first part, that the Whittle index is increasing in the posterior mean $\tilde{x}_t$, is evident from Proposition 2. For the second part, fix some belief state $S_t$ and consider any $\lambda' \geq \lambda$. By the definition of the index, $\Delta(S_t, m(S_t)) = 0$. Note that

(A30) $\Delta(S_t, \lambda') - \Delta(S_t, \lambda) = -\displaystyle\int_{\lambda}^{\lambda'}\Big[\dfrac{\partial v_0(S_t, s)}{\partial s} - \dfrac{\partial v_1(S_t, s)}{\partial s}\Big]\, ds \leq 0$,

where the inequality is implied by (A12). It then follows that $\Delta(S_t, \cdot)$ is nonincreasing in $\lambda$, so it crosses zero at most once: the breakeven value $m(S_t)$ is unique, and the fixed reward remains optimal for any $\lambda' \geq m(S_t)$.

Proof of Proposition 3(2). We will prove the first part, and omit the second part, which uses a similar argument. Fix some $\sigma'_\varepsilon \geq \sigma_\varepsilon$. Let $\eta_1, \eta_2, \ldots$ be a sequence of random variables conditionally i.i.d. from distribution $F(\cdot\,; \sigma_\varepsilon)$. Let $\zeta_1, \zeta_2, \ldots$ be a sequence of mean-zero random variables, conditionally i.i.d. and independent of the first sequence, such that the constructed sequence $\varepsilon'_t = \eta_t + \zeta_t$ for all $t$ is conditionally i.i.d. from distribution $F(\cdot\,; \sigma'_\varepsilon)$. Fix some policy $\pi$ applied to solve the problem under $\{\eta_t\}$. Denote $\pi_t = 0$ if the fixed reward is chosen, and $\pi_t = 1$ otherwise. Then the value function under state $(S_t, \varepsilon_t)$ when $\pi$ is applied becomes

$V'(S_t, \varepsilon_t; \pi) = E\Big[\textstyle\sum_{s \geq t}\delta^{s-t}\big\{(1-\pi_s)\lambda + \pi_s(\tilde{x}_s + \eta_s + \zeta_s)\big\}\Big]$
$= E\Big[\textstyle\sum_{s \geq t}\delta^{s-t}\big\{(1-\pi_s)\lambda + \pi_s(\tilde{x}_s + \eta_s)\big\}\Big] + E\Big[\textstyle\sum_{s \geq t}\delta^{s-t}\pi_s\zeta_s\Big] = V(S_t, \varepsilon_t; \pi)$,

where the last equality uses the fact that the second term is equal to zero ($\pi_s$ does not depend on $\zeta_s$, which has mean zero). Note that $V'(S_t, \varepsilon_t; \pi) \leq V'(S_t, \varepsilon_t)$ because the latter is the optimal value function. Therefore $V(S_t, \varepsilon_t; \pi) \leq V'(S_t, \varepsilon_t)$ for all $\pi$, and taking the maximum on the left-hand side gives $V(S_t, \varepsilon_t) \leq V'(S_t, \varepsilon_t)$. Integrating out $\varepsilon_t$ then yields $W(S_t, \lambda; \sigma_\varepsilon) \leq W(S_t, \lambda; \sigma'_\varepsilon)$. Then we have $\partial W(S_t, \lambda; \sigma_\varepsilon)/\partial\sigma_\varepsilon \geq 0$ for all $S_t$, $\lambda$, $\sigma_\varepsilon$.

Writing $\varepsilon_t = \sigma_\varepsilon \eta_t$ with $\eta_t$ standardized, differentiating both sides of Equation (A4) with respect to $\sigma_\varepsilon$ and using the chain rule give:

$\dfrac{\partial W(S_t, \lambda; \sigma_\varepsilon)}{\partial\sigma_\varepsilon} = P(0 \mid S_t, \lambda; \sigma_\varepsilon)\, \delta\, \dfrac{\partial W(S_t, \lambda; \sigma_\varepsilon)}{\partial\sigma_\varepsilon} + P(1 \mid S_t, \lambda; \sigma_\varepsilon)\Big\{\delta E\Big[\dfrac{\partial W(S_{t+1}, \lambda; \sigma_\varepsilon)}{\partial\sigma_\varepsilon}\,\Big|\, S_t\Big] + E[\eta_t \mid \pi_t = 1]\Big\}$.

The last equality implies that

$\dfrac{\partial W(S_t, \lambda; \sigma_\varepsilon)}{\partial\sigma_\varepsilon} = \dfrac{P(1 \mid S_t, \lambda; \sigma_\varepsilon)\big\{\delta E[\partial W(S_{t+1}, \lambda; \sigma_\varepsilon)/\partial\sigma_\varepsilon \mid S_t] + E[\eta_t \mid \pi_t = 1]\big\}}{1 - \delta P(0 \mid S_t, \lambda; \sigma_\varepsilon)}$.

It then follows that

$\dfrac{\partial\Delta(S_t, \lambda; \sigma_\varepsilon)}{\partial\sigma_\varepsilon} = \delta E\Big[\dfrac{\partial W(S_{t+1}, \lambda; \sigma_\varepsilon)}{\partial\sigma_\varepsilon}\,\Big|\, S_t\Big] - \delta\,\dfrac{\partial W(S_t, \lambda; \sigma_\varepsilon)}{\partial\sigma_\varepsilon} \geq 0$,

where the last inequality uses $E[\partial W(S_{t+1}, \lambda; \sigma_\varepsilon)/\partial\sigma_\varepsilon \mid S_t] \geq 0$. Let $m$ and $m'$ be the Whittle indices corresponding to $\sigma_\varepsilon$ and $\sigma'_\varepsilon$. This inequality implies $\Delta(S_t, \lambda; \sigma'_\varepsilon) \geq \Delta(S_t, \lambda; \sigma_\varepsilon)$. Since by definition $\Delta(S_t, m; \sigma_\varepsilon) = 0 = \Delta(S_t, m'; \sigma'_\varepsilon)$, we have $\Delta(S_t, m; \sigma'_\varepsilon) \geq \Delta(S_t, m'; \sigma'_\varepsilon)$. It then follows that $m' \geq m$ by Equation (A30).

Proof of Proposition 3(3). Consider any $\alpha > 0$. By the invariance property we have

$m(\alpha\tilde{x}_t, \alpha\tilde{\sigma}_t; \alpha\sigma_\varepsilon, \alpha\sigma) = \alpha\tilde{x}_t + \alpha\sigma\, m(0, \tilde{\sigma}_t/\sigma; \sigma_\varepsilon/\sigma, 1) = \alpha\big[\tilde{x}_t + \sigma\, m(0, \tilde{\sigma}_t/\sigma; \sigma_\varepsilon/\sigma, 1)\big] = \alpha\, m(\tilde{x}_t, \tilde{\sigma}_t; \sigma_\varepsilon, \sigma)$,

where the first and last equalities follow from Proposition 2.
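The first step of the proof of Proposition 3(2), that the value of the sub-problem is nondecreasing in the scale of the utility shocks, is easy to see at the level of a single expected maximum. The sketch below (our own illustration, not the paper's code) uses the standard closed form for $E\max\{v_0, v_1 + \varepsilon\}$ under a normal shock and evaluates it on an increasing grid of shock scales:

```python
# E max{v0, v1 + eps} with eps ~ N(0, s^2) is nondecreasing in s (illustrative).
import math

def emax(v0, v1, s):
    """Closed form of E max{v0, v1 + eps}, eps ~ N(0, s^2)."""
    if s == 0:
        return max(v0, v1)
    d = v1 - v0
    z = d / s
    Phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))      # standard normal cdf
    phi = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    return v0 + d * Phi + s * phi

values = [emax(0.5, 0.2, s) for s in (0.0, 0.25, 0.5, 1.0, 2.0)]
print([round(v, 4) for v in values])    # nondecreasing in s
```

The derivative of this expression with respect to the scale $s$ equals the normal density at the standardized gap, which is strictly positive, so the monotonicity holds for any $v_0$, $v_1$.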
Online Appendix D. Computation of Index Function

We can use the invariance property to simplify the computation of the Whittle index. The computation is based on the fixed-point problem in Equation (A4) and the definition of the Whittle index. The brand identifier $j$ is dropped to simplify notation. Note that the $W$ function evaluated at $\tilde{x}_t = 0$ is

$W(0, \tilde{\sigma}_t, \lambda) = E\max\big\{\lambda + \delta W(0, \tilde{\sigma}_t, \lambda),\ \varepsilon_t + \delta E[W(\tilde{x}_{t+1}, \tilde{\sigma}_{t+1}, \lambda) \mid 0, \tilde{\sigma}_t]\big\}$
$= E\max\big\{\lambda + \delta W(0, \tilde{\sigma}_t, \lambda),\ \varepsilon_t + \delta E\big[W(0, \tilde{\sigma}_{t+1}, \lambda - \tilde{x}_{t+1}) + \tfrac{\tilde{x}_{t+1}}{1-\delta} \,\big|\, 0, \tilde{\sigma}_t\big]\big\}$
$= E\max\big\{\lambda + \delta W(0, \tilde{\sigma}_t, \lambda),\ \varepsilon_t + \delta E[W(0, \tilde{\sigma}_{t+1}, \lambda - \tilde{x}_{t+1}) \mid 0, \tilde{\sigma}_t]\big\}$,

where $\tilde{\sigma}^2_{t+1} = \tilde{\sigma}^2_t \sigma^2/(\tilde{\sigma}^2_t + \sigma^2)$, as implied by the Bayesian updating formula for the normal distribution. The second equality uses Equation (A20) from Lemma 3 (with $a = 1$ and $b = -\tilde{x}_{t+1}$), and the last equality uses the fact that the expectation of $\tilde{x}_{t+1}$ conditional on $\tilde{x}_t = 0$ is zero.

We now treat $\lambda$ as a state variable. Let $\lambda_{t+1} = \lambda_t - \tilde{x}_{t+1}$. Note that the distribution of $\tilde{x}_{t+1}$ conditional on belief $(0, \tilde{\sigma}_t)$ is normal with mean 0 and standard deviation $s_t = \tilde{\sigma}^2_t/\sqrt{\tilde{\sigma}^2_t + \sigma^2}$. Therefore $\lambda_{t+1} \mid \lambda_t \sim N(\lambda_t, s^2_t)$. The fixed-point problem now only involves the $W$ function fixed at $\tilde{x} = 0$, and it evolves on the state space $(\lambda, \tilde{\sigma})$:

$W(0, \tilde{\sigma}_t, \lambda_t) = E\max\big\{\lambda_t + \delta W(0, \tilde{\sigma}_t, \lambda_t),\ \varepsilon_t + \delta E[W(0, \tilde{\sigma}_{t+1}, \lambda_{t+1}) \mid 0, \tilde{\sigma}_t, \lambda_t]\big\}$.

Standard dynamic programming algorithms can be used to solve the above fixed point. Given the solution of $W(0, \tilde{\sigma}, \lambda)$, we can compute the Whittle index under $\tilde{x} = 0$: $m(0, \tilde{\sigma}; \sigma_\varepsilon, \sigma)$. The index evaluated at any value of the posterior mean is then computed by the linear summation implied by the invariance property: $m(\tilde{x}, \tilde{\sigma}; \sigma_\varepsilon, \sigma) = \tilde{x} + m(0, \tilde{\sigma}; \sigma_\varepsilon, \sigma)$.
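The computation above can be sketched end to end. The code below (our own illustration with assumed parameter values, a finite number of learning stages, and normal shocks; not the authors' implementation) solves the reduced fixed point for $W(0, \tilde{\sigma}, \lambda)$ on a $\lambda$-grid, stage by stage as the belief uncertainty shrinks, and reads off the index at $\tilde{x} = 0$ as the breakeven $\lambda$; the invariance property then gives the index at any posterior mean by addition:

```python
# Sketch of the Appendix D computation (illustrative; parameters assumed).
import math
import numpy as np

delta, sigma, sigma_eps = 0.8, 1.0, 0.5
K = 12                                         # learning stages retained
lam = np.linspace(-6.0, 6.0, 241)              # grid for the fixed reward (state)
erf_v = np.vectorize(math.erf)

def emax(v0, v1, s):
    """E max{v0, v1 + eps} with eps ~ N(0, s^2), in closed form."""
    d = v1 - v0
    z = d / s
    Phi = 0.5 * (1.0 + erf_v(z / math.sqrt(2)))
    phi = np.exp(-z**2 / 2) / math.sqrt(2 * math.pi)
    return v0 + d * Phi + s * phi

# Deterministic path of belief uncertainty under Bayesian updating.
s_tilde = [1.0]
for _ in range(K):
    s_tilde.append(math.sqrt(s_tilde[-1]**2 * sigma**2 / (s_tilde[-1]**2 + sigma**2)))

# Terminal stage: beliefs nearly settled at x_tilde = 0, so W = Emax{lam, eps}/(1 - delta).
W_next = emax(lam, 0.0, sigma_eps) / (1 - delta)

nodes, wts = np.polynomial.hermite_e.hermegauss(15)
wts = wts / wts.sum()                          # quadrature weights for a standard normal

index = []
for k in range(K - 1, -1, -1):
    sd = s_tilde[k]**2 / math.sqrt(s_tilde[k]**2 + sigma**2)   # sd of the mean update
    cont = np.zeros_like(lam)                  # E[W(0, s', lam - x')] over x' ~ N(0, sd^2)
    for x, w in zip(nodes, wts):
        cont += w * np.interp(lam - sd * x, lam, W_next)
    v1 = delta * cont                          # active branch value at x_tilde = 0
    W = np.zeros_like(lam)
    for _ in range(300):                       # contraction: passive branch fixed point
        W = emax(lam + delta * W, v1, sigma_eps)
    diff = (lam + delta * W) - v1              # v0 - v1, nondecreasing by indexability
    i = int(np.argmax(diff >= 0))              # first grid point with v0 >= v1
    x0_, x1_, y0_, y1_ = lam[i - 1], lam[i], diff[i - 1], diff[i]
    index.append(float(x0_ - y0_ * (x1_ - x0_) / (y1_ - y0_)))  # breakeven lambda
    W_next = W

index = index[::-1]      # index[k] = m(0, s_tilde[k]); shrinks as experience accumulates
m_at = 1.5 + index[0]    # invariance: m(x_tilde, s_tilde) = x_tilde + m(0, s_tilde)
print([round(v, 3) for v in index[:4]], round(m_at, 3))
```

The key computational point is that the fixed point lives on the two-dimensional $(\lambda, \tilde{\sigma})$ grid rather than the full belief space, and the index for every posterior mean follows by a single addition.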
Online Appendix E. Simulation of Value Function

Given a decision rule $\Pi$, we compute the value function (expected total utility) by forward-simulating utilities over a sufficiently long horizon. Starting at a given state $(\tilde{x}_{j0}, \tilde{\sigma}_{j0}, \varepsilon_{j0})$, we sample a large number $D$ of Markov chains for each brand $j$: $\{(\tilde{x}^d_{jn}, \tilde{\sigma}^d_{jn}, \varepsilon^d_{jt}) : n, t = 1, 2, \ldots, \bar{T}\}$, where $\bar{T}$ is greater than the truncated time horizon $T$. Bayesian updating of the normal distribution leads to the following state transitions:

$\tilde{x}_{j,n+1} \mid (\tilde{x}_{j,n}, \tilde{\sigma}_{j,n}) \sim N\Big(\tilde{x}_{j,n},\ \dfrac{\tilde{\sigma}^4_{j,n}}{\tilde{\sigma}^2_{j,n} + \sigma^2_j}\Big), \qquad \tilde{\sigma}^2_{j,n+1} = \dfrac{\tilde{\sigma}^2_{j,n}\sigma^2_j}{\tilde{\sigma}^2_{j,n} + \sigma^2_j}, \qquad \varepsilon_{jt} \sim F(\cdot\,; \sigma_{\varepsilon j})$,

where $n$ indexes the cumulative number of times brand $j$ has been tried. These sequences of belief states are then fixed in advance and reused for each decision rule. Under a decision rule $\Pi$, the empirical estimate of its expected total utility for a truncated time horizon $T$ is given by:

$\hat{V}(\Pi) = \dfrac{1}{D}\sum_{d=1}^{D}\sum_{t=1}^{T}\delta^{t-1}\, u\big(\Pi^d_t,\ \tilde{x}^d_{j, n_{jt}},\ \tilde{\sigma}^d_{j, n_{jt}},\ \varepsilon^d_{jt}\big)$,

where $n_{jt}$ is the cumulative number of trials for brand $j$ up to time $t$. Note that the realized state values are chosen from the pre-drawn sample paths, with $n_{jt}$ indicating which state in the sample path is chosen.
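The forward simulation can be sketched as follows. The code below (our own illustration with an assumed utility form and parameter values, not the authors' code) pre-draws, for each chain and brand, the path of posterior means indexed by the cumulative number of trials, then evaluates any decision rule on the same draws, so that different rules are compared under common random numbers:

```python
# Forward simulation of a decision rule's value (illustrative; parameters assumed).
import numpy as np

delta, T, D, J = 0.9, 40, 300, 2
sigma = np.array([1.0, 1.0])             # quality noise per brand
sig_eps = 0.3                            # utility shock scale
rng = np.random.default_rng(7)

# Pre-drawn belief sample paths: x_path[d, j, n] is brand j's posterior mean in
# chain d after n trials; var_path tracks the deterministic variance path.
x_path = np.zeros((D, J, T + 1))
x_path[:, :, 0] = np.array([0.2, 0.0])   # prior means
var_path = np.zeros((J, T + 1))
var_path[:, 0] = 1.0                     # prior variances
for n in range(T):
    step_var = var_path[:, n]**2 / (var_path[:, n] + sigma**2)   # var of mean update
    x_path[:, :, n + 1] = x_path[:, :, n] + rng.normal(0, np.sqrt(step_var), (D, J))
    var_path[:, n + 1] = var_path[:, n] * sigma**2 / (var_path[:, n] + sigma**2)
eps = rng.normal(0, sig_eps, (D, J, T))  # per-period utility shocks

def estimate(rule):
    """Empirical discounted utility of rule(posterior_means, shocks) -> brand."""
    total = 0.0
    for d in range(D):
        n = np.zeros(J, dtype=int)       # cumulative trials per brand
        for t in range(T):
            xt = x_path[d, np.arange(J), n]
            j = rule(xt, eps[d, :, t])
            total += delta**t * (xt[j] + eps[d, j, t])
            n[j] += 1                    # advance along brand j's sample path
    return total / D

myopic = estimate(lambda x, e: int(np.argmax(x + e)))
again = estimate(lambda x, e: int(np.argmax(x + e)))
print(round(myopic, 4), myopic == again)   # paths are reused, so estimates match
```

Fixing the sample paths in advance is what makes the comparison across decision rules in Table 1 a matter of policy quality rather than simulation noise.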