two neurocomputational building blocks of social norm...

1

Two Neurocomputational Building Blocks of Social Norm Compliance

Abstract Current explanatory frameworks for social norms pay little attention to why

and how brains might carry out computational functions that generate norm compliance

behavior. This paper expands on existing literature by laying out the beginnings of a

neurocomputational framework for social norms and social cognition, which can be the basis

for advancing our understanding of the nature and mechanisms of social norms. Two

neurocomputational building blocks are identified that might constitute the building blocks of

the mechanism of norm compliance. They consist of Bayesian and Reinforcement Learning

systems. It is sketched why and how the concerted activity of these systems can generate

norm compliance by minimization of three specific kinds of prediction-errors.

Keywords: Social norms; Bayesian brain; reinforcement learning; uncertainty minimization

Social Norm Compliance from a Neurocomputational Perspective

Philosophers, psychologists, anthropologists, and economists have offered different accounts

of social norms (e.g. Bicchieri 2006; Binmore 1994; Boyd and Richerson 2001; Elster 1989;

Gintis 2010; Lewis 1969; Pettit 1990; Sugden 1986; Ullmann‐Margalit 1977). Many facts are

known about social norms, both at the individual and at the social level (Sripada and Stich

2007). However, much existing research is partial and piecemeal, making it difficult to know

how individual findings cohere into a comprehensive picture. Relatively little effort has been

spent in laying out a framework that could unify these facts and advance interdisciplinary

research on social/moral1 behavior.

1 Although moral norms do not seem to be sharply distinct from social norms or, say, norms

of disgust, there is a spectrum of social behaviors, some of which tend to be more readily

called ‘moral.’ Specifically, behavioral patterns that typically involve a victim who has been

2

Computational cognitive neuroscience has the opportunity to make valuable

contributions to our understanding of social/moral behavior. Such understanding can be

grounded in a computational, biologically plausible framework, which can unify existing

knowledge about norms, and help to guide the study of social normativity across multiple

disciplines in a way where the concepts and data used by researchers are informed,

constrained and modified by ideas and results from multiple disciplines.

What follows lays out the beginnings of a neurocomputational framework for social

norms. Within this framework, two building blocks of social norm compliance are identified

that might constitute the basis for the mechanism of norm compliance. They consist of

Bayesian and Reinforcement Learning (RL) systems. It is canvassed why and how the

concerted activity of these systems could generate norm compliance by minimization of three

kinds of prediction-errors. On this account, Bayesian systems compute social representations,

while RL systems draw on social representations to learn to comply with norms during social

interaction.

The suggestion is that social/moral behavior piggybacks on neural computations that

enable agents to process incoming sensory input so as to form probabilistic beliefs about the

states of the world causing that input, and to choose actions so as to maximize the value of

their future reward outcomes in the social world. Agents might learn social norms as they do

other regularities in their environment, and comply with them courtesy of basic types of

neural computations, which operate in both social and non-social contexts. Thus, social norms

could be grounded in features of human nature, which are more fundamental than either the

beliefs and preferences of individuals or the idiosyncratic characteristics of the culture in

harmed, whose rights have been violated, or who has been subject to some injustice seem to

be more readily qualified as ‘moral’ norms.

3

which they live. The concerted activity of the Bayesian – RL systems would generate social

norm compliance as opposed to any other form of behavior because of the social nature of the

representations that they transform and consume.

Three points should be clear about the nature and scope of this proposal before

proceeding to unpacking it. First, the approach adopted here is unlike that of a number of

philosophers, cognitive scientists, and social scientists working on social norms within the

tradition of rational choice theory. Most of the existing accounts of norms are rational

reconstructions of the concept of social norm, which “specify in which sense one may say

that norms are rational, or compliance with a norm is rational” (Bicchieri 2006, pp. 10-11).2

My project is not intended to be a rational reconstruction. What sets my proposal apart is that

it consists of a descriptive hypothesis, framed in terms of neural computations, about some

core aspects of the mechanism of norm compliance. Hence, I am not concerned with

2 A nice and important example of rational reconstruction is Bicchieri’s (2006) account of

norms. For Bicchieri, social norms should be understood in game-theoretical terms as Nash

equilibria that result from transforming a mixed-motive game such as the prisoner’s dilemma

into a coordination game. The idea is that social norms solve social problems, in which each

of the agents has a selfish interest to defect from the strategy that would provide the socially

superior outcome if everybody followed it. When a social norm exists in problems of this sort,

agents’ preferences and beliefs will reflect the existence of this norm. Accordingly, the

payoffs of the problem will change in such a way that agents playing the socially superior

strategy will now play an optimal equilibrium.

4

normative questions such as “Under what conditions social norm compliance is rational?” My

approach is not concerned with the content of the norms with which people ought to comply.3

Second, the Bayesian - RL approach I am advocating should be understood as a

hypothesis about the functioning of the neural systems supporting social/moral cognition,

rather than a proven solution to the task of complying with norms. Although a growing body

of evidence from computational cognitive neuroscience strongly suggests that different

perceptual systems (e.g. vision) might perform some form of Bayesian inference, and that

multiple neural circuits (e.g. the basal ganglia) might implement some types of RL-

algorithms, the Bayesian and RL views on neural functioning are not universally accepted (cf.

Berridge 2007; Bowers and Davis 2012).

Finally, the framework I put forward is intended for social norm compliance, but it can

be much more encompassing (cf. Clark 2013, Friston 2010). As mentioned above, what

3 One of the consequences of my proposal is that agents can comply with “irrational” or even

“immoral” norms indeed. If evolutionary pressure does not operate primarily over what is

learned (the object of learning and decision-making), but over the learning and decision-

making systems themselves (how such systems learn and make decisions), it is plausible that

agents sometimes can learn and comply with norms that, in some sense, are “irrational” or

even “immoral” (cf. e.g. Seymur, Yoshida and Dolan 2009). Consistent with this consequence

is the view that there may well be no genetically-based special purpose neural network for

social/moral learning and decision-making. The acquisition and implementation of specific

norms would rather depend on “downstream ecological and epistemic engineering” (Sterelny

2003). The idea is that parental, upstream generations structure the downstream informational

environment where the next generation develops so that the specific social norms embedded

in that environment are more easily learnt and followed.

5

restricts my proposed account to social norms is the social nature of the representations that it

posits. The main reason why the proposal is intended to be tailored specifically to social

norms and social/moral cognition is that the time is ripe for making systematic, genuinely

interdisciplinary, progress in the science of norms, by identifying the kinds of functions that

brains need to compute to generate norm compliance. My hope is that a Bayesian – RL

approach will at least highlight fruitful research directions in social/moral cognitive science.

Two Computational Problems for Social Cognition

Human agents live in a world populated by other people. We are bound to act in the presence

of others. We are also bound to interact with others. The behavior of two or more agents can

be said to be co-adaptive if it contributes to the agents’ satisfying their desires, preferences,

and needs in the environment in which they are embedded. Agents are best able to make plans

and satisfy their desires when they are able to predict what their environment will be like over

time. Since human agents are embedded in a social environment, they are best able to make

plans and satisfy their desires when they are able to predict each other’s behavior and changes

in their social landscape.

One way in which agents can successfully make predictions of these kinds is by

relying on prediction-errors. A prediction-error is the difference between an actual and an

expected outcome (Niv and Schoenbaum 2008). It can be used to update expectations about

what the future holds in order to make more accurate predictions, and, ultimately, to facilitate

adaptive learning and decision-making. The amount of prediction-error in an agent’s

cognitive system can be understood as the agent’s uncertainty. The less prediction-error an

outcome brings about in the agent’s cognitive system, the less uncertain is the agent about that

outcome, and vice versa (cf. Friston 2010).

6

It is easier to make plans and satisfy one’s desires when we are surrounded by agents

who routinely engage in “normal,” expected behavior. As various authors including Schotter

(1981), Clark (1997, Ch. 9), Ross (2005, Ch. 6-7), and Smith (2008) have emphasized, social

institutions can be understood as external “scaffolds” that constrain and channel people’s

behavior cueing specific types of cognitive routines and actions. While social institutions may

facilitate the attainment of certain needs, desires, and goals, whether at the individual or at the

social level, they contribute to “normalize” human behavior making it reliably predictable.

In the words of the anthropologist Mary Douglas:

“Institutional structures [can be seen as] forms of informational complexity. Past

experience is encapsulated in an institution’s rules, so that it acts as a guide to what to expect

from the future. The more fully the institutions encode expectations, the more they put

uncertainty under control, with the further effect that behavior tends to conform to the

institutional matrix […]. They start with rules of thumb, and norms; eventually, they can end

by storing all the useful information” (Douglas 1986, p. 48).

Social norms are instances of social institutions that act as guides “to what to expect

from the future.” Social norm compliance is one prominent class of “normal” behavior. By

complying with social norms, agents reduce their uncertainty about the possible outcomes that

social interaction can bring about. The more fully social norms are constituted by

expectations, the more “they put uncertainty under control;” under the pressure of social

norms, behavior tends to acquire distinct boundaries and “disorder and confusion disappear.”

Social norms would then be uncertainty-minimizing devices, and social norm compliance one

of the tricks that we employ to interact co-adaptively and smoothly in our social environment.

By complying with norms, agents minimize uncertainty over their social interactions; and by

7

minimizing uncertainty over their social interactions, agents’ cognitive systems tend to

become “models” of the social environment in which the agents are embedded. Thus, norm

compliance contributes to make social environments transparent, with agent’s meeting one

another’s expectations.

This last claim involves some important idealizations, however. There are at least four

facts related to social norm compliance that contribute to make social environments opaque:

normative pluralism, normative context-sensitivity, normative clash, and normative

gradability. Any adequate explanatory framework for norms should have the conceptual

resources to accommodate these facts, which I now briefly discuss.

First, many agents live in social environments that are not normatively uniform.

Normative pluralism is ubiquitous: there are many different social norms governing a society,

which may not be reducible to each other or to some “super” social norm. This plurality

makes it very hard for agents to acquire a comprehensive model of a social environment, a

model of all or even most of the norms that govern social interactions.

Furthermore, social norms are context-sensitive: norm compliance is conditional on

having the right kind of representation of a context (cf. Bicchieri 2006, Ch. 2). Whether we

have the right representation of a context—one that calls for norm compliance—depends on

which situational cues are present in the context. However, there is no straightforward

mapping between the situational cues in a given context and how agents represent that

context; and there is no straightforward mapping between representing a context in a certain

way and compliance with a norm. Different social norms may apply to the same type of

context, and different types of contexts may be governed by the same type of social norm. In

North America, for example, the social norm of tipping generally applies in restaurants and

after taxi rides. But it does not apply at shoe shops or at most fast foods. The fact that service

is especially good in a restaurant may cue diners in North America to give a generous tip to

8

the waitress or the waiter. But the same fact does not generally cue the same behavior in

Japan. If a feature makes a given situation as one that calls for norm compliance, it does not

follow that the feature always makes the same type of situation as one that calls for norm

compliance. Whether a feature in a situation counts as a cue for norm compliance for an

agent, and if so, what exact role it is playing there is sensitive to other features in that

situation and to the learning trajectory of the agent. This makes it hard for agents to meeting

one another’s expectations in all contexts courtesy of social norm compliance.

The third qualification to the claim that norm compliance makes social environments

transparent is that social norm compliance involves gradability. One gradable feature is the

level of confidence that agents have that a specific social norm applies in a given context. For

example, during a football match in Italy, people are more confident that a social norm

applies that allows abusive chants than that a norm applies against littering. A second

gradable feature is that the social norms with which agents comply are more or less stable in

the face of new information. For example, in the face of incoming information, people’s

confidence that if somebody buys you a round of drinks at the pub, then you ought to buy the

next round may be more stable than their confidence that one ought to orderly queue to get a

drink at a bar in Australia. A third gradable feature is the degree of importance, or value that

agents assign to a social norm in some situation. People, for example, can assign high value

(or high importance) to addressing in a formal way a queen, but they can assign higher value

to leaving a tip to waitresses at restaurants.

Finally, norms often conflict. For instance, traditional family norms often clash with

wider social expectations: agents may regard themselves as having motives to comply with

each of two norms, but complying with both norms is not possible. Thus, in the face of

normative conflict, agents will breach somebody’s expectations no matter what they do,

which will contribute to make social environments opaque.

9

Now, with these qualifications in place, let us specify two of the major problems a

computational system needs to solve in a social environment. Specifying these problems will

help us identify the sort of neurocomputational mechanism that could enable biological,

adaptive agents to acquire and act upon social norms so as to reduce their uncertainty. The

two problems are:

(i) To use sensory information to compute representations of social situations.

(ii) To consume these representations to determine future movements, or internal

changes, in the presence of, and interaction with, other people.

Problems (i) and (ii) are not specific to social cognition. In the domain of social

interaction, however, they seem much harder to tackle, since living with other agents makes

our surroundings more uncertain, complex, noisy, and ambiguous. But if problems (i) and (ii)

are not specific to social cognition, then reliable computational solutions for perception and

action can be extended to the domain of social interaction (cf. Behrens, Hunt and Rushworth

2009; Montague and Lohrenz 2007; Wolpert, Doya and Kawato 2003). My proposal follows

exactly this strategy. Given the relationships between norm compliance, uncertainty, and

prediction-error, my proposal embraces the prediction-error minimization approach, which

has proved fruitful to tackle computational problems with respect to perception, learning, and

action (e.g. Glimcher 2011; Rao et al. 2002).

The types of prediction-errors being minimized to solve those challenges are three:

- sensory input prediction-error,

- reward prediction-error,

- state prediction-error.

The first type of prediction-error enables agents to solve challenge (i); the last two types to

solve challenge (ii).

10

Social Bayesian - RL Brains

I now introduce the main ingredients that enter a Bayesian – RL cooking recipe for social

norm compliance. After these ingredients are defined at an abstract level, the familiar example

of learning to comply with a norm of tipping at a restaurant will concretely illustrate some

core aspects of the proposal. A more detailed discussion of how the Bayesian and RL

components might interact concludes the section.

Social States and Agents’ Hidden States

A social state is a set of social variables in a process that generates sensory input. Variables

are social when they concern features of agents’ interactions. Social states are highly

structured, in that the variables constituting a social state can be correlated in complicated

ways. The most important social feature is the hidden (mental) state of the other agents with

whom we interact. The value of agents’ hidden state both affects and is affected by the social

contexts where the agents interact. Social contexts are sets of slowly and discretely changing

parameters. These parameters comprise both slower changing variables in the internal state of

agents and external variables such as features of the physical configuration of the external

environment. Examples of these features are the physical arrangements of buildings and of

their internal spaces. Churches, universities, cinemas, houses, parks are all examples of social

contexts.

The hidden state of an agent is the most important social feature because it determines

how that agent will interact with us, and how that agent will react to new sensory input. If we

knew other agents’ state, then we would have a model of their behavior. A model of their

behavior would allow us to predict their reactions to inputs that we or the environment

provide to their sensory systems. When other agents also have a model of our behavior, we

11

have a means to adjust our behavior to each other by predicting each other’s reactions to new

inputs (Wolpert et al. 2003).

However, we don’t have direct access to other agents’ state. Our cognitive systems

need to infer it by relying on information about the social context and about other social

variables like facial expression, hand gestures, posture, physical appearance, dress, speech,

tone of voice, and so on. Relying on this type of information is necessary for our

computationally-bounded cognitive system even if we had some direct access to other agents’

internal state. Other agents’ internal state, in fact, partly depends on their prior expectations

about our state. During social interaction their behavior is both affecting and affected by our

state. This would lead to an infinite hierarchy of priors in a computationally-unbounded agent.

We are trying to infer another agent’s state who is trying to infer our state: What I expect

another agent’s state is; what the other agent expects I expect about her state; what I expect

another agent expects me to expect about her state, and so on. If we tried to infer other agents’

states by using only information about mutual expectations about each other’s state, then the

infinity of priors about priors would make the computation of the state of the other agent

unfeasible.

The approaches to this complexity can be twofold. On the one hand, our cognitive

system can be thought as implementing finite rather than infinite prior hierarchies. There is

evidence on strategic thinking in economic games suggesting that in fact people’s hierarchy

of priors about other agents’ state comprises on average 1.5 levels (Camerer et al. 2004). On

the other hand, our cognitive system can constrain inference about other agents’ state by

relying on learned correlations between certain external cues and the types of mental states

normally entertained by agents in circumstances of a certain sort (e.g. “If the environment is

dirty, then people are likely to feel disgust over there”). In this latter case, external social cues

will function as proxies for the other agents’ states.

12

Representations of external social cues need to be extracted from many modalities,

integrated, and, at least initially, combined with our prior expectations about the other agent’s

state. After we acquire familiarity with the structure of the environment and the way in which

such external cues correlate to different mental states of other agents, we need not rely on any

prior expectation about other agents’ priors anymore. The external cues will function as

reliable proxies for knowledge about other agents’ beliefs and motives. By forming accurate

social representations from extensive interaction with certain types of external cues, we can

arrive to reliably represent the hidden state of other agents as though we were trying to

directly infer it. But in this case, our cognitive system does not need, in fact, to make

inferences about the internal states of other agents. Other agents’ hidden states would be

already predicted by the social representation extracted from other relevant external cues.

Two ideas should be distinguished here. One idea is that constructing and using an

inner model of other agents is sometimes less difficult than is generally supposed, courtesy of

hierarchical and/or approximate algorithms. Hierarchical Bayesian, and RL algorithms offer

one way—although certainly not the only way—to deal with domains such as the social one,

that involves a large space of possible states and a large set of possible actions (see e.g.

Botvnick, Niv and Barto 2009; Lee 2011). Furthermore, insofar as Bayesian or RL

computations are intractable, many different approximations—including Monte Carlo and

variational approximations—can replace exact inference in practice to account for the

cognitive phenomena and behavior displayed by boundedly-rational social agents (Gershman

and Daw 2012; Kwisthout and van Rooij 2013; Sanborn, Griffiths and Navarro 2010).4

A different idea is that very often we do not need an inner model to predict human

actions. This idea resonates with a core insight in primatology as well as with some influential

4 In what follows the shorthand ‘Bayesian’ refers to these types of tractable schemes.

13

philosophical work concerning understanding a language. With respect to the latter, Millikan

(2005, Ch. 10), for example, argues that language understanding does not require mentalizing

because it does not require grasping of speakers’ intentions: understanding a language would

be a form of direct perception of the world, instead of the speakers’ intentions and thoughts.

Millikan explains: “interpreting the meaning of what you hear through the medium of speech

sounds that impinge on your ears is much like interpreting the meaning of what you see

through the medium of light patterns that impinge on your eyes” (Millikan 2005, p. 205).

In primatology, a core insight is that similar behavior displayed by different species

can be produced by very different mechanisms. Behavior-reading and mind-reading are two

such mechanisms. While it might seem that complex social cognitive skills always depend on

some understanding of what others believe, want, or know, this is in fact unnecessary. Several

complex social behaviors can depend only on the information provided by overt behavioral

cues of other agents (see e.g. Rosati and Hare 2010 for a concise recent review).

The two ideas just distinguished are not unique to the Bayesian – RL model I shall be

drawing, but they cohere with it. And it is important to bring this fact into clearer focus

because a Bayesian – RL model might appear, at first glance, to offer an implausible account

of evolved, biological, social/moral intelligence. The model is in fact more plausible than it

appears. While there are Bayesian schemes that underwrite the idea that acquiring and using

an inner model of others is less hard than is supposed (e.g. Baker, Saxe ad Tenenbaum 2011;

Hamlin et al. 2013; Yoshida et al. 2010), different types of RL algorithms, which our nervous

systems might implement, suggest that social norm compliance may well be driven by both

behavior- and mind-reading processes (cf. Daw et al. 2005; Dickinson and Balleine 2002).

Bayesian Social Representing

14

The first neurocomputational building block, which can carry out the task of computing social

representations from sensory input, is a hierarchical Bayesian algorithm. According to the

proposal on offer, the cortex learns and infers about the causes of sensory input by

implementing Bayesian inference in a multistage processing hierarchy, which allows to

incorporate statistical dependencies between stimulus representations at different levels of

abstraction (cf. Lee and Mumford 2003; Friston 2008). The lowest level in the cognitive

system would represent basic physical features like displacement, acceleration, mass,

orientation, and wavelength that are combined into increasingly complex representations, up

to higher levels that represent social states. When the value of the prior on state Y depends on

other parameters Z at higher levels, given perceptual input Sx, the resulting posterior

probability is computed with some suitable approximation of:

[1] Prob (Y, Z | Sx) Prob (Sx | Y) Prob(Y | Z) Prob (Z)

The function that our cortex should compute is the posterior probability function Prob

[Z | Sx] of a high-level hidden state Z given sensory input Sx. For example, Sx may be the

sensory input to the nervous system when an agent faces a social environment of a certain

type and Z may be the representation, at the highest level, of the social role of a certain

individual encountered over there (“That person is a waitress”).

In order to carry out this computation, the cortical system reverses a generative (or

forward) model, which describes the causal process that gives rise to data assigning a

probability distribution to each step in the process. Given the generative model used by the

cognitive system to determine how sensory input is generated, the system can infer the hidden

state dependent on the sensory data by reversing the generative model. Lower-level

representations are combined in a Bayesian fashion to compute more and more abstract

representations at higher levels. The feedback, in the form of a sensory input prediction-error

carried by forward connections in these hierarchical Bayesian model, provides a means to

15

incorporate statistical dependencies between representations at different levels of abstractions

(e.g. “If this is a diner in New Jersey and that person is a waitress, then her hourly pay is

likely to be relatively low”). The dependencies between the representations and their weight

in the hierarchical Bayesian model will vary in function of one’s personal learning trajectory.

While we interact with other agents, our nervous system is constantly reorganizing, so that the

models of the social environment it encodes get updated, and can serve as maps we can use to

smoothly navigate the social world. Prediction-error minimization of sensory input can enable

us to acquire social representations. But social representations, by themselves, do not

motivate us to take action.

RL Social Norm Compliance

The second piece of neurocomputational machinery that could explain how social

representations are transformed to enable us to engage in social norm compliance is the

Reinforcement Learning (RL) account of the cortico-basal ganglia circuit (Sutton and Barto

1998; Niv 2009). RL neural computing bootstraps us into social behavior and culture by

transforming social representations so as to determine future movements or internal changes

in the presence of, and interaction with, other people (on the relevance of reinforcement

learning to social behavior, including altruistic behavior, see Lee 2008; Seymour, Yoshida

and Dolan 2009). When RL mechanisms tap into social representations that concern the

hidden state of other people, then they enable us to learn to comply with social norms by

minimizing social reward prediction-error.

Imagine that you arrive in some foreign country. You have certain beliefs (or priors)

about how situations of type Z look like and about how people typically behave in Z: you

have priors concerning a social state. In particular, you have a prior over the hidden state of

other people in Z. Yet you are uncertain about what “grammar” governs situations of type Z in

16

that country, as you have a low degree of confidence about the mapping between sensory

input and social representation of Z in that country, you are uncertain about the state transition

T (z, a, z’), which determines how the environment evolves as you take actions, and you are

uncertain about the reward structure of the environment R: Z x A → , which determines the

patterns of rewards/punishments you will incur by taking certain actions in certain states. If

you want to interact adaptively with other people in that country in the environment Z, then

you must learn and use the “grammar” people live by in Z in that country.

One task that your cognitive system should carry out in order to learn that “grammar”

is to update your prior over the social environment Z in light of the information provided by

the data generated by states in that environment. This task can be carried out courtesy of the

Bayesian strategy sketched above. Suppose, for example, that you arrive to represent Z as a

“diner” with high confidence. Your representation of Z correlates in specific ways to the

hidden states of people who happen to be in Z. So, by relying on this representation, you

expect that people in that environment have certain social roles in Z, which prompt them to

behave in specific ways over there. By relying on your social representation of Z, you also

expect that the environment has a certain causal structure. Because you are not confident

about the state transition function, and about the reward contingencies in Z in that new

country, you have to learn them if you wish to behave co-adaptively.

To learn these pieces of “social grammar” your cognitive system can rely on model-

based and model-free RL algorithms, which have relatively clear neural implementations

(Daw et al. 2005; Dayan 2008). These two types of algorithms differ in how they draw on

experience to estimate quantities relevant to make choices and how they transform these

quantities to reach a decision. Model-based algorithms draw on experience to build a model

of the state transition and reward structure of the environment. These describe how different

states of the environment, with their associated rewards, are connected to each other. Model-

17

based algorithms make choices by searching this model to find the most valuable action. Such

a strategy is time-consuming and computationally costly, though it leads to more accurate

choices.

In contrast, model-free algorithms draw on experience to learn action values directly,

without building and searching any explicit model of the environment. What drives learning

and action selection in model-free algorithms is a reward prediction-error, which reinforces

successful actions without relying on explicit knowledge about state transitions or reward

structure of the environment. This makes model-free computation more tractable, but less

accurate than model-based strategies.

Depending on one’s knowledge of the environment, cognitive resources, and time

constraints, adaptive behavior can be best served by model-based or model-free algorithms.

Given limited experience with that new environment, behavior may reflect model-based

processes (Gläscher et al. 2010). Initially, you rely completely on an estimated model of the

environment of the form:

[2] Prob (new state | state, action).

This estimated model of the environment can be constrained with information about the

structure associated with your social representation of Z. Using this model you can perform a

simulation of the consequences of your actions given current state z: If you take action at from

current state z, then it’s likely that you will end up in state z’. You utilize experience with

state transitions to update an estimated state transition function T (z, a, z’). Upon each of your

choices, a state prediction-error is computed:

[3] δspe = 1 - T (z, a, z’).

This state prediction-error is used to update the probability of the observed transition thus:

[4] T (z, a, z’) = T (z, a, z’) + ηδspe

where η is a learning rate parameter.

18

Behavior shaped by model-based computation reflects a goal-directed process in

which a particular desired outcome, like having a good meal and avoiding frictions with other

people in Z, is used to flexibly determine any complex sequence of actions needed to achieve

it. Action selection is carried out by searching your model of the environment: you work out

the consequences of each action available to you in z, and select the action that is more likely

to lead you towards your desired outcome. This allows action selection to be sensitive to

changes in the structure of the environment and in your motivational state. If, for example,

you notice that people suddenly react differently than usual given z, or your motivational state

is abnormal, you can immediately adjust your behavior accordingly.

Searching and updating your “map” of the environment is computationally demanding

both for working memory and for your “mentalizing competence.” You need to remember

situations you encountered in the past similar to the one at hand, you need to work out what

other people’s expectations may be, you have to consider many different actions and

outcomes, and work out which is the best to achieve your goals. This can reduce the capacity

for alternative computations and the smoothness of interaction, as the model-based

computation would engage valuable cognitive resources to identify which action you should

implement given your state and your goals. By relying on a model-based controller, learning

and complying with social norms can be effortful and time consuming.

One crucial aspect of social-decision problems is that they typically recur. So, with

more experience with situation Z in that country, you need not to rely on the model-based

strategy. After you have regularly encountered situations of type Z, the sensory data generated

by Z have led your representation of Z to be more and more accurate. Your prior about the

structure of that environment can impose further constraints on the state and action space, on

which your learning and decision-making systems tap. You may rely on model-free

19

computation which drives learning and decision making by means of social reward

prediction-errors.

A reward prediction-error is a difference between two values associated with

executing actions in some state. The value of a state is the expected sum of future rewards and

punishments that can be achieved starting to act from that state. In general, rewards can be

understood as stimuli, objects or states that make us come back for more. Punishments,

conversely, can be understood as stimuli, objects or states that make us not come back for

more (Schultz 2007). Although the distinction between social and non-social rewards is not a

sharp one, a social reward can be defined as a stimulus provided by another individual of the

same or some other closely related species by means of some movement, sound, utterance,

gesture, posture, or facial expression. Consistently with this definition, examples of non-

social rewards can be money, food, water, and a variety of other inanimate objects and signs.

By picking up on social rewards, you acquire ways of evaluating or predicting the

long-term consequences associated with executing a particular action in a social context. You

need not mentalize with others or search any map of the environment in order to comply with

a norm. You can come to comply with social norms automatically, quickly and at little

computational costs. It is important to emphasize, however, that the shift from model-based to

model-free control is not sequential nor instantaneous, but highly parallel and dynamic. The

early phase of model-free learning processes take place while behavior still appear to be

controlled by a model-based strategy (Tricomi et al. 2009). Moreover, model-based and

model-free computations in the human brain may not be neatly separated (Daw et al. 2011),

which suggests significant cross-talk and interaction between different types of RL algorithms

that might be implemented by neural activity.

Nonetheless, model-free algorithms seem to operate most effectively, with little

computational demands, in familiar situations. They operate on “cached” values, which store

20

experience about the overall future worth of a particular action. Such values can be used to

implement certain behavioral responses in the face of stimuli that were consistently associated

to a rewarding outcome in the past. Given reliable co-variation between situational cues and

certain behavioral patterns of people in Z, the reward values of the behavioral responses

become conditioned onto the cues. Features of the environment become to encode

information about the reward structure of the environment, and you can outsource behavioral

control on them. The cues present in the environment signal opportunities to perform

particular “rewarding” actions. In this way, as your training with social situation Z proceeds,

goal-directed behavior becomes habitual and cue-driven. The representation of Z itself can

drive behavior with no need to work out what other people expect you to do in Z or to keep

track of state transitions underlying Z. Features of Z, that is, acquire the capacity to motivate

you to directly act upon your social representation of Z.

Norm compliance in this case becomes a perceptually-based, habit. And social

interaction becomes a fluid, context-specific, inferential response to incoming sensory inputs

and their values. It enables co-adaptive, smooth interaction without access to hidden states of

other agents in the world. Insofar as other accounts of social behavior entail that the

preference to comply with social norms is always dependent on mutual expectations, then this

theoretical proposal makes a novel, testable prediction: norm compliance can become a habit

that involves no mentalizing.

A Social Game of Tipping

How model-based and model-free RL algorithms can solve the problem of learning a social

norm is best illustrated by examining a specific case. Imagine once again that you are visiting

for a certain number of days a foreign country, about which you know little. Let us assume

that, courtesy of Bayesian neural computing, you represent a given social environment in that

21

country as a restaurant, and that you intend to repeatedly dine at that restaurant. There may be

some social norm of tipping in restaurants in that country, but you are not sure. You want to

learn this social norm so as to display adaptive behavior in the social environment you are

navigating.

Now, the structure of a learning problem such as this one can be simplified thus.5 Each

time you enter the restaurant (call this, state s1), you must choose either of two tables L or R.

While Miss L waits table L generally providing either excellent (s2) or good service (s3),

table R is waited by Miss R, who typically provides average (s4) or bad service (s5). After

your dinner is over, you must pay your bill and decide which amount to tip. After your

decision, the person who waited your table collects your tip and says goodbye to you. The

goodbye can be uttered with either an angry or a cheerful tone of voice, accompanied

respectively by either an angry or a happy facial expression. Such emotional reactions can be

understood as positive or negative social reward outcomes, which you can use to learn how

much is the social norm of tipping over there.

So, starting from state s1 you can take either of two actions (i.e. L or R). If you take

action L, then with a certain probability you will be in state s2, or s3. If you take action R,

then with a certain probability you will be in state s4, or s5. At this point, you have several

different actions available corresponding to different amounts you may leave as a tip—money

is limited and important to you, so the number of actions you are willing to take is bounded.

Finally, a reward outcome is revealed to you (i.e. a positive or negative social reaction),

5 An experimental task of this type has been used by Colombo, Stankevicius and Seriès (ms)

to address how social rewards (e.g. facial expressions), in comparison to non-social rewards

(e.g. conventional feedback marks such as ticks and crosses), impact learning performance

and decision-making.

22

which may depend stochastically on the underlying social norm you are learning and on the

pair (service quality you received- amount you tipped).

Model-based strategies enable agents to learn the social norm of tipping by building

and relying on an estimated map of the environment (i.e. a state–action–outcome tree). State

prediction errors will help acquire and update such a map. Decisions are then made on the

basis of this map, by searching through it and finding the path with the highest overall value.

For example, after some experience, you may learn that the path [enter the restaurant-choose

table L-receive good service-tip 15% of the bill] has highest overall value, and is conducive to

adaptive social behavior.

Which path has the highest overall value for you is determined by the amount of

money you have in your pocket, and how much you care about your money and about the

emotional reaction you receive. All these factors have some impact on how quickly (i.e. after

how many meals in that restaurant) you will behave adaptively in that environment, thereby

learning the social norm of tipping. The way state prediction errors are computed “on the fly”

allows you to be sensitive to changed circumstances; it allows, for example, that your own

current motivational state (e.g. you are especially cheerful today) may lead you to re-valuate

different decision paths.

Model-based strategies enable agents to learn the social norm of tipping directly, by

trial-and-error, without acquiring an explicit map of the environment. These strategies rely

“cached,” learned, values for every action available to you at every state. After several meals

at that restaurant, each (state-action) pair is associated with a value summarising the expected

amount of social rewards that you are likely to obtain by taking a certain action in a given

state. Action selection simply involves choosing the action with the highest cached value at

the current state; for example, given normal service, the action with the highest value is

tipping 10% of the bill. Relying on cached values is computationally simple, as it does not

23

require an explicit estimate of the probabilities that govern state transitions or a search of all

possible paths in the environment. This, however, comes at the cost of less flexibility. Values

do not immediately change with changes in, for example, your motivational state: your being

especially grumpy today will not make any difference.

Finally, besides these two learning strategies, verbal instruction is obviously an

incredibly efficient means to learn how to navigate the social environment. Recent

computational and neuroimaging work indicates that verbal information can have significant

impact on reward-based learning (e.g. Doll et al. 2009; Li et al. 2011). Although it remains

unclear how exactly, and under which circumstances, verbal instructions influence learning

and social behavior, we may assign less weight to observed feedback when reliable verbal

instructions are available, which can spare us multiple errors, and learn more quickly.6

Interaction between Bayesian and RL components

The full neurocomputational account of social norm compliance is not so simple as the

proposal thus far may have suggested. Bayesian and RL components have been treated

separately, as if there is no rich dynamical interaction between Bayesian and RL-systems.

More plausibly, Bayesian and RL processing are intimately related. Evidence indicates that

6 Doll et al. (2009) have developed two neurocomputational models that could explain the

precise effect of verbal information on reward learning: an ‘override’ and a ‘bias model’. In

the first, the striatum—a subcortical brain region and major target of dopaminergic neurons—

learns cue-reward probabilities as experienced, but is overridden by the prefrontal cortex—

where instructed information would be encoded—at the level of the decision output. In the

bias model action selection and learning supported by the striatum are biased by rules and

instructions encoded in the prefrontal cortex.

24

these two kinds of systems display dynamical interaction, but also that a separation between

probabilistic inference and value might not be empirically adequate (Gershman and Daw

2012, Sec. 3.3; see also Sec. 5 for two types of proposals on how the segregation between

perception and action might be weakened or even abandoned).

Now I outline one way in which Bayesian and RL systems might interact in producing

social norm compliance. The basic hypotheses are twofold: On the one hand, RL systems

piggyback on top of the Bayesian system. On the other hand, reward modulates Bayesian

inference at all levels of sensory processing. Let me explain.

What could it mean that RL systems piggyback on Bayesian inference? The idea is not

only that the Bayesian system computes social representations that feed into RL processing,

but Bayesian computing is also intimately involved in RL model construction and action

selection. Specifically, learning a social norm and norm compliance might be performed by

computing (and continuously updating) a posterior distribution over RL state transitions

models, reward probabilities, and value functions, based on sensory input and the history of

past state-action pairs. Algorithms that maintain a distribution over state transitions and the

reward structure of the environment are model-based Bayesian RL algorithms. Algorithms

that maintain a distribution over mappings from state to actions, or over state-values are

model-free Bayesian RL algorithms that do not store any map of the environment. Courtesy

of these two types of algorithms, uncertainty over the ingredients of RL computing are fully

captured, which could facilitate agents to make more informed decisions, while learning

social norms more quickly.

Consider model-based Bayesian RL. Agents start with a prior distribution over

different state transition models (i.e. structures) of the environment, T (z, a, z’). Examples of

different state transition models of a social environment like the one described in the previous

section are: restaurant (s1)-take table on the left (aL)-excellent service (s2), and restaurant

25

(s1)-take table on the left (aL)-bad service (s5). The prior represents the initial belief of the

learner about the structure of the social environment (Gershman and Niv 2010; see also

Tenenbaum et al. 2011 on Bayesian inference over structures). Agents update this belief by

implementing Bayesian inference based on two signals: sensory input, and observed state-

action-state’ triples. Beliefs about states of the world are updated given sensory input. Beliefs

about state transitions of an environment are updated given state-action-state’ sequences.

What could it mean that reward modulates Bayesian inference at all levels of sensory

processing? The idea is that representations of social states are always imbued with reward

value, which is supported by evidence on how the acquisition and representation of incoming

sensory information in the human visual cortex is influenced by e.g. the reward history of a

state (e.g. Serences 2008). This reward-modulation of sensory processing can dramatically

constrain the dimensionality of the space of relevant hypotheses about states and structures of

the environment. It is unnecessarily complex and costly if our neurocomputational systems

stored and computed a lot of information of little relevance to the agent. So, Bayesian and RL

systems would learn and make inference only over reward-relevant representations,

representations relevant to adaptive social interaction.

A study investigating the responses of auditory neurons in grasshoppers underwrites

this conclusion (Machens et al. 2005). It was found that primary auditory receptors in

grasshoppers are not equally sensitive to different auditory stimuli equally frequent in

grasshoppers’ natural environment. These neurons seem to maximize the information gained

about specific, but much less frequent stimuli, namely mating signals stimuli. Hence, the

processes carried out by sensory neurons might not be always matched to the statistics of the

environment. More plausibly, these processes might be tuned to a “weighted ensemble of

natural stimuli, where the different behavioral relevance [i.e. reward value] of stimuli

determines their relative weight in the ensemble” (Ibid., p. 454). The different social

26

relevance of different stimuli—determined by reward-based RL computing—appears to

constrain the workings of Bayesian inference. So, our perception of, and expectations about,

the social world are channelled by our conative processes or states about it, which are molded,

in turn, by the inferences we make based on sensory input.

Conclusions

Our acquisition of the grammar that governs social situations can be driven by minimization

of three types of prediction-errors computed by our nervous system. First, a sensory

prediction-error that is produced and minimized by Bayesian algorithms, which give rise to

social representations, running on hierarchically organized cerebral cortex. Second, a state

prediction-error that is produced and minimized by model-based RL algorithms. Third, a

social reward prediction-error that is produced and minimized by model-free RL algorithms

running on midbrain, dopamine-based circuits. These two RL strategies enable us to act on

our social representations so that we comply with social norms. By working in concert, such

Bayesian-RL neurocomputational system ensures that our predictions about people’s behavior

become self-fulfilling prophecies. Our complying with norms is one trick to make these

predictions come true in social environments. It ensures that our prior expectations about

social sensory input are met and social uncertainty avoided.

Acknowledgements

References

Baker CL., Saxe RR, Tenenbaum, JB (2011) Bayesian theory of mind: Modeling joint belief–

desire attribution. Proceedings of the Thirty-Third Annual Conference of the

Cognitive Science Society, pp 2469–2474

27

Behrens TE, Hunt LT, Rushworth MF (2009) The computation of social behavior. Science

324:1160-1164

Berridge KC (2007) The debate over dopamine’s role in reward: the case for incentive

salience. Psychopharmacology 191:391-431

Bicchieri C (2006) The Grammar of Society: The Nature and Dynamics of Social Norms.

Cambridge University Press, New York

Binmore K (1994). Game Theory and the Social Contract, Vol. I. Playing Fair. MIT Press,

Cambridge MA

Botvnick MM, Niv Y, Barto A (2009) Hierarchically organized behavior and its neural

foundations: A reinforcement-learning perspective. Cognition 113:262-280

Boyd R, Richerson P (2001). Norms and bounded rationality. In Bounded rationality: the

adaptive toolbox Gigerenzer G, Selten R 2001pp. 281–296. Eds. Cambridge,

MA:MIT Press.

Bowers JS, Davis, CJ (2012). Bayesian just-so stories in psychology and neuroscience.

Psychological Bulletin, 138, 389-414.

Clark A (1997) Being There. Cambridge, MA: MIT Press.

Clark A (2013) Whatever next? Predictive brains, situated agents, and the future of cognitive

science. Behav Brain Sci, 36, 181–253

Camerer, C., Teck Hua Ho, T., and Chong, K. (2004). A Cognitive Hierarchy Model of One-

Shot Games. Quarterly Journal of Economics, 119, 861-898.

Daw ND, Gershman SJ, Seymour B, Dayan P, Dolan RJ (2011) Model-based influences on

humans' choices and striatal prediction errors.” Neuron, 69, 1204–1215

Daw, N.D., Niv, Y., and Dayan, P. (2005). “Uncertainty-based competition between

prefrontal and dorsolateral striatal systems for behavioral control.” Nature

Neuroscience, 8, 1704–1711.

28

Dayan P (2008). The role of value systems in decision making. In Engel C & Singer W,

editors, Better than Conscious? Decision Making, the Human Mind, and Implications

for Institutions Frankfurt, Germany: MIT Press, 51-70.

Dickinson A, Balleine BW (2002). The role of learning in the operation of motivational

systems. In CR Gallistel (Ed.), Learning, motivation and emotion Vol. 3. New York:

John Wiley & Sons, 497-533.

Doll BB, Jacobs WJ, Sanfey AG, Frank MJ (2009) Instructional control of reinforcement

learning: a behavioral and neurocomputational investigation. Brain Res 1299:74–94.

Douglas M (1986). How Institutions Think. New York: Syracuse University Press.

Elster, J. (1989). “Social norms and economic theory.” Journal of Economic Perspectives, 3,

99‐117.

Friston, K. (2010). “The free-energy principle: a unified brain theory?” Nature Review

Neuroscience, 11, 127-138.

Friston, K. (2008). “Hierarchical models in the brain.” PLOS Computational Biology, 4,

e1000211.

Gershman, S.J., and Daw, N.D. (2012). “Perception, action and utility: the tangled skein.” In

M. Rabinovich, K. Friston, and P. Varona (Eds.), Principles of Brain Dynamics:

Global State Interactions, pp. 293-312. Cambridge, MA: MIT Press.

Gershman SJ, and Niv Y (2010) Learning latent structure: Carving nature at its joints -

Current Opinion in Neurobiology 20(2), 251-256.

Gintis, H. (2010). “Social norms as choreography.” Politics, Philosophy and Economics, 9,

251-264.

Gläscher, J., Daw, N., Dayan, P., and O’Doherty, J.P. (2010). “States versus rewards:

Dissociable neural prediction error signals underlying model-based and model-free

reinforcement learning.” Neuron, 66, 585–595.

29

Glimcher, P.W. (2011). “Understanding dopamine and reinforcement learning: The dopamine

reward prediction error hypothesis.” Proceeding of the National Academy of Science

USA, 108, 15647–15654.

Hamlin, J.K, Ullman, T.D., Tenenbaum, J.B., Goodman, N.D., & Baker C.L. (2013). “The

mentalistic basis of core social cognition: Experiments in preverbal infants and a

computational model.” Developmental Science, 16:2, 209-226.

Kwisthout, J. and van Rooij, I. (2013). “Bridging the gap between theory and practice of

approximate Bayesian inference.” Cognitive Systems Research, 24, 2-8.

Lee D. (2008). “Game theory and neural basis of social decision making.” Nature

Neuroscience, 11, 404–409.

Lee, M.D. (2011). “How cognitive modeling can benefit from hierarchical Bayesian models.”

Journal of Mathematical Psychology, 55, 1-7.

Lee TS, Mumford, D (2003). Hierarchical Bayesian inference in the visual cortex. Journal of

the Optical Society of America, A, 20, 1434-1448.

Li J, Delgado MR, Phelps EA (2011) How instructed knowledge modulates the neural

systems of reward learning. Proc Natl Acad Sci USA 108:55–60

Lewis, D.K. (1969). Convention: A Philosophical Study. Cambridge MA: Harvard University

Press.

Millikan, RG (2005). Language: A Biological Model. New York: Oxford University Press.

Machens C, Gollisch T, Kolesnikova, O, and Herz, A. (2005). Testing the efficiency of

sensory coding with optimal stimulus ensembles. Neuron, 47(3):447-456.

Montague PR, Lohrenz T (2007) To detect and correct: norm violations and their

enforcement. Neuron 56:14–18.

Niv, Y. (2009). Reinforcement Learning in the brain. Journal of Mathematical Psychology,

53, 139–154.

30

Niv, Y., and Schoenbaum, G. (2008). “Dialogues on prediction errors.” Trends in Cognitive

Science, 12, 265–272.

Pettit, P. (1990). Virtus Normativa: Rational Choice Perspectives, Ethics 100, 725-55.

Rao, R.P.N., Olshausen, B. and Lewicki M. (eds.) (2002) Probabilistic Models of the Brain:

Perception and Neural Function. Cambridge, MA: MIT Press.

Rosati, A.G. & Hare, B. (2010). “Social cognition: From behavior-reading to mind-reading.”

In: The Encyclopedia of Behavioral Neuroscience, G. Koob, R. F. Thompson, and M.

Le Moal (Eds.). Elsevier pp. 263-268.

Ross, D. (2005). Economic Theory and Cognitive Science: Microexplanation.

Cambridge, MA: MIT Press.

Sanborn, A. N., Griffiths, T. L., & Navarro, D. J. (2010). “Rational approximations to rational

models: Alternative algorithms for category learning.” Psychological Review, 117(4),

1144–1167.

Schotter, A. (1981). The economic theory of social institutions. Cambridge, MA: Cambridge

University Press.

Schultz, W. (2007b). “Reward.” Scholarpedia, 2(3):1652.

URL = <http://www.scholarpedia.org/article/Reward>.

Serences, J. (2008). Value-based modulations in human visual cortex. Neuron, 60(6):1169-

1181.

Seymour B, Yoshida W, Dolan R (2009) Altruistic learning. Frontiers in Behavioral

Neuroscience 3:23. doi: 10.3389/neuro.08.023.2009

Smith V (2007) Rationality in Economics: Constructivist and Ecological Forms. Cambridge

University Press, New York.

31

Sripada CS, Stich S (2007) A framework for the psychology of moral norms. In: Carruthers P,

Laurence S, Stich S (eds) Innateness and the structure of the mind, vol II. Oxford

University Press, London, pp 280–302

Sterelny K (2003). Thought in a Hostile World: The Evolution of Human Cognition. Oxford:

Blackwell.

Sugden R. (1986) The Economics of Rights, Cooperation and Welfare. Oxford: Blackwell.

Sutton, R.S., and Barto, A.G. (1998). Reinforcement Learning: An Introduction. Cambridge,

MA: MIT Press.

Tenenbaum JB, Kemp, C, Griffiths TL, and Goodman ND (2011). How to grow a mind:

statistics, structure and abstraction. Science, 331, 1279-1285

Tricomi E, Balleine B, O’Doherty J (2009) A specific role for posterior dorsolateral striatum

in human habit learning. Eur J Neurosci 29:2225–2232

Ullmann‐Margalit E (1977) The Emergence of Norms. Oxford University Press, Oxford

Wolpert DM, Doya K, Kawato M (2003) A unifying computational framework for motor

control and social interaction. Philosophical Transactions of the Royal Society of

London B Biological Sciences, 358:593-602

Yoshida W, Seymour B, Friston KJ, Dolan RJ (2010) Neural mechanisms of belief inference

during cooperative games. J Neurosci 30:10744-10751

two neurocomputational building blocks of social norm...

Documents