
RETHINKING REINFORCEMENT: ALLOCATION, INDUCTION, AND CONTINGENCY

WILLIAM M. BAUM

UNIVERSITY OF CALIFORNIA DAVIS

The concept of reinforcement is at least incomplete and almost certainly incorrect. An alternative way of organizing our understanding of behavior may be built around three concepts: allocation, induction, and correlation. Allocation is the measure of behavior and captures the centrality of choice: All behavior entails choice and consists of choice. Allocation changes as a result of induction and correlation. The term induction covers phenomena such as adjunctive, interim, and terminal behavior—behavior induced in a situation by occurrence of food or another Phylogenetically Important Event (PIE) in that situation. Induction resembles stimulus control in that no one-to-one relation exists between induced behavior and the inducing event. If one allowed that some stimulus control were the result of phylogeny, then induction and stimulus control would be identical, and a PIE would resemble a discriminative stimulus. Much evidence supports the idea that a PIE induces all PIE-related activities. Research also supports the idea that stimuli correlated with PIEs become PIE-related conditional inducers. Contingencies create correlations between "operant" activity (e.g., lever pressing) and PIEs (e.g., food). Once an activity has become PIE-related, the PIE induces it along with other PIE-related activities. Contingencies also constrain possible performances. These constraints specify feedback functions, which explain phenomena such as the higher response rates on ratio schedules in comparison with interval schedules. Allocations that include a lot of operant activity are "selected" only in the sense that they generate more frequent occurrence of the PIE within the constraints of the situation; contingency and induction do the "selecting."

Key words: reinforcement, contingency, correlation, induction, inducing stimulus, inducer, allocation, Phylogenetically Important Event, reinstatement

_______________________________________________________________________________

Author note: Thanks to Howard Rachlin, John Staddon, and Jesse Dallery for helpful comments on earlier versions of this paper. Address correspondence to William M. Baum, 611 Mason #504, San Francisco, CA 94108 (e-mail: wmbaum@ucdavis.edu; or [email protected]). doi: 10.1901/jeab.2012.97-101. Journal of the Experimental Analysis of Behavior, 2012, 97(1), 101–124.

This article aims to lay out a conceptual framework for understanding behavior in relation to environment. I will not attempt to explain every phenomenon known to behavior analysts; that would be impossible. Instead, I will offer this framework as a way to think about those phenomena. It is meant to replace the traditional framework, over 100 years old, in which reinforcers are supposed to strengthen responses or stimulus–response connections, and in which classical conditioning and operant conditioning are considered two distinct processes. Hopefully, knowledgeable readers will find nothing new herein, because the pieces of this conceptual framework were all extant, and I had only to assemble them into a whole. It draws on three concepts: (1) allocation, which is the measure of behavior; (2) induction, which is the process that drives behavior; and (3) contingency, which is the relation that constrains and connects behavioral and environmental events. Since none of these exists at a moment of time, they necessarily imply a molar view of behavior.

As I explained in earlier papers (Baum, 2001, 2002, 2004), the molar and molecular views are not competing theories; they are different paradigms (Kuhn, 1970). The decision between them is not made on the basis of data, but on the basis of plausibility and elegance. No experimental test can decide between them, because they are incommensurable—that is, they differ ontologically. The molecular view is about discrete responses, discrete stimuli, and contiguity between those events. It offers those concepts for theory construction. It was designed to explain short-term, abrupt changes in behavior, such as occur in cumulative records (Ferster & Skinner, 1957; Skinner, 1938). It does poorly when applied to temporally extended phenomena, such as choice, because its theories and explanations almost always resort to hypothetical constructs to deal with spans of time, which makes them implausible and inelegant (Baum, 1989). The molar view is about extended activities, extended contexts, and extended relations. It treats short-term effects as less extended, local phenomena (Baum, 2002, 2010; Baum & Davison, 2004). Since behavior, by its very nature, is necessarily extended in time (Baum, 2004), the theories and explanations constructed in the molar view tend to be simple and straightforward. Any specific molar or molecular theory may be invalidated by experimental test, but no one should think that the paradigm is thereby invalidated; a new theory may always be invented within the paradigm. The molar paradigm surpasses the molecular paradigm by producing theories and explanations that are more plausible and elegant.

The Law of Effect

E. L. Thorndike (1911/2000), when proposing the law of effect, wrote: "Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur…" (p. 244).

Early on, it was subject to criticism. J. B. Watson (1930), ridiculing Thorndike's theory, wrote: "Most of the psychologists… believe habit formation is implanted by kind fairies. For example, Thorndike speaks of pleasure stamping in the successful movement and displeasure stamping out the unsuccessful movements" (p. 206).

Thus, as early as 1930, Watson was skeptical about the idea that satisfying consequences could strengthen responses. Still, the theory has persisted despite occasional criticism (e.g., Baum, 1973; Staddon, 1973).

Thorndike's theory became the basis for B. F. Skinner's theory of reinforcement. Skinner drew a distinction, however, between observation and theory. In his paper "Are theories of learning necessary?" Skinner (1950) wrote: "… the Law of Effect is no theory. It simply specifies a procedure for altering the probability of a chosen response."

Thus, Skinner maintained that the Law of Effect referred to the observation that when, for example, food is made contingent on pressing a lever, lever pressing increases in frequency. Yet, like Thorndike, he had a theory as to how this came about. In his 1948 paper, "'Superstition' in the pigeon," Skinner restated Thorndike's theory in a new vocabulary: "To say that a reinforcement is contingent upon a response may mean nothing more than that it follows the response … conditioning takes place presumably because of the temporal relation only, expressed in terms of the order and proximity of response and reinforcement."

If we substituted "satisfaction" for "reinforcement" and "accompanied or closely followed" for "order and proximity," we would be back to Thorndike's idea above.

Skinner persisted in his view that order and proximity between response and reinforcer were the basis for reinforcement; it reappeared in Science and Human Behavior (Skinner, 1953): "So far as the organism is concerned, the only important property of the contingency is temporal. The reinforcer simply follows the response… We must assume that the presentation of a reinforcer always reinforces something, since it necessarily coincides with some behavior" (p. 85).

From the perspective of the present, we know that Skinner's observation about contingency was correct. His theory of order and proximity, however, was incorrect, because a "reinforcer" doesn't "reinforce" whatever it coincides with. That doesn't happen in everyday life; if I happen to be watching television when a pizza delivery arrives, will I be inclined to watch television more? (Although, as we shall see, the pizza delivery matters, in that it adds value to, say, watching football on television.) It doesn't happen in the laboratory, either, as Staddon (1977), among others, has shown.

When Staddon and Simmelhag (1971) repeated Skinner's "superstition" experiment, they observed that many different activities occurred during the interval between food presentations. Significantly, the activities often were not closely followed by food; some emerged after the food and were gone by the time the food occurred. Figure 1 shows some data reported by Staddon (1977). The graph on the left shows the effects of presenting food to a hungry pigeon every 12 s. The activities of approaching the window wall of the chamber and wing flapping rose and then disappeared, undermining any notion that they were accidentally reinforced. The graph on the right shows the effects of presenting food to a hungry rat every 30 s. Early in the interval, the rat drank and ran, but these activities disappeared before they could be closely followed by food. Subsequent research has confirmed these observations many times over, contradicting the notion of strengthening by accidental contiguity (e.g., Palya & Zacny, 1980; Reid, Bacha, & Moran, 1993; Roper, 1978).

Fig. 1. Behavior induced by periodic food. Left: Activities of a pigeon presented with food every 12 s. Right: Activities of a rat presented with food every 30 s. Activities that disappeared before food delivery could not be reinforced. Reprinted from Staddon (1977).

Faced with results like those in Figure 1, someone theorizing within the molecular paradigm has at least two options. First, contiguity between responses and food might be stretched to suggest that the responses are reinforced weakly at a delay. Second, food–response contiguity might be brought to bear by relying on the idea that food might elicit responses. For example, Killeen (1994) theorized that each food delivery elicited arousal, a hypothetical construct. He proposed that arousal jumped following food and then dissipated as time passed without food. Build-up of arousal was then supposed to cause the responses shown in Figure 1. As we shall see below, the molar paradigm avoids the hypothetical entity by relying instead on the extended relation of induction.
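To see what such an account commits itself to, here is a schematic sketch of an arousal process: a jump at each food delivery with exponential dissipation in between. The time constant, jump size, and update rule are illustrative assumptions of mine, not Killeen's (1994) actual equations.

```python
# Schematic arousal dynamics (illustrative assumptions, not Killeen's
# 1994 equations): arousal jumps at each food delivery and decays
# exponentially as time passes without food.
import math

TAU = 20.0         # assumed decay time constant (s)
JUMP = 1.0         # assumed arousal increment per food delivery
FOOD_PERIOD = 12   # food every 12 s, as for Staddon's pigeon in Figure 1

def arousal_trace(n_deliveries, dt=1.0):
    """Sample arousal every dt seconds under periodic food."""
    arousal, trace, t, t_next = 0.0, [], 0.0, FOOD_PERIOD
    while t < n_deliveries * FOOD_PERIOD:
        arousal *= math.exp(-dt / TAU)  # dissipation between deliveries
        t += dt
        if t >= t_next:                 # food delivery: arousal jumps
            arousal += JUMP
            t_next += FOOD_PERIOD
        trace.append(arousal)
    return trace

# Arousal builds across successive deliveries toward an equilibrium,
# which the molecular account then invokes to explain the induced
# responding in Figure 1.
print(round(max(arousal_trace(50)), 2))
```

The point of the sketch is only that the explanatory work is done by a variable no one observes; the molar alternative developed below does without it.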

To some researchers, the inadequacy of contiguity-based reinforcement might seem like old news (Baum, 1973). One response to its inadequacy has been to broaden the concept of reinforcement to make it synonymous with optimization or adaptation (e.g., Rachlin, Battalio, Kagel, & Green, 1981; Rachlin & Burkhard, 1978). This redefinition, however, begs the question of mechanism. As Herrnstein (1970) wrote, "The temptation to fall back on common sense and conclude that animals are adaptive, i.e., doing what profits them most, had best be resisted, for adaptation is at best a question, not an answer" (p. 243). Incorrect though it was, the contiguity-based definition at least attempted to answer the question of how approximations to optimal or adaptive behavior might come about. This article offers an answer that avoids the narrowness of the concepts of order, proximity, and strengthening and allows plausible accounts of behavior both in the laboratory and in everyday life. Because it omits the idea of strengthening, one might doubt that this mechanism should be called "reinforcement" at all.

Allocation of Time among Activities

We see in Figure 1 that periodic food changes the allocation of time among activities like wing flapping and pecking or drinking and running. Some activities increase while some activities decrease. No notion of strengthening need enter the picture. Instead, we need only notice the changing allocation.

To be alive is to behave, and we may assume that in an hour's observation time one observes an hour's worth of behavior or that in a month of observation one observes a month's worth of behavior. Thus, we may liken allocation of behavior to cutting up a pie. Figure 2 illustrates the idea with a hypothetical example. The chart shows allocation of time in the day of a typical student. About 4 hr are spent attending classes, 4 hr studying, 2.5 hr eating meals, 0.5 hr in bathroom activities, 3 hr in recreation, and 2 hr working. The total amounts to 16 hr; the other 8 hr are spent in sleep. Since a day has only 24 hr, if any of these activities increases, others must decrease, and if any decreases, others must increase. As we shall see later, the finiteness of time is important to understanding the effects of contingencies.
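To make the time-budget arithmetic concrete, here is a minimal sketch; the activity names and hours are the hypothetical values of the student example, and the proportional-shrinkage rule is simply one convenient assumption.

```python
# Allocation as a fixed 24-hr budget: because time is finite,
# increasing one activity must decrease the others.
budget = {"classes": 4.0, "studying": 4.0, "meals": 2.5,
          "bathroom": 0.5, "recreation": 3.0, "working": 2.0,
          "sleep": 8.0}
assert sum(budget.values()) == 24.0

def reallocate(budget, activity, delta):
    """Add delta hours to one activity; shrink the rest proportionally."""
    new = dict(budget)
    new[activity] += delta
    others = [a for a in new if a != activity]
    scale = (24.0 - new[activity]) / sum(new[a] for a in others)
    for a in others:
        new[a] *= scale
    return new

# Two more hours of recreation force every other activity to shrink.
print(reallocate(budget, "recreation", 2.0))
```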

Measuring the time spent in an activity may present challenges. For example, when exactly is a pigeon pecking a key or a rat pressing a lever? Attaching a switch to the key or lever allows one to gauge the time spent by the number of operations of the switch. Luckily, this method is generally reliable; counting switch operations and running a clock whenever the switch is operated give equivalent data (Baum, 1976). Although in the past researchers often thought they were counting discrete responses (pecks and presses), from the present viewpoint they were using switch operations to measure the time spent in continuous activities (pecking and pressing).

Apart from unsystematic variation that might occur across samples, allocation of behavior changes systematically as a result of two sources: induction and contingency. As we see in Figure 1, an inducing environmental event like food increases some activities and, because time is finite, necessarily decreases others. For example, food induces pecking in pigeons, electric shock induces aggression in rats and monkeys, and removal of food induces aggression in pigeons. Figure 3 shows a hypothetical example. Above (at Time 1) is depicted a pigeon's allocation of time among activities in an experimental situation that includes a mirror and frequent periodic food. The pigeon spends a lot of time eating and resting and might direct some aggressive or reproductive activity toward the mirror. Below (at Time 2) is depicted the result of increasing the interval between food deliveries. Now the pigeon spends a lot of time aggressing toward the mirror, less time eating, less time resting, and little time in reproductive activity.

Fig. 2. A hypothetical illustration of the concept of allocation. The times spent in various activities in a typical day of a typical student add up to 16 hr, the other 8 being spent in sleep. If more time is spent in one activity, less time must be spent in others, and vice versa.

Fig. 3. A hypothetical example illustrating change of behavior due to induction as change of allocation. At Time 1, a pigeon's activities in an experimental space exhibit Allocation 1 in response to periodic food deliveries. At Time 2, the interval between food deliveries has been lengthened, resulting in an increase in aggression toward a mirror. Other activities decrease necessarily.

A contingency links an environmental event to an activity and results in an increase or decrease in the activity. For example, linking food to lever pressing increases the time spent pressing; linking electric shock to lever pressing usually decreases the time spent pressing. Figure 4 illustrates with a hypothetical example. The diagram above shows a child's possible allocation of behavior in a classroom. A lot of time is spent in hitting other children and yelling, but little time on task. Time is spent interacting with the teacher, but with no relation to being on task. The diagram below shows the result after the teacher's attention is made contingent on being on task. Now the child spends a lot of time on task, less time hitting and yelling, and a little more time interacting with the teacher—presumably now in a more friendly way. Because increase in one activity necessitates decrease in other activities, a positive contingency between payoff and one activity implies a negative contingency between payoff and other activities—the more yelling and hitting, the less teacher attention. A way to capture both of these aspects of contingency is to consider the whole allocation to enter into the contingency with payoff (Baum, 2002).

Fig. 4. A hypothetical example illustrating change of allocation due to correlation. At Time 1, a child's classroom activities exhibit Allocation 1, including high frequencies of disruptive activities. At Time 2, after the teacher's attention has been made contingent upon being on task, Allocation 2 includes more time spent on task and, necessarily, less time spent in disruptive activities.

Induction

The concept of induction was introduced by Evalyn Segal in 1972. To define it, she relied on dictionary definitions, "stimulating the occurrence of" and "bringing about," and commented, "It implies a certain indirection in causing something to happen, and so seems apt for talking about operations that may be effective only in conjunction with other factors" (p. 2). In the molar view, that "indirection" is crucial, because the concept of induction means that, in a given context, the mere occurrence of certain events, such as food—i.e., inducers—results in occurrence of or increased time spent in certain activities. All the various activities that have at one time or another been called "interim," "terminal," "facultative," or "adjunctive," like those shown in Figure 1, may be grouped together as induced activities (Hineline, 1984). They are induced in any situation that includes events like food or electric shock, the same events that have previously been called "reinforcers," "punishers," "unconditional stimuli," or "releasers." Induction is distinct from the narrower idea of elicitation, which assumes a close temporal relation between stimulus and response. One reason for introducing a new, broader term is that inducers need have no close temporal relation to the activities they induce (Figure 1). Their occurrence in a particular environment suffices to induce activities, most of which are clearly related to the inducing event in the species' interactions with its environment—food induces food-related activity (e.g., pecking in pigeons), electric shock induces pain-related activity (e.g., aggression and running), and a potential mate induces courtship and copulation.

Induction and Stimulus Control

Induction resembles Skinner's concept of stimulus control. A discriminative stimulus has no one-to-one temporal relation with responding, but rather increases the rate or time spent in the activity in its presence. In present terms, a discriminative stimulus modulates the allocation of activities in its presence; it induces a certain allocation of activities. One may say that a discriminative stimulus sets the occasion or the context for an operant activity. Similarly, an inducer may be said to occasion or create the context for the induced activity. Stimulus control may seem sometimes to be more complicated, say, in a conditional discrimination that requires a certain combination of events, but, no matter how complex the discrimination, the relation to behavior is the same. Indeed, although we usually conceive of stimulus control as the outcome of an individual's life history (ontogeny), if we accept the idea that some instances of stimulus control might exist as a result of phylogeny, then stimulus control and induction would be two terms for the same phenomenon: the effect of context on behavioral allocation. In what follows, I will assume that induction or stimulus control may arise as a result of either phylogeny or life history.

Although Pavlov thought in terms of reflexes and measured only stomach secretion or salivation, with hindsight we may apply the present terms and say that he was studying induction (Hineline, 1984). Zener (1937), repeating Pavlov's procedure with dogs that were unrestrained, observed that during a tone preceding food the dogs would approach the food bowl, wag their tails, and so on—all induced behavior. Pavlov's experiments, in which conditional stimuli such as tones and visual displays were correlated with inducers like food, electric shock, and acid in the mouth, showed that correlating a neutral stimulus with an inducer changes the function of the stimulus. In present terms, it becomes a conditional inducing stimulus or a conditional inducer, equivalent to a discriminative stimulus. Similarly, activities linked by contingency to inducers become inducer-related activities. For example, Shapiro (1961) found that when a dog's lever pressing produces food, the dog salivates whenever it presses. We will develop this idea further below, when we discuss the effects of contingencies.

Phylogenetically Important Events

Because events like food, mates, and predators gain their power to induce as a result of many generations of natural selection—from phylogeny—I call them Phylogenetically Important Events (PIEs; Baum, 2005). A PIE is an event that directly affects survival and reproduction. Some PIEs increase the chances of reproductive success by their presence; others decrease the chances of reproductive success by their presence. On the increase side, examples are food, shelter, and mating opportunities; on the decrease side, examples are predators, parasites, and severe weather. Those individuals for whom these events were unimportant produced fewer surviving offspring than their competitors, for whom they were important, and are no longer represented in the population.

PIEs are many and varied, depending on the environment in which a species evolved. In humans, they include social events like smiles, frowns, and eye contact, all of which affect fitness because of our long history of living in groups and the importance of group membership to survival and reproduction. PIEs also include dangerous and fearsome events like injuries, heights, snakes, and rapidly approaching large objects ("looming").

In accord with their evolutionary origin, PIEs and their effects depend on species. Breland and Breland (1961) reported the intrusion of induced activities into their attempts to train various species using contingent food. Pigs would root wooden coins as they would root for food; raccoons would clean coins as if they were cleaning food; chickens would scratch at the ground in the space where they were to be fed. When the relevance of phylogeny was widely acknowledged, papers and books began to appear pointing to "biological constraints" (Hinde & Stevenson-Hinde, 1973; Seligman, 1970). In humans, to the reactions to events like food and injury, we may add induction of special responses to known or unknown conspecifics, such as smiling, eyebrow raising, and tongue showing (Eibl-Eibesfeldt, 1975).

Accounts of reinforcers and punishers that make no reference to evolutionary history fail to explain why these events are effective as they are. For example, Premack (1963, 1965) proposed that reinforcement could be thought of as a contingency in which the availability of a "high-probability" activity depends on the occurrence of a "low-probability" activity. Timberlake and Allison (1974) elaborated this idea by adding that any activity constrained to occur below its level in baseline, when it is freely available, is in "response deprivation" and, made contingent on any activity with a lower level of deprivation, will reinforce that activity. Whatever the validity of these generalizations, they beg the question, "Why are the baseline levels of activities like eating and drinking higher than the baseline levels of running and lever pressing?" Deprivation and physiology are no answers; the question remains, "Why are organisms so constituted that when food (or water, shelter, sex, etc.) is deprived, it becomes a potent inducer?" Linking the greater importance of eating and drinking (food and water) to phylogeny explains why these events are important in an ultimate sense, and the concept of induction explains how they are important to the flexibility of behavior.


Figure 5 illustrates that, among all possible events—environmental stimuli and activities—symbolized by the large circle, PIEs are a small subset, and PIE-related stimuli and activities are a larger subset. That the subset of PIE-related stimuli and activities includes the subset of PIEs indicates that PIEs usually are themselves PIE-related, because natural environments and laboratory situations arrange that the occurrence of a PIE like food predicts how and when more food might be available. Food usually predicts more food, and the presence of a predator usually predicts continuing danger.

Fig. 5. Diagram illustrating that Phylogenetically Important Events (PIEs) constitute a subset of PIE-related events, which in turn are a subset of all events.

A key concept about induction is that a PIE induces any PIE-related activity. If a PIE like food is made contingent on an activity like lever pressing, the activity (lever pressing) becomes PIE-related (food-related). The experimental basis for this concept may be found in experiments that show PIEs to function as discriminative stimuli. Many such experiments exist, and I will illustrate with three examples showing that food contingent on an activity induces that activity.

Bullock and Smith (1953) trained rats for 10 sessions. Each session consisted of 40 food pellets, each produced by a lever press (i.e., Fixed Ratio 1), followed by 1 hr of extinction (lever presses ineffective). Figure 6 shows the average number of presses during the hour of extinction across the 10 sessions. The decrease resembles the decrease in "errors" that occurs during the formation of a discrimination. That is how Bullock and Smith interpreted their results: The positive stimulus (SD) was press-plus-pellet, and the negative stimulus (SΔ) was press-plus-no-pellet. The food was the critical element controlling responding. Thus, the results support the idea that food functions as a discriminative stimulus.

Fig. 6. Results of an experiment in which food itself served as a discriminative stimulus. The number of responses made during an hour of extinction following 40 response-produced food deliveries decreased across the 10 sessions of the experiment. These responses would be analogous to "errors." The data were published in a paper by Bullock and Smith (1953).

The second illustrative finding, reported by Reid (1958), is referred to as "reinstatement," and has been studied extensively (e.g., Ostlund & Balleine, 2007). Figure 7 shows the result in a schematic form. Reid studied rats, pigeons, and students. First he trained responding by following each response with food (rats and pigeons) or a token (students). Once the responding was well established, he subjected it on Day 1 to 30 min of extinction. This was repeated on Day 2 and Day 3. Then, on Day 3, when responding had disappeared, he delivered a bit of food (rats and pigeons) or a token (students). The free delivery was immediately followed by a burst of responding (circled in Figure 7). Thus, the freely delivered food or token reinstated the responding, functioning just like a discriminative stimulus.

Fig. 7. Cartoon of Reid's (1958) results with reinstatement of responding following extinction upon a single response-independent presentation of the originally contingent event (food for rats and pigeons; a token for students). A burst of responding (circled) followed the presentation, suggesting that the contingent event functioned as a discriminative stimulus or an inducer.

As a third illustration, we have the data that Skinner himself presented in his 1948 "superstition" paper. Figure 8 reproduces the cumulative record that Skinner claimed to show "reconditioning" of a response of hopping from side to side by adventitious reinforcement. Looking closely at the first two occurrences of food (circled arrows), however, we see that no response occurred just prior to the first occurrence, and possibly none just prior to the second. The hopping followed the food; it did not precede it. Skinner's result shows reinstatement, not reinforcement. In other words, the record shows induction by periodic food similar to that shown in Figure 1.

Fig. 8. Cumulative record that Skinner (1948) presented as evidence of "reconditioning" of a response of hopping from side to side. Food was delivered once a minute independently of the pigeon's behavior. Food deliveries (labeled "reinforcements") are indicated by arrows. No response immediately preceded the first food delivery, and possibly none preceded the second (circled), indicating that the result is actually an example of reinstatement. Copyright © 1948 by the American Psychological Association. Reprinted by permission.

All three examples show that food induces the "operant" or "instrumental" activity that is correlated with it. The PIE induces PIE-related activity, and these examples show that PIE-related activity includes operant activity that has been related to the PIE as a result of contingency. Contrary to the common view of reinforcers—that their primary function is to strengthen the response on which they are contingent, but that they have a secondary role as discriminative stimuli—the present view asserts that the stimulus function is all there is. No notion of "strengthening" or "reinforcement" enters the account.

One way to understand the inducing power of a PIE like food is to recognize that in nature and in the laboratory food is usually a signal that more food is forthcoming in that environment for that activity. A pigeon foraging for grass seeds, on finding a seed, is likely to find more seeds if it continues searching in that area ("focal search"; Timberlake & Lucas, 1989). A person searching on the internet for a date, on finding a hopeful prospect on a web site, is likely to find more such prospects if he or she continues searching that site. For this reason, Michael Davison and I suggested that the metaphor of strengthening might be replaced by the metaphor of guidance (Davison & Baum, 2006).

An article by Gardner and Gardner (1988) argued vigorously in favor of a larger role for induction in explaining the origins of behavior, even suggesting that induction is the principal determinant of behavior. They drew this conclusion from studies of infant chimpanzees raised by humans. Induction seemed to account for a lot of the chimpanzees' behavior. But induction alone cannot account for selection among activities, because we must also account for the occurrence of the PIEs that induce the various activities. That requires us to consider the role of contingency, because contingencies govern the occurrence of PIEs. The Gardners' account provided no explanation of the flexibility of individuals' behavior, because they overlooked the importance of contingency within behavioral evolution (Baum, 1988).

Analogy to Natural Selection

As mentioned earlier, optimality by itself is not an explanation, but rather requires explanation by mechanism. The evolution of a population by natural selection offers an example. Darwinian evolution requires three ingredients: (1) variation; (2) recurrence; and (3) selection (e.g., Dawkins, 1989; Mayr, 1970). When these are all present, evolution occurs; they are necessary and sufficient. Phenotypes within a population vary in their reproductive success—i.e., in the number of surviving offspring they produce. Phenotypes tend to recur from generation to generation—i.e., offspring tend to resemble their parents—because parents transmit genetic material to their offspring. Selection occurs because the carrying capacity of the environment sets a limit to the size of the population, with the result that many offspring fail to survive, and the phenotype with more surviving offspring increases in the population.

The evolution of behavior within a lifetime (i.e., shaping), seen as a Darwinian process, also requires variation, recurrence, and selection (Staddon & Simmelhag, 1971). Variation occurs within and between activities. Since every activity is composed of parts that are themselves activities, variation within an activity is still variation across activities, but on a smaller time scale (Baum, 2002, 2004). PIEs and PIE-related stimuli induce various activities within any situation. Thus, induction is the mechanism of recurrence, causing activities to persist through time analogously to reproduction causing phenotypes to persist in a population through time. Selection occurs because of the finiteness of time, which is the analog to the limit on population size set by the carrying capacity of the environment. When PIEs induce various activities, those activities compete, because any increase in one necessitates a decrease in others. For example, when a pigeon is trained to peck at keys that occasionally produce food, its key pecking is induced by food, but pecking competes for time with background activities such as grooming and resting (Baum, 2002). If the left key produces food five times more often than the right key, then the combination left-peck-plus-food induces more left-key pecking than right-key pecking—the basis of the matching relation (Herrnstein, 1961; Davison & McCarthy, 1988). Due to the upper limit on pecking set by its competition with background activities, pecking at the left key competes for time with pecking at the right key (Baum, 2002). If we calculate the ratio of pecks left to pecks right and compare it with the ratio of food left to food right, the result may deviate from strict matching between peck ratio and food ratio across the two keys (Baum, 1974, 1979), but the results generally approximate matching, and simple mathematical proof shows matching to be optimal (Baum, 1981b).
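For reference, the relations cited here can be written compactly. Strict matching (Herrnstein, 1961) equates the behavior ratio with the food-rate ratio, and the generalized form (Baum, 1974) accommodates the deviations just mentioned with a sensitivity exponent and a bias term:

```latex
\frac{B_1}{B_2} = \frac{r_1}{r_2}
\quad \text{(strict matching)},
\qquad
\log\frac{B_1}{B_2} = a\,\log\frac{r_1}{r_2} + \log b
\quad \text{(generalized matching)},
```

where $B_1$ and $B_2$ are the times (or pecks) allocated to the two keys, $r_1$ and $r_2$ are the food rates they produce, $a$ is sensitivity, and $b$ is bias; $a = b = 1$ recovers strict matching.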

Thus, evolution in populations of organisms and behavioral evolution, though both may be seen as Darwinian optimizing processes, proceed by definite mechanisms. Approximation to optimality is the result, but the mechanisms explain that result.

To understand behavioral evolution fully, we need not only to understand the competing effects of PIEs but also to account for the occurrence of the PIEs—how they are produced by behavior. We need to include the effects of contingencies.

Contingency

The notion of reinforcement was always known to be incomplete. By itself, it included no explanation of the provenance of the behavior to be strengthened; behavior has to occur before it can be reinforced. Segal (1972) thought that perhaps induction might explain the provenance of operant behavior. Perhaps contingency selects behavior originally induced. Before we pursue this idea, we need to be clear about the meaning of "contingency."

In the quotes we saw earlier, Skinner defined contingency as contiguity—a reinforcer needed only to follow a response. Contrary to this view based on "order and proximity," a contingency is not a temporal relation. Rescorla (1968, 1988) may have been the first to point out that contiguity alone cannot suffice to specify a contingency, because contingency requires a comparison between at least two different occasions. Figure 9 illustrates the point with a 2-by-2 table, in which the columns show the presence and absence of Event 1, and the rows show the presence and absence of Event 2. Event 1 could be a tone or key pecking; Event 2 could be food or electric shock. The upper left cell represents the conjunction of the two events (i.e., contiguity); the lower right cell represents the conjunction of the absence of the two events. These two conjunctions (checkmarks) must both have high probability for a contingency to exist. If, for example, the probability of Event 2 were the same, regardless of the presence or absence of Event 1 (top left and right cells), then no contingency would exist. Thus, the presence of a contingency requires a comparison across two temporally separated occasions (Event 1 present and Event 1 absent). A correlation between rate of an activity and food rate would entail more than two such comparisons—for example, noting various food rates across various peck rates (the basis of a feedback function; Baum, 1973, 1989). Contrary to the idea that contingency requires only temporal conjunction, Figure 9 shows that accidental contingencies should be rare, because an accidental contingency would require at least two accidental conjunctions.

Fig. 9. Why a contingency or correlation is not simply a temporal relation. The 2-by-2 table shows the possible conjunctions of the presence and absence of two events, E1 and E2. A positive contingency holds between E1 and E2 only if two conjunctions occur with high probability at different times: the presence of both and the absence of both (indicated by checks). The conjunction of the two alone (contiguity) cannot suffice.
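Rescorla's point can be put in one line of arithmetic: compare the probability of E2 when E1 is present with its probability when E1 is absent. The cell counts below are made up for illustration.

```python
# Contingency as comparison across occasions (the 2-by-2 table of
# Figure 9). Counts are hypothetical.
cells = {("E1", "E2"): 40,           # both present (mere contiguity)
         ("E1", "no E2"): 10,
         ("no E1", "E2"): 5,
         ("no E1", "no E2"): 45}     # both absent

p_e2_given_e1 = cells[("E1", "E2")] / (cells[("E1", "E2")] + cells[("E1", "no E2")])
p_e2_given_not_e1 = cells[("no E1", "E2")] / (cells[("no E1", "E2")] + cells[("no E1", "no E2")])

# A contingency exists only if the two conditional probabilities differ;
# the upper-left cell alone (contiguity) cannot decide the question.
print(f"P(E2|E1) = {p_e2_given_e1:.2f}")
print(f"P(E2|no E1) = {p_e2_given_not_e1:.2f}")
print(f"difference = {p_e2_given_e1 - p_e2_given_not_e1:.2f}")
```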

It is not that temporal relations are entirely irrelevant, but just that they are relevant in a different way from what traditional reinforcement would require. Whereas Skinner's formulation assigned to contiguity a direct role, the conception of contingency illustrated in Figure 9 suggests instead that the effect of contiguity is indirect. For example, inserting unsignalled delays into a contingency usually reduces the rate of the activity (e.g., key pecking) producing the PIE (e.g., food). Such delays must affect the tightness of the correlation. A measure of correlation such as the correlation coefficient (r) would decrease as average delay increased. Delays affect the clarity of a contingency like that in Figure 9 and affect the variance in a correlation (for further discussion, see Baum, 1973).

Avoidance

No phenomenon better illustrates the inadequacy of the molecular view of behavior than avoidance, because the whole point of avoidance is that when the response occurs, nothing follows. For example, Sidman (1953) trained rats in a procedure in which lever pressing postponed electric shocks that, in the absence of pressing, occurred at a regular rate. The higher the rate of pressing, the lower the rate of shock, and if pressing occurred at a high enough rate, the rate of shock was reduced close to zero. To try to explain the lever pressing, molecular theories resort to unseen "fear" elicited by unseen stimuli, with "fear" reduction resulting from each lever press—so-called "two-factor" theory (Anger, 1963; Dinsmoor, 2001). In contrast, a molar theory of avoidance makes no reference to hidden variables, but relies on the measurable reduction in shock frequency resulting from the lever pressing (Herrnstein, 1969; Sidman, 1953, 1966). Herrnstein and Hineline (1966) tested the molar theory directly by training rats in a procedure in which pressing reduced the frequency of shock but could not reduce it to zero. A molecular explanation requires positing unseen fear reduction that depends only on the reduction in shock rate, implicitly conceding the molar explanation (Herrnstein, 1969).
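The molar relation in Sidman's procedure is easy to exhibit with a toy simulation. The parameter values and the assumption of random pressing are mine, purely for illustration; nothing here is taken from Sidman's report.

```python
# Toy simulation of free-operant (Sidman) avoidance: a shock arrives
# every SS seconds unless a press resets the timer to RS seconds.
import random

SS, RS = 5, 20   # assumed shock-shock and response-shock intervals (s)

def shocks_per_hour(p_press, seconds=3600, seed=0):
    """Shocks received when pressing occurs randomly with probability
    p_press per second."""
    rng = random.Random(seed)
    timer, shocks = SS, 0
    for _ in range(seconds):
        if rng.random() < p_press:  # a press postpones the next shock
            timer = RS
        timer -= 1
        if timer <= 0:              # no press in time: shock; clock restarts
            shocks += 1
            timer = SS
    return shocks

for p in (0.0, 0.05, 0.2, 0.5):
    print(f"press probability/s = {p:.2f} -> shocks/hr = {shocks_per_hour(p)}")
```

The output shows the feedback relation the molar theory relies on: the higher the rate of pressing, the lower the rate of shock, with no hidden variable anywhere in the account.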

In our present terms, the negative contingency between shock rate and rate of pressing results in pressing becoming a shock-related activity. Shocks then induce lever pressing. In the Herrnstein–Hineline procedure, this induction is clear, because the shock never disappears altogether, but in Sidman's procedure, we may guess that the experimental context—chamber, lever, etc.—also induces lever pressing, as discriminative stimuli or conditional inducers. Much evidence supports these ideas. Extra response-independent shocks increase pressing (Sidman, 1966; cf. reinstatement discussed earlier); extinction of Sidman avoidance is prolonged when shock rate is reduced to zero, but becomes faster on repetition (Sidman, 1966; cf. the Bullock & Smith study discussed earlier); extinction of Herrnstein–Hineline avoidance in which pressing no longer reduces shock rate depends on the amount of shock-rate reduction during training (Herrnstein & Hineline, 1966); sessions of Sidman avoidance typically begin with a "warm-up" period, during which shocks are delivered until lever pressing starts—i.e., until the shock induces lever pressing (Hineline, 1977).

Figure 10 illustrates why shock becomes a discriminative stimulus or a conditional inducer in avoidance procedures. This two-by-two table shows the effects of positive and negative contingencies (columns) paired with fitness-enhancing and fitness-reducing PIEs (rows). Natural selection ensures that dangerous (fitness-reducing) PIEs (e.g., injury, illness, or predators) induce defensive activities (e.g., hiding, freezing, or fleeing) that remove or mitigate the danger. (Those individuals in the population that failed to behave so reliably produced fewer surviving offspring.) In avoidance, since shock or injury is fitness-reducing, any activity that would avoid it will be induced by it. After avoidance training, the operant activity (e.g., lever pressing) becomes a (conditional) defensive, fitness-maintaining activity, and is induced along with other shock-related activities; the shock itself and the (dangerous) operant chamber do the inducing. This lower-right cell in the table corresponds to relations typically called "negative reinforcement."

Fig. 10. Different correlations or contingencies induce either the target activity or other-than-target activities, depending on whether the Phylogenetically Important Event (PIE) involved usually enhances or reduces fitness (reproductive success) by its presence.

Natural selection ensures also that fitness-enhancing PIEs (e.g., prey, shelter, or a mate) induce fitness-enhancing activities (e.g., feeding, sheltering, or courtship). (Individuals in the population that behaved so reliably left more surviving offspring.) In the upper left cell in Figure 10, a fitness-enhancing PIE (e.g., food) stands in a positive relation to a target (operant) activity, and training results in the target activity's induction along with other fitness-enhancing, PIE-related activities. This cell corresponds to relations typically called "positive reinforcement."

When lever pressing is food-related or when lever pressing is shock-related, the food or shock functions as a discriminative stimulus, inducing lever pressing. The food predicts more food; the shock predicts more shock. A positive correlation between food rate and press rate creates the condition for pressing to become a food-related activity. A negative correlation between shock rate and press rate creates the condition for pressing to become a shock-related activity. More precisely, the correlations create the conditions for food and shock to induce allocations including substantial amounts of time spent pressing.

In the other two cells of Figure 10, the target activity would either produce a fitness-reducing PIE—the cell typically called "positive punishment"—or prevent a fitness-enhancing PIE—the cell typically called "negative punishment." Either way, the target activity would reduce fitness and would be blocked from joining the other PIE-related activities. Instead, other activities, incompatible with the target activity, that would maintain fitness are induced. Experiments with positive punishment set up a conflict, because the target activity usually produces both a fitness-enhancing PIE and a fitness-reducing PIE. For example, when operant activity (e.g., pigeons' key pecking) produces both food and electric shock, the activity usually decreases below its level in the absence of shock (Azrin & Holz, 1966; Rachlin & Herrnstein, 1969). The food induces pecking, but other activities (e.g., pecking off the key) are negatively correlated with shock rate and are induced by the shock. The result is a compromise allocation including less key pecking. We will come to a similar conclusion about negative punishment, in which the target activity (e.g., pecking) cancels a fitness-enhancing PIE (food delivery), when we discuss negative automaintenance. The idea that punishment induces activities alternative to the punished activity is further supported by the observation that if the situation includes another activity that is positively correlated with food and produces no shock, that activity dominates (Azrin & Holz, 1966).

Effects of Contingency

Contingency links an activity to an inducing event and changes the time allocation among activities by increasing time spent in the linked activity. The increase in time spent in an activity—say, lever pressing—when food is made contingent on it was in the past attributed to reinforcement, but our examples suggest instead that it is due to induction. The increase in pressing results from the combination of contingency and induction, because the contingency turns the pressing into a food-related (PIE-related) activity, as shown in the experiments by Bullock and Smith (1953) (Figure 6) and by Reid (1958) (Figure 7). We may summarize these ideas as follows (a toy rendering in code appears after the list):

1. Phylogenetically Important Events (PIEs) are unconditional inducers.
2. A stimulus correlated with a PIE becomes a conditional inducer.
3. An activity positively correlated with a fitness-enhancing PIE becomes a PIE-related conditional induced activity—usually called "operant" or "instrumental" activity.
4. An activity negatively correlated with a fitness-reducing PIE becomes a PIE-related conditional induced activity—often called "operant avoidance."
5. A PIE induces operant activity related to it.
6. A conditional inducer induces operant activity related to the PIE to which the conditional inducer is related.
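The six points can be rendered as a toy formalization (mine, not the article's): a PIE induces whatever activities are currently related to it, and correlation is what adds stimuli and activities to the PIE-related set.

```python
# Toy rendering of points 1-6. Data values are illustrative.
pie_related = {"food": {"pecking grain", "focal search"}}  # point 1: a PIE is an unconditional inducer
conditional_inducers = {}

def correlate_stimulus(stimulus, pie):
    """Point 2: a stimulus correlated with a PIE becomes a conditional inducer."""
    conditional_inducers[stimulus] = pie

def correlate_activity(activity, pie):
    """Points 3-4: an activity correlated with a PIE (positively with a
    fitness-enhancing PIE, negatively with a fitness-reducing one)
    becomes a PIE-related activity."""
    pie_related.setdefault(pie, set()).add(activity)

def induce(event):
    """Points 5-6: a PIE, or a conditional inducer related to it,
    induces the PIE-related activities."""
    pie = conditional_inducers.get(event, event)
    return pie_related.get(pie, set())

correlate_activity("lever pressing", "food")  # a contingency at work
correlate_stimulus("tone", "food")            # Pavlovian pairing
print(induce("food"))  # now includes lever pressing
print(induce("tone"))  # the conditional inducer induces it too
```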

The effects of contingency need include no notion of strengthening or reinforcement. Consider the example of lever pressing maintained by electric shock. Figure 11 shows a sample cumulative record of a squirrel monkey's lever pressing from Malagodi, Gardner, Ward, and Magyar (1981). The scallops resemble those produced by a fixed-interval schedule of food, starting with a pause and then accelerating to a higher response rate up to the end of the interval. Yet, in this record the last press in the interval produces an electric shock. The result requires, however, that the monkey be previously trained to press this same lever to avoid the same electric shock. Although in the avoidance training shock acted as an aversive stimulus or punisher, in Figure 11 it acts, paradoxically, to maintain the lever pressing, as if it were a reinforcer. The paradox is resolved when we recognize that the shock, as a PIE, induces the lever pressing because the prior avoidance training made lever pressing a shock-related activity. No notion of reinforcement enters in—only the effect of the contingency. The shock induces the lever pressing, the pressing produces the shock, the shock induces the pressing, and so on, in a loop.

Fig. 11. Cumulative record of a squirrel monkey pressing a lever that produced electric shock at the end of the fixed interval. Even though the monkey had previously been trained to avoid the very same shock, it now continues to press when pressing produces the shock. This seemingly paradoxical result is explained by the molar view of behavior as an example of induction. Reprinted from Malagodi, Gardner, Ward, and Magyar (1981).

The diagrams in Figure 12 show how the contingency closes a loop. The diagram on the left illustrates induction alone. The environment E produces a stimulus S (shock), and the organism O produces the induced activities B. The diagram on the right illustrates the effect of contingency. Now the environment E links the behavior B to the stimulus S, resulting in an induction-contingency loop.

Fig. 12. How contingency completes a loop in which an operant activity (B) produces an inducing event (S), which in turn induces more of the activity. O stands for organism. E stands for environment. Left: induction alone. Right: the contingency closes the loop. Induction occurs because the operant activity is or becomes related to the inducer (S; a PIE).

The reader may already have realized that the same situation and the same diagram apply to experiments with other PIEs, such as food. For example, Figure 1 illustrates that, among other activities, food induces pecking in pigeons. When a contingency arranges also that pecking produces food, we have the same sort of loop as shown in Figure 12, except that the stimulus S is food and the behavior B is pecking. Food induces the activity, and the (operant) activity produces the food, which induces the activity, and so on, as shown by the results of Bullock and Smith (1953) (Figure 6) and Reid's (1958) reinstatement effect (Figure 7).

Figure 13 shows results from an experiment that illustrates the effects of positive and negative contingencies (Baum, 1981a). Pigeons, trained to peck a response key that occasionally produced food, were exposed to daily sessions in which the payoff schedule began with a positive correlation between pecking and food rate. It was a sort of variable-interval schedule, but required only that a peck occur anywhere in a programmed interval for food to be delivered at the end of that interval (technically, a conjunctive variable-time 10-s fixed-ratio 1 schedule). At an unpredictable point in the session, the correlation switched to negative; now, a peck during a programmed interval canceled the food delivery at the end of the interval. Finally, at an unpredictable point, the correlation reverted to positive. Figure 13 shows four representative cumulative records of 4 pigeons from this experiment. Initially, pecking occurred at a moderate rate, then decreased when the correlation switched to negative (first vertical line), and then increased again when the correlation reverted to positive (second vertical line). The effectiveness of the negative contingency varied across the pigeons; it suppressed key pecking completely in Pigeon 57, and relatively little in Pigeon 61. Following our present line, these results suggest that when the correlation was positive, the food induced key pecking at a higher rate than when the correlation was negative. In accord with the results shown in Figure 1, however, we expect that the food continues to induce pecking. As we expect from the upper-right cell of Figure 10, in the face of a negative contingency, pigeons typically peck off the key, sometimes right next to it. The negative correlation between food and pecking on the key constitutes a positive correlation between food and pecking off the key; the allocation between the two activities shifts in the three phases shown in Figure 13.
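The schedule logic, as I read the description above, can be sketched as follows; fixing every interval at 10 s and treating pecking as a constant probability per interval are simplifying assumptions, not the actual programmed schedule.

```python
# Sketch of the conjunctive schedule: in the positive phase, a peck
# anywhere in an interval earns the food at its end; in the negative
# phase, any peck cancels that food.
import random

def session(phases, p_peck=0.9, seed=1):
    rng = random.Random(seed)
    for phase, n_intervals in phases:
        food = 0
        for _ in range(n_intervals):
            pecked = rng.random() < p_peck  # any peck during the interval?
            if phase == "positive":
                food += pecked              # a peck is required for food
            else:
                food += not pecked          # a peck cancels the food
        print(f"{phase}: {food} food deliveries in {n_intervals} intervals")

# With steady pecking, food is frequent in the positive phases and rare
# in the negative phase -- the correlation the pigeons' allocation tracked.
session([("positive", 100), ("negative", 100), ("positive", 100)])
```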

The results in the middle phase of the records in Figure 13 resemble negative automaintenance (Williams & Williams, 1969). In autoshaping, a pigeon is repeatedly presented with a brief light on a key, which is followed by food. Sooner or later, the light comes to induce key pecking, and pecking at the lit key becomes persistent; autoshaping becomes automaintenance. In negative automaintenance, pecks at the key cancel the delivery of food. This negative contingency causes a reduction in key pecking, as in Figure 13, even if it doesn't eliminate it altogether (Sanabria, Sitomer, & Killeen, 2006). Presumably, a compromise occurs between pecking on the key and pecking off the key—the allocation of time between pecking on the key and pecking off the key shifts back and forth. Possibly, the key light paired with food induces pecking on the key, whereas the food induces pecking off the key.

Fig. 13. Results from an experiment showing discrimination of correlation. Four cumulative records of complete sessions from 4 pigeons are shown. At the beginning of the session, the correlation between pecking and feeding is positive; if a peck occurred anywhere in the scheduled time interval, food was delivered at the end of the interval. Following the first vertical line, the correlation switched to negative; a peck during the interval canceled food at the end of the interval. Peck rate fell. Following the second vertical line, the correlation reverted to positive, and peck rate rose again. Adapted from Baum (1981a).

A Test

To test the molar view of contingency or correlation, we may apply it to explaining a puzzle. W. K. Estes (1943; 1948) reported two experiments in which he pretrained rats in two conditions: (a) a tone was paired with food with no lever present; and (b) lever pressing was trained by making food contingent on pressing. The order of the two conditions made no difference to the results. The key point was that the tone and lever never occurred together. Estes tested the effects of the tone by presenting it while the lever pressing was undergoing extinction (no food). Figure 14 shows a typical result. Pressing decreased across 5-min intervals, but each time the tone occurred, pressing increased.

This finding presented a problem for Estes's molecular view—that is, for a theory relying on contiguity between discrete events. The difficulty was that, because the tone and the lever had never occurred together, no associative mechanism based on contiguity could explain the effect of the tone on the pressing. Moreover, the food pellet couldn't mediate between the tone and the lever, because the food had never preceded the pressing, only followed it. Estes's “explanation” was to make up what he called a “conditioned anticipatory state” (CAS). He proposed that pairing the tone with food resulted in the tone's eliciting the CAS and that the CAS then affected lever pressing during the extinction test. This idea, however, begs the question, “Why would the CAS increase pressing?” It too would never have preceded pressing. The “explanation” only makes matters worse, because, whereas the puzzle started with accounting for the effect of the tone, now it has shifted to an unobserved hypothetical entity, the CAS. The muddle illustrates well how the molecular view of behavior leads to hypothetical entities and magical thinking. The molecular view can explain Estes's result, but only by positing unseen mediators (Trapold & Overmier, 1972). (See Baum, 2002, for a fuller discussion.)

Fig. 13. Results from an experiment showing discrimination of correlation. Four cumulative records of complete sessions from 4 pigeons are shown. At the beginning of the session, the correlation between pecking and feeding is positive; if a peck occurred anywhere in the scheduled time interval, food was delivered at the end of the interval. Following the first vertical line, the correlation switched to negative; a peck during the interval canceled food at the end of the interval. Peck rate fell. Following the second vertical line, the correlation reverted to positive, and peck rate rose again. Adapted from Baum (1981a).

The molar paradigm (e.g., Baum, 2002; 2004) offers a straightforward explanation of Estes's result. First, pairing the tone with the food makes the tone an inducing stimulus (a conditional inducer). This means that the tone will induce any food-related activity. Second, making the food contingent on lever pressing—that is, correlating pressing with food—makes pressing a food-related activity. Thus, when the tone is played in the presence of the lever, it induces lever pressing. When we escape from a focus on contiguity, the result seems obvious.

Constraint and Connection

The effects of contingency go beyond just “closing the loop.” A contingency has two effects: (a) it creates or causes a correlation between the operant activity and the contingent event—the equivalent of what an economist would call a constraint; (b) as it occurs repeatedly, it soon causes the operant activity to become related to the contingent (inducing) event—it serves to connect the two.

Contingency ensures the increase in the operant activity, but it also constrains the increase by constraining the possible outcomes (allocations). Figure 15 illustrates these effects. The vertical axis shows frequency of an inducing event represented as time spent in a fitness-enhancing activity, such as eating or drinking. The horizontal axis shows time spent in an induced or operant activity, such as lever pressing or wheel running. Figure 15 diagrams situations studied by Premack (1971) and by Allison, Miller, and Wozny (1979); it resembles also a diagram by Staddon (1983; Figure 7.2). In a baseline condition with no contingency, a lot of time is spent in the inducing activity and relatively little time is spent in the to-be-operant activity. The point labeled “baseline” indicates this allocation. When a contingency is introduced, represented by the solid diagonal line, which indicates a contingency such as in a ratio schedule, a given duration of the operant activity allows a given duration of the inducing activity (or a certain amount of the PIE). The contingency constrains the possible allocations between time in the contingent activity (eating or drinking) and time in the operant activity (pressing or running). Whatever allocation occurs must lie on the line; allocations off the line are no longer possible. The point on the line illustrates the usual result: an allocation that includes more operant activity than in baseline and less of the PIE than in baseline. The increase in operant activity would have been called a reinforcement effect in the past, but here it appears as an outcome of the combination of induction with the constraint of the contingency line (also known as the feedback function; Baum, 1973; 1989; 1992). The decrease in the PIE might have been called punishment of eating or drinking, but here it too is an outcome of the constraint imposed by the contingency (Premack, 1965).

Fig. 14. Results from an experiment in which lever pressing undergoing extinction was enhanced by presentation of a tone that had previously been paired with the food that had been used in training the lever pressing. Because the tone and lever had never occurred together before, the molecular view had difficulty explaining the effect of the tone on the pressing, but the molar view explains it as induction of pressing as a food-related activity. The data were published by Estes (1948).
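
A numerical sketch makes the constraint in Figure 15 concrete. The numbers below are hypothetical, chosen only to illustrate the geometry; nothing here comes from Premack's or Allison's data.

```python
# Hypothetical baseline and contingency for the situation in Figure 15.
session = 60.0         # min available in the session
baseline_eat = 30.0    # min spent eating when food is freely available
baseline_press = 2.0   # min spent pressing in baseline
k = 0.25               # ratio contingency: 1 min of eating per 4 min of pressing

# Under the contingency, every feasible allocation satisfies eat = k * press,
# so matching the baseline level of eating would require 120 min of pressing,
# which a 60-min session cannot contain.
for press in (2.0, 10.0, 20.0, 40.0, 120.0):
    eat = k * press
    feasible = press + eat <= session
    print(f"press {press:6.1f} min -> eat {eat:5.2f} min "
          f"(feasible in session: {feasible})")
```

The compromise the animal reaches lies on the line: far more pressing than in baseline, yet less eating than in baseline, exactly the two effects described above.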

Another way to express the effects in Figure 15 might be to say that, by constraining the allocations possible, the contingency “adds value” to the induced operant activity. Once the operant activity becomes PIE-related, the operant activity and the PIE become parts of a “package.” For example, pecking and eating become parts of an allocation or package that might be called “feeding,” and the food or eating may be said to lend value to the package of feeding. If lever pressing is required for a rat to run in a wheel, pressing becomes part of running (Belke, 1997; Belke & Belliveau, 2001). If lever pressing is required for a mouse to obtain nest material, pressing becomes part of nest building (Roper, 1973). Figure 15 indicates that this doesn't occur without some cost, too, because the operant activity doesn't increase enough to bring the level of eating to its level in baseline; some eating is sacrificed in favor of a lower level of the operant activity. In an everyday example, suppose Tom and his friends watch football every Sunday and usually eat pizza while watching. Referring to Figure 9, we may say that watching football (E1) is correlated with eating pizza (E2), provided that the highest frequencies are the conjunctions in the checked boxes, and the conjunctions in the empty boxes are relatively rare. Rather than appeal to reinforcement or strengthening, we may say that watching football induces eating pizza or that the two activities form a package with higher value than either by itself. We don't need to bring in the concept of “value” at all, because all it means is that the PIE induces PIE-related activities, but this interpretation of “adding value” serves to connect the present discussion with behavioral economics. Another possible connection would be to look at demand as induction of activity (e.g., work or shopping) by the good demanded and elasticity as change in induction with change in the payoff schedule or feedback function. However, a full quantitative account of the allocations that actually occur is beyond the scope of this article (see Allison, 1983; Baum, 1981b; Rachlin, Battalio, Kagel, & Green, 1981; and Rachlin, Green, Kagel, & Battalio, 1976, for examples). Part of a theory may be had by quantifying contingency with the concept of a feedback function, because feedback functions allow more precise specification of the varieties of contingency.

Correlations and Feedback Functions

Every contingency or correlation specifies a feedback function (Baum, 1973; 1989; 1992). A feedback function describes the dependence of outcome rate (say, rate of food delivery) on operant activity (say, rate of pecking). The constraint line in Figure 15 specifies a feedback function for a ratio schedule, in which the rate of the contingent inducer is directly proportional to the rate of the induced activity. The feedback function for a variable-interval schedule is more complicated, because it must approach the programmed outcome rate as an asymptote. Figure 16 shows an example. The upper curve has three properties: (a) it passes through the origin, because if no operant activity occurs, no food can be delivered; (b) it approaches an asymptote; and (c) as operant activity approaches zero, the slope of the curve approaches the reciprocal of the parameter a in the equation shown (1/a; Baum, 1992). It constrains possible performance, because whatever the rate of operant activity, the outcome (inducer) rate must lie on the curve. The lower curve suggests a frequency distribution of operant activity on various occasions. Although average operant rate and average inducer rate would correspond to a point on the feedback curve, the distribution indicates that operant activity should vary around a mode and would not necessarily approach the maximum activity rate possible.

Fig. 15. Effects of contingency. In a baseline condition with no contingency, little of the to-be-operant activity (e.g., lever pressing or wheel running; the to-be-induced activity) occurs, while a lot of the to-be-contingent PIE-related activity (e.g., eating or drinking; the inducing activity) occurs. This is the baseline allocation (upper left point and broken lines). After the PIE (food or water) is made contingent on the operant activity, whatever allocation occurs must lie on the solid line. The arrow points to the new allocation. Typically, the new allocation includes a large increase in the operant activity over baseline. Thus, contingency has two effects: a) constraining possible allocations; and b) making the operant behavior PIE-related so it is induced in a large quantity.
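
The three properties of the variable-interval feedback function listed above are satisfied by a hyperbolic form such as r = 1/(t + a/B), where r is inducer rate, B is response rate, and t is the scheduled mean interval. The sketch below assumes this form; it is consistent with the asymptote and parameter values stated in the Figure 16 caption, but the exact published equation should be checked against Baum (1992).

```python
def vi_feedback(B, t=1.0, a=6.0):
    """Assumed hyperbolic VI feedback function: r = 1 / (t + a / B).

    r: inducer (food) rate; B: response rate; t: scheduled mean interval;
    a: fitted parameter. Properties: r(0) = 0; r -> 1/t as B grows large;
    slope -> 1/a as B approaches zero."""
    if B <= 0:
        return 0.0
    return 1.0 / (t + a / B)

# Parameter values from the Figure 16 example: t = 1.0 min, a = 6.0.
# The asymptote is 1/t = 1 food/min, that is, 60 per hour (a VI 60 s).
for B in (0.5, 1, 2, 5, 10, 50, 200):  # responses per min
    print(f"B = {B:6.1f}/min -> r = {vi_feedback(B):.3f} foods/min")
```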

Figure 17 shows some unpublished data, gathered by John Van Syckel (1983), a student at the University of New Hampshire. Several sessions of variable-interval responding by a rat were analyzed by examining successive 2-min time samples and counting the number of lever presses and food deliveries in each one. All time samples were combined according to the number of presses, and the press rate and food rate calculated for each number of presses. These are represented by the plotted points. The same equation as in Figure 16 is fitted to the points; the parameter a was close to 1.0. The frequency distribution of press rates is shown by the curve without symbols (right-hand vertical axis). Some 2-min samples contained no presses, a few contained only one (0.5 press/min), and a mode appears at about 5 presses/min. Press rates above about 15 presses/min (30 presses in 2 min) were rare. Thus, the combination of induction of lever pressing by occasional food together with the feedback function results in the stable performance shown.
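
The binning analysis just described is simple to express in code. This sketch assumes the data arrive as one (presses, foods) pair per 2-min window; the input format and function names are mine, not Van Syckel's.

```python
from collections import defaultdict

def empirical_feedback(samples, window_min=2.0):
    """Group time samples by press count, then compute the press rate and
    the mean food rate for each count, as described in the text.

    samples: list of (presses, foods) tuples, one per 2-min window."""
    by_count = defaultdict(list)
    for presses, foods in samples:
        by_count[presses].append(foods)
    points = []
    for presses, food_counts in sorted(by_count.items()):
        press_rate = presses / window_min
        food_rate = sum(food_counts) / len(food_counts) / window_min
        points.append((press_rate, food_rate, len(food_counts)))
    return points  # one (press rate, food rate, n windows) point per count

# Toy input: six 2-min windows of (presses, foods).
demo = [(0, 0), (1, 0), (10, 1), (10, 2), (12, 1), (30, 2)]
for press_rate, food_rate, n in empirical_feedback(demo):
    print(f"{press_rate:5.1f} presses/min -> {food_rate:.2f} foods/min (n={n})")
```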

Although the parameter a was close to 1.0 in Figure 17, it usually exceeds 1.0, ranging as high as 10 or more when other such data sets are fitted to the same equation (Baum, 1992). This presents a puzzle, because one might suppose that as response rate dropped to extremely low levels, a food delivery would have set up prior to each response, and each response would deliver food, with the result that the feedback function would approximate that for a fixed-ratio 1 schedule near the origin. A possible explanation as to why the slope (1/a) continues to fall short of 1.0 near the origin might be that when food deliveries become rare, a sort of reinstatement effect occurs, and each delivery induces a burst of pressing. Since no delivery has had a chance to set up, the burst usually fails to produce any food and thus ensures several presses for each food delivery.

Fig. 16. Example of a feedback function for a variable-interval schedule. The upper curve, the feedback function, passes through the origin and approaches an asymptote (60 PIEs per h; a VI 60 s). Its equation appears at the lower right. The average interval t equals 1.0. The parameter a equals 6.0. Response rate is expected to vary from time to time, as shown by the distribution below the feedback function (frequency is represented on the right-hand vertical axis).

Fig. 17. Example of an empirical feedback function for a variable-interval schedule. Successive 120-s time windows were evaluated for number of lever presses and number of food deliveries. The food rate and press rate were calculated for each number of presses per 120 s. The unfilled diamonds show the food rates. The feedback equation from Figure 16 was fitted to these points (t = 0.52; a = 0.945). The frequency distribution (right-hand vertical axis) below shows the percent frequencies of the various response rates. The filled square shows the average response rate.


The relevance of feedback functions has been challenged by two sorts of experiment. In one type, a short-term contingency is pitted against a long-term contingency, and the short-term contingency is shown to govern performance to a greater extent than the long-term contingency (e.g., Thomas, 1981; Vaughan & Miller, 1984). In these experiments, the key pecking or lever pressing produces food in the short-term contingency, but cancels food in the long-term contingency. Responding is suboptimal, because the food rate would be higher if no responding occurred. That responding occurs in these experiments supports the present view. The positively correlated food induces the food-related activity (pressing or pecking); the other, negatively correlated food, when it occurs, would simply contribute to inducing the operant activity.

These experiments may be compared to observations such as negative automaintenance (Williams & Williams, 1969) or what Herrnstein and Loveland (1972) called “food avoidance.” Even though key pecking cancels the food delivery, still the pigeon pecks, because the food, when it occurs, induces pecking (unconditionally; Figure 1). Sanabria, Sitomer, and Killeen (2006) showed, however, that under some conditions the negative contingency is highly effective, reducing the rate of pecking to low levels despite the continued occurrence of food. As discussed earlier in connection with Figure 13, the negative contingency between key pecking and food implies also a positive contingency between other activities and food, particularly pecking off the key. Similar considerations explain contrafreeloading—the observation that a pigeon or rat will peck at a key or press a lever even if a supply of the same food produced by the key or lever is freely available (Neuringer, 1969; 1970). The pecking or pressing apparently is induced by the food produced, even though other food is available. Even with no pecking key available, Palya and Zacny (1980) found that untrained pigeons fed at a certain time of day would peck just about anywhere (any “spot”) around that time of day.

The other challenge to the relevance of feedback functions is presented by experiments in which the feedback function is changed with no concomitant change in behavior. For example, Ettinger, Reid, and Staddon (1987) studied rats' lever pressing that produced food on interlocking schedules—schedules in which both time and pressing interchangeably advance the schedule toward the availability of food. At low response rates, the schedule resembles an interval schedule, because the schedule advances mainly with the passage of time, whereas at high response rates, the schedule resembles a ratio schedule, because the schedule advances mainly due to responding. Ettinger et al. varied the interlocking schedule and found that average response rate decreased in a linear fashion with increasing food rate, but that variation in the schedule had no effect on this relation. They concluded that the feedback function was irrelevant. The schedules they chose, however, were all functionally equivalent to fixed-ratio schedules, and the rats responded exactly as they would on fixed-ratio schedules. In fixed-ratio performance, induced behavior other than lever pressing tends to be confined to a period immediately following food, as in Figure 1; once pressing begins, it tends to proceed uninterrupted until food again occurs. The postfood period increases as the ratio increases, thus tending to conserve the relative proportions of pressing and other induced activities. The slope of the linear feedback function determines the postfood period, but response rate on ratio schedules is otherwise insensitive to it (Baum, 1993; Felton & Lyon, 1966). The decrease in average response rate observed by Ettinger et al. occurred for the same reason it occurs with ratio schedules: because the postfood period, even though smaller for smaller ratios, was an increasing proportion of the interfood interval as the effective ratio decreased. Thus, the conclusion that feedback functions are irrelevant was unwarranted.
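
To see why an interlocking schedule shades from interval-like to ratio-like, it helps to write the contingency down. The sketch below is a simplified, hypothetical version: a counter advances by one unit per second plus a fixed bonus per press, and food becomes available at a threshold; the weights are illustrative, not Ettinger et al.'s values.

```python
def time_to_food(press_rate_per_s, time_weight=1.0, press_weight=5.0,
                 threshold=60.0):
    """Time to food when pressing at a steady rate on the simplified
    interlocking schedule: the counter grows at time_weight + press_rate *
    press_weight units per second, and food arrives at the threshold."""
    return threshold / (time_weight + press_rate_per_s * press_weight)

for rate in (0.01, 0.1, 1.0, 5.0):  # presses per second
    print(f"{rate:5.2f} press/s -> food every {time_to_food(rate):6.2f} s")
# At low rates the time term dominates (interval-like: about 60 s regardless
# of pressing); at high rates the press term dominates (ratio-like: roughly
# threshold / press_weight = 12 presses per food).
```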

When we try to understand basic phenomena, such as the difference between ratio and interval schedules, feedback functions prove indispensable. The two different feedback functions—linear for ratio schedules (Figure 15) and curvilinear for interval schedules (Figures 16 and 17)—explain the difference in response rate on the two types of schedule (Baum, 1981b). On ratio schedules, the contingency loop shown in Figure 12 includes only positive feedback and drives response rate toward the maximum possible under the constraints of the situation. This maximal response rate is necessarily insensitive to variation in ratio or food rate. On interval schedules, where food rate levels off, food induces a more moderate response rate (Figure 17). The difference in feedback function is important for understanding both these laboratory situations and everyday situations in which people deal with contingencies like ratio schedules, in which their own behavior alone matters to production, versus contingencies, like interval schedules, in which other factors, out of their control, partially determine production.

Explanatory Power

The conceptual power of this framework—allocation, induction, and contingency—far exceeds that of the contiguity-based concept of reinforcement. Whether we redefine the term and call it “reinforcement,” or whether we call the process something else (“inducement”?), it explains a large range of phenomena. We have seen that it explains standard results such as operant and respondent conditioning, operant–respondent interactions, and adjunctive behavior. We saw that it explains avoidance and shock-maintained responding. It can explain the effects of noncontingent events (e.g., “noncontingent reinforcement,” an oxymoron). Let us conclude with a few examples from the study of choice, the study of stimulus control, and observations of everyday life.

Dynamics of choice. In a typical choice experiment, a pigeon pecks at two response keys, each of which occasionally and unpredictably operates a food dispenser when pecked, based on the passage of time (variable-interval schedules). Choice or preference is measured as the logarithm of the ratio of pecks at one key to pecks at the other. For looking at putative reinforcement, we examine log ratio of pecks at the just-productive key (left or right) to pecks at the other key (right or left). With the right procedures, analyzing preference following food allows us to separate any strengthening effect from the inducing effect of food (e.g., Cowie, Davison, & Elliffe, 2011). For example, a striking result shows that food and stimuli predicting food induce activity other than the just-productive activity (Boutros, Davison, & Elliffe, 2011; Davison & Baum, 2006; 2010; Krageloh, Davison, & Elliffe, 2005).

A full understanding of choice requires analysis at various time scales, from extremely small—focusing on local relations—to extremely large—focusing on extended relations (Baum, 2010). Much research on choice concentrated on extended relations by aggregating data across many sessions (Baum, 1979; Herrnstein, 1961). Some research has examined choice in a more local time frame by aggregating data from one food delivery to another (Aparicio & Baum, 2006; 2009; Baum & Davison, 2004; Davison & Baum, 2000; Rodewald, Hughes, & Pitts, 2010). A still more local analysis examines changes in choice within interfood intervals, from immediately following food delivery until the next food delivery. A common result is a pulse of preference immediately following food in favor of the just-productive alternative (e.g., Aparicio & Baum, 2009; Davison & Baum, 2006; 2010). After a high initial preference, choice typically falls toward indifference, although it may never actually reach indifference. These preference pulses seem to reflect local induction of responding at the just-productive alternative. A still more local analysis looks at the switches between alternatives or the alternating visits at the two alternatives (Aparicio & Baum, 2006, 2009; Baum & Davison, 2004). A long visit to the just-productive alternative follows food produced by responding at that alternative, and this too appears to result from local induction, assuming, for example, that pecking left plus food induces pecking at the left key whereas pecking right plus food induces pecking at the right key. Since a previous paper showed that the preference pulses are derivable from the visits, the long visit may reflect induction more directly (Baum, 2010).
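
The preference measure and the local (preference-pulse) analysis can be sketched briefly. The input format, bin structure, and toy counts below are hypothetical; published analyses (e.g., Davison & Baum, 2006) handle zero counts and aggregation with more care.

```python
import math

def log_preference(pecks_just_productive, pecks_other):
    """Log ratio of pecks at the just-productive key to pecks at the other
    key; positive values indicate preference for the just-productive key."""
    return math.log10(pecks_just_productive / pecks_other)

def preference_pulse(binned_pecks):
    """Local analysis: given (just-productive, other) peck counts in
    successive bins after a food delivery, return the log preference in
    each bin."""
    return [log_preference(jp, other) for jp, other in binned_pecks]

# Toy bins after a left-key food: high preference decaying toward
# indifference (log ratio near 0), as the text describes.
bins = [(40, 2), (25, 5), (12, 6), (8, 7), (7, 7)]
print([f"{p:+.2f}" for p in preference_pulse(bins)])
```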

Differential-outcomes effect. In the differential-outcomes effect, a discrimination is enhanced by arranging that the response alternatives produce different outcomes (Urcuioli, 2005). In a discrete-trials procedure, for example, rats are presented on some trials with one auditory stimulus (S1) and on other trials with another auditory stimulus (S2). In the presence of S1, a press on the left lever (A1) produces food (X1), whereas in the presence of S2, a press on the right lever (A2) produces liquid sucrose (X2). The discrimination that forms—pressing the left lever in the presence of S1 and pressing the right lever in the presence of S2—is greater than when both actions A1 and A2 produce the same outcome. The result is a challenge for the molecular paradigm because, focusing on contiguity, it sees the effect of the different outcomes as an effect of a future event on present behavior. The “solution” to this problem has been to bring the future into the present by positing unseen “representations” or “expectancies” generated by S1 and S2 and supposing that those invisible events determine A1 and A2. In the present molar view, we may see the result as an effect of differential induction by the different outcomes (Ostlund & Balleine, 2007). The combination of S1 and X1 makes S1 a conditional inducer of X1-related activities. The contingency between A1 and X1 makes A1 an X1-related activity. As a result, S1 induces A1. Similarly, as a result of the contingency between S2 and X2 and the contingency between A2 and X2, S2 induces A2, as an X2-related activity. In contrast, when both A1 and A2 produce the same outcome X, the only basis for discrimination is the relation of S1, A1, and X in contrast with the relation of S2, A2, and X. A1 is induced by the combination S1-plus-X, and A2 is induced by the combination S2-plus-X. Since S1 and S2 are both conditional inducers related to X, S1 would tend to induce A2 to some extent, and S2 would tend to induce A1 to some extent, thereby reducing discrimination. These relations occur, not just on any one trial, but as a pattern across many trials. Since no need arises to posit any invisible events, the explanation exceeds the molecular explanation on the grounds of plausibility and elegance.

Cultural transmission. The present view offers a good account of cultural transmission. Looking on cultural evolution as a Darwinian process, we see that the recurrence of a practice from generation to generation occurs partly by imitation and partly by rules (Baum, 1995b; 2000; 2005; Boyd & Richerson, 1985; Richerson & Boyd, 2005). A pigeon in a chamber where food occasionally occurs sees another pigeon pecking at a ping-pong ball and subsequently pecks at a similar ball (Epstein, 1984). The model pigeon's pecking induces similar behavior in the observer pigeon. When a child imitates a parent, the parent's behavior induces behavior in the child, and if the child's behavior resembles the parent's, we say the child imitated the parent. As we grow up, other adults serve as models, particularly those with the trappings of success—teachers, coaches, celebrities, and so forth—and their behavior induces similar behavior in their imitators (Boyd & Richerson, 1985; Richerson & Boyd, 2005). Crucial to transmission, however, are the contingencies into which the induced behavior enters following imitation. When it occurs, is it correlated with resources, approval, or opportunities for mating? If so, it will now be induced by those PIEs, no longer requiring a model. Rules, which are discriminative stimuli generated by a speaker, induce behavior in a listener, and the induced behavior, if it persists, enters into correlations with both short-term PIEs offered by the speaker and, sometimes, long-term PIEs in long-term relations with the (now no longer) rule-governed behavior (Baum, 1995b; 2000; 2005).

Working for wages. Finally, let us consider the everyday example of working for wages. Zack has a job as a dishwasher in a restaurant and works every week from Tuesday to Saturday for 8 hr a day. At the end of his shift on Saturday, he receives an envelope full of money. Explaining his persistence in this activity presents a problem for the molecular paradigm, because the money on Saturday occurs at such a long delay after the work on Tuesday or even the work on Saturday. No one gives Zack money on any other occasion—say, after each dish washed. The molecular paradigm's “solution” to this has been to posit hidden reinforcers that strengthen the behavior frequently during his work. Rather than having to resort to such invisible causes, we may turn to the molar view and assert, with common sense, that the money itself maintains the dishwashing. We may look upon the situation as similar to a rat pressing a lever, but scaled up, so to speak. The money induces dishwashing, not just in one week, but as a pattern of work across many weeks. We explain Zack's first week on the job a bit differently, because he brings to the situation a pattern of showing up daily from having attended school, which is induced by the promises of his boss. The pattern of work must also be seen in the larger context of Zack's life activities, because the money is a means to ends, not an end in itself; it permits other activities, such as eating and socializing (Baum, 1995a; 2002).


Omissions and Inadequacies

As I said at the outset, to cover all the phenomena known to behavior analysts in this one article would be impossible. The article aims to set out a conceptual framework that draws those phenomena together in a way not achieved by the traditional framework.

Perhaps the most significant omission here is a full treatment of the effects of context or what is traditionally called stimulus control. In the molar paradigm, context needs to be treated similarly to behavior—as extended in time and occurring on different time scales. In a simple discrimination, the inducing stimuli—say, a red light and a green light—may be seen in a relatively short time span. In matching to sample, however, where a sample precedes the exposure of the choice alternatives, the time frame is larger, because it has to include the sample plus the correct alternative (i.e., parts). More complex arrangements require still longer time spans. For example, experiments with relational contexts inspired by Relational Frame Theory require long time spans to specify the complex context for a simple activity like a child's pointing at a card (e.g., Berens & Hayes, 2007). In everyday life, the contexts for behavior may extend over weeks, months, or years. A farmer might find over a period of several years that planting corn is more profitable than planting alfalfa, even though planting alfalfa is more profitable in some years. I might require many interactions with another person over a long time before I conclude he or she is trustworthy.

I have said less than I would have liked about the traditional metaphor of strength as it is used in talking about response strength, bond strength, or associative strength. One virtue of the molar paradigm is that it permits accounts of behavior that omit the strength metaphor, substituting allocation and guidance instead (Davison & Baum, 2006; 2010). The concept of extinction of behavior has been closely tied to the notion of response strength. A future paper will address the topic of extinction within the molar paradigm.

I left out verbal behavior. One may find some treatment of it in my book, Understanding Behaviorism (Baum, 2005). It is a big topic that will need to be addressed in the future, but the molar paradigm lends itself to looking at verbal behavior in a variety of timeframes. The utterance stands as a useful concept, but variable parts occur within similar utterances that have different effects on listeners (e.g., “Let me go” vs. “Never let me go”) and are induced by different contexts. Relatively local parts like a plural ending or a possessive suffix also are induced by different contexts. Utterances, however, are parts of more extended activities, such as conversations. Usually conversations are parts of still more extended activities, like building a relationship or persuading someone to part with money.

I have said little in this paper about rule-governed behavior, which is particularly important to understanding culture. I discussed it in earlier papers (Baum, 1995b, 2000). The molar paradigm is fruitful for discussing rules in relation to short-term and long-term relations for groups and for individuals within groups.

The list of omissions is long, no doubt, but the reader who is interested in pursuing this conceptual framework will find many ways to apply it.

Conclusions

Taken together, the concepts of allocation, induction, and correlation (or contingency) account for the broad range of phenomena that have been recognized by behavior analysis since the 1950s. They provide a more elegant and plausible account than the older notions of reinforcement, strengthening, and elicitation. They render those concepts superfluous.

Induction not only solves the problem of provenance; induction accounts for the stimulus effects of PIEs and PIE-related stimuli. Induction also explains the effects of contingencies, because a contingency establishes a correlation that makes the operant activity into a PIE-related activity. Taken together, contingency and induction explain allocation or choice. They also explain change in allocation—that is, changes in behavior. None of this requires the concept of reinforcement as strengthening.

The popular formulation of operant behavior as “Antecedent-Behavior-Consequence” (A-B-C) may be reconciled with this conclusion. When “consequences” increase an activity, they don't strengthen it, but they induce more of the same activity, just as “antecedents” do. Indeed, the “consequence” is the antecedent of the behavior that follows, as shown in Figures 6 and 7. When “consequences” decrease an activity, they don't weaken it, but induce other behavior incompatible with it (Figure 10).

The formulation sketched here stems from a molar view of behavior that focuses on time spent in activities instead of discrete responses (e.g., Baum, 2002; 2004; Baum & Rachlin, 1969). Criticism of the molecular view, which is based on discrete responses and temporal contiguity as central concepts, may be found in earlier papers (Baum, 1973; 2001; 2002; 2004). The present discussion shows the superiority of the molar view in helping behavior analysts to rethink their central concepts and to better explain behavioral phenomena both in the laboratory and in everyday life.

REFERENCES

Allison, J. (1983). Behavioral economics. New York: Praeger.
Allison, J., Miller, M., & Wozny, M. (1979). Conservation in behavior. Journal of Experimental Psychology: General, 108, 4–34.
Anger, D. (1963). The role of temporal discriminations in the reinforcement of Sidman avoidance behavior. Journal of the Experimental Analysis of Behavior, 6, 477–506.
Aparicio, C. F., & Baum, W. M. (2006). Fix and sample with rats in the dynamics of choice. Journal of the Experimental Analysis of Behavior, 86, 43–63.
Aparicio, C. F., & Baum, W. M. (2009). Dynamics of choice: Relative rate and amount affect local preference at three different time scales. Journal of the Experimental Analysis of Behavior, 91, 293–317.
Azrin, N. H., & Holz, W. C. (1966). Punishment. In W. K. Honig (Ed.), Operant behavior: Areas of research and application (pp. 380–447). New York: Appleton-Century-Crofts.
Baum, W. M. (1973). The correlation-based law of effect. Journal of the Experimental Analysis of Behavior, 20, 137–153.
Baum, W. M. (1974). On two types of deviation from the matching law: bias and undermatching. Journal of the Experimental Analysis of Behavior, 22, 231–242.
Baum, W. M. (1976). Time-based and count-based measurement of preference. Journal of the Experimental Analysis of Behavior, 26, 27–35.
Baum, W. M. (1979). Matching, undermatching, and overmatching in studies of choice. Journal of the Experimental Analysis of Behavior, 32, 269–281.
Baum, W. M. (1981a). Discrimination of correlation. In M. L. Commons, & J. A. Nevin (Eds.), Quantitative analyses of behavior: Discriminative properties of reinforcement schedules (Vol. I, pp. 247–256). Cambridge, MA: Ballinger.
Baum, W. M. (1981b). Optimization and the matching law as accounts of instrumental behavior. Journal of the Experimental Analysis of Behavior, 36, 387–403.
Baum, W. M. (1988). Selection by consequences is a good idea. Behavioral and Brain Sciences, 11, 447–448.
Baum, W. M. (1989). Quantitative prediction and molar description of the environment. The Behavior Analyst, 12, 167–176.
Baum, W. M. (1992). In search of the feedback function for variable-interval schedules. Journal of the Experimental Analysis of Behavior, 57, 365–375.
Baum, W. M. (1993). Performances on ratio and interval schedules of reinforcement: Data and theory. Journal of the Experimental Analysis of Behavior, 59, 245–264.
Baum, W. M. (1995a). Introduction to molar behavior analysis. Mexican Journal of Behavior Analysis, 21, 7–25.
Baum, W. M. (1995b). Rules, culture, and fitness. The Behavior Analyst, 18, 1–21.
Baum, W. M. (2000). Being concrete about culture and cultural evolution. In N. Thompson, & F. Tonneau (Eds.), Perspectives in Ethology (Vol. 13, pp. 181–212). New York: Kluwer Academic/Plenum.
Baum, W. M. (2001). Molar versus molecular as a paradigm clash. Journal of the Experimental Analysis of Behavior, 75, 338–341.
Baum, W. M. (2002). From molecular to molar: A paradigm shift in behavior analysis. Journal of the Experimental Analysis of Behavior, 78, 95–116.
Baum, W. M. (2004). Molar and molecular views of choice. Behavioural Processes, 66, 349–359.
Baum, W. M. (2005). Understanding behaviorism: Behavior, culture, and evolution (2nd edition). Malden, MA: Blackwell Publishing.
Baum, W. M. (2010). Dynamics of choice: A tutorial. Journal of the Experimental Analysis of Behavior, 94, 161–174.
Baum, W. M., & Davison, M. (2004). Choice in a variable environment: Visit patterns in the dynamics of choice. Journal of the Experimental Analysis of Behavior, 81, 85–127.
Baum, W. M., & Rachlin, H. C. (1969). Choice as time allocation. Journal of the Experimental Analysis of Behavior, 12, 861–874.
Belke, T. W. (1997). Running and responding reinforced by the opportunity to run: Effect of reinforcer duration. Journal of the Experimental Analysis of Behavior, 67, 337–351.
Belke, T. W., & Belliveau, J. (2001). The generalized matching law describes choice on concurrent variable-interval schedules of wheel-running reinforcement. Journal of the Experimental Analysis of Behavior, 75, 299–310.
Berens, N. M., & Hayes, S. C. (2007). Arbitrarily applicable comparative relations: Experimental evidence for a relational operant. Journal of Applied Behavior Analysis, 40, 45–71.
Boutros, N., Davison, M., & Elliffe, D. (2011). Contingent stimuli signal subsequent reinforcer ratios. Journal of the Experimental Analysis of Behavior, 96, 39–61.
Boyd, R., & Richerson, P. J. (1985). Culture and the evolutionary process. Chicago: University of Chicago Press.
Breland, K., & Breland, M. (1961). The misbehavior of organisms. American Psychologist, 16, 681–684.
Bullock, D. H., & Smith, W. C. (1953). An effect of repeated conditioning-extinction upon operant strength. Journal of Experimental Psychology, 46, 349–352.
Cowie, S., Davison, M., & Elliffe, D. (2011). Reinforcement: Food signals the time and location of future food. Journal of the Experimental Analysis of Behavior, 96, 63–86.
Davison, M., & Baum, W. M. (2000). Choice in a variable environment: Every reinforcer counts. Journal of the Experimental Analysis of Behavior, 74, 1–24.


Davison, M., & Baum, W. M. (2006). Do conditional reinforcers count? Journal of the Experimental Analysis of Behavior, 86, 269–283.
Davison, M., & Baum, W. M. (2010). Stimulus effects on local preference: Stimulus–response contingencies, stimulus–food pairing, and stimulus–food correlation. Journal of the Experimental Analysis of Behavior, 93, 45–59.
Davison, M., & McCarthy, D. (1988). The matching law: A research review. Hillsdale, NJ: Erlbaum Associates.
Dawkins, R. (1989). The selfish gene (New edition). Oxford: Oxford University Press.
Dinsmoor, J. A. (2001). Stimuli inevitably generated by behavior that avoids electric shock are inherently reinforcing. Journal of the Experimental Analysis of Behavior, 75, 311–333.
Eibl-Eibesfeldt, I. (1975). Ethology: The biology of behavior (2nd edition). New York: Holt, Rinehart, and Winston.
Epstein, R. (1984). Spontaneous and deferred imitation in the pigeon. Behavioural Processes, 9, 347–354.
Estes, W. K. (1943). Discriminative conditioning. I. A discriminative property of conditioned anticipation. Journal of Experimental Psychology, 32, 150–155.
Estes, W. K. (1948). Discriminative conditioning. II. Effects of a Pavlovian conditioned stimulus upon a subsequently established operant response. Journal of Experimental Psychology, 38, 173–177.
Ettinger, R. H., Reid, A. K., & Staddon, J. E. R. (1987). Sensitivity to molar feedback functions: A test of molar optimality theory. Journal of Experimental Psychology: Animal Behavior Processes, 13, 366–375.
Felton, M., & Lyon, D. O. (1966). The post-reinforcement pause. Journal of the Experimental Analysis of Behavior, 9, 131–134.
Ferster, C. B., & Skinner, B. F. (1957). Schedules of reinforcement. New York: Appleton-Century-Crofts.
Gardner, R. A., & Gardner, B. T. (1988). Feedforward versus feedbackward: An ethological alternative to the law of effect. Behavioral and Brain Sciences, 11, 429–493.
Herrnstein, R. J. (1961). Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 4, 267–272.
Herrnstein, R. J. (1969). Method and theory in the study of avoidance. Psychological Review, 76, 49–69.
Herrnstein, R. J. (1970). On the law of effect. Journal of the Experimental Analysis of Behavior, 13, 243–266.
Herrnstein, R. J., & Hineline, P. N. (1966). Negative reinforcement as shock-frequency reduction. Journal of the Experimental Analysis of Behavior, 9, 421–430.
Herrnstein, R. J., & Loveland, D. H. (1972). Food-avoidance in hungry pigeons, and other perplexities. Journal of the Experimental Analysis of Behavior, 18, 369–383.
Hinde, R. A., & Stevenson-Hinde, J. (Eds.). (1973). Constraints on learning. New York: Academic Press.
Hineline, P. N. (1977). Negative reinforcement and avoidance. In W. K. Honig, & J. E. R. Staddon (Eds.), Handbook of operant behavior (pp. 364–414). Englewood Cliffs, NJ: Prentice-Hall.
Hineline, P. N. (1984). Aversive control: A separate domain? Journal of the Experimental Analysis of Behavior, 42, 495–509.
Killeen, P. R. (1994). Mathematical principles of reinforcement. Behavioral and Brain Sciences, 17, 105–171.
Krageloh, C. U., Davison, M., & Elliffe, D. M. (2005). Local preference in concurrent schedules: The effects of reinforcer sequences. Journal of the Experimental Analysis of Behavior, 84, 37–64.
Kuhn, T. S. (1970). The structure of scientific revolutions (2nd edition). Chicago: University of Chicago Press.
Malagodi, E. F., Gardner, M. L., Ward, S. E., & Magyar, R. L. (1981). Responding maintained under intermittent schedules of electric-shock presentation: “Safety” or schedule effects? Journal of the Experimental Analysis of Behavior, 36, 171–190.
Mayr, E. (1970). Populations, species, and evolution. Cambridge, MA: Harvard University Press.
Neuringer, A. J. (1969). Animals respond for food in the presence of free food. Science, 166, 399–401.
Neuringer, A. J. (1970). Many responses per food reward with free food present. Science, 169, 503–504.
Ostlund, S. B., & Balleine, B. W. (2007). Selective reinstatement of instrumental performance depends on the discriminative stimulus properties of the mediating outcome. Learning and Behavior, 35, 43–52.
Palya, W. L., & Zacny, J. P. (1980). Stereotyped adjunctive pecking by caged pigeons. Animal Learning and Behavior, 8, 293–303.
Premack, D. (1963). Rate differential reinforcement in monkey manipulation. Journal of the Experimental Analysis of Behavior, 6, 81–89.
Premack, D. (1965). Reinforcement theory. In D. Levine (Ed.), Nebraska symposium on motivation (pp. 123–180). Lincoln: University of Nebraska Press.
Premack, D. (1971). Catching up with common sense, or two sides of a generalization: reinforcement and punishment. In R. Glaser (Ed.), The nature of reinforcement (pp. 121–150). New York: Academic Press.
Rachlin, H., Battalio, R. C., Kagel, J. H., & Green, L. (1981). Maximization theory in behavioral psychology. Behavioral and Brain Sciences, 4, 371–417.
Rachlin, H., & Burkhard, B. (1978). The temporal triangle: Response substitution in instrumental conditioning. Psychological Review, 85, 22–47.
Rachlin, H., Green, L., Kagel, J. H., & Battalio, R. C. (1976). Economic demand theory and psychological studies of choice. Psychology of Learning and Motivation, 10, 129–154.
Rachlin, H., & Herrnstein, R. J. (1969). Hedonism revisited: On the negative law of effect. In B. Campbell, & R. M. Church (Eds.), Punishment and aversive behavior (pp. 83–109). New York: Appleton-Century-Crofts.
Reid, A. K., Bacha, G., & Moran, C. (1993). The temporal organization of behavior on periodic food schedules. Journal of the Experimental Analysis of Behavior, 59, 1–27.
Reid, R. L. (1958). The role of the reinforcer as a stimulus. British Journal of Psychology, 49, 202–209.
Rescorla, R. A. (1968). Probability of shock in the presence and absence of CS in fear conditioning. Journal of Comparative and Physiological Psychology, 66, 1–5.
Rescorla, R. A. (1988). Pavlovian conditioning: It's not what you think it is. American Psychologist, 43, 151–160.
Richerson, P. J., & Boyd, R. (2005). Not by genes alone: How culture transformed human evolution. Chicago: University of Chicago Press.
Rodewald, A. M., Hughes, C. E., & Pitts, R. C. (2010). Development and maintenance of choice in a dynamic environment. Journal of the Experimental Analysis of Behavior, 94, 175–195.

Roper, T. J. (1973). Nesting material as a reinforcer for female mice. Animal Behaviour, 21, 733–740.
Roper, T. J. (1978). Diversity and substitutability of adjunctive activities under fixed-interval schedules of food reinforcement. Journal of the Experimental Analysis of Behavior, 30, 83–96.
Sanabria, F., Sitomer, M. T., & Killeen, P. R. (2006). Negative automaintenance omission training is effective. Journal of the Experimental Analysis of Behavior, 86, 1–10.
Segal, E. F. (1972). Induction and the provenance of operants. In R. M. Gilbert, & J. R. Millenson (Eds.), Reinforcement: Behavioral analyses (pp. 1–34). New York: Academic Press.
Seligman, M. E. P. (1970). On the generality of the laws of learning. Psychological Review, 77, 406–418.
Shapiro, M. M. (1961). Salivary conditioning in dogs during fixed-interval reinforcement contingent on lever pressing. Journal of the Experimental Analysis of Behavior, 4, 361–364.
Sidman, M. (1953). Two temporal parameters of the maintenance of avoidance behavior by the white rat. Journal of Comparative and Physiological Psychology, 46, 253–261.
Sidman, M. (1966). Avoidance behavior. In W. K. Honig (Ed.), Operant behavior: Areas of research and application (pp. 448–498). New York: Appleton-Century-Crofts.
Skinner, B. F. (1938). Behavior of organisms. New York: Appleton-Century-Crofts.
Skinner, B. F. (1948). “Superstition” in the pigeon. Journal of Experimental Psychology, 38, 168–172.
Skinner, B. F. (1950). Are theories of learning necessary? Psychological Review, 57, 193–216.
Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.
Staddon, J. E. R. (1973). On the notion of cause, with applications to behaviorism. Behaviorism, 1(2), 25–63.
Staddon, J. E. R. (1977). Schedule-induced behavior. In W. K. Honig, & J. E. R. Staddon (Eds.), Handbook of operant behavior (pp. 125–152). Englewood Cliffs, NJ: Prentice-Hall.
Staddon, J. E. R. (1983). Adaptive behavior and learning. Cambridge: Cambridge University Press.
Staddon, J. E. R., & Simmelhag, V. (1971). The “superstition” experiment: A re-examination of its implications for the principles of adaptive behavior. Psychological Review, 78, 3–43.
Thomas, G. (1981). Contiguity, reinforcement rate and the law of effect. Quarterly Journal of Experimental Psychology, 33B, 33–43.
Thorndike, E. L. (2000/1911). Animal intelligence. New Brunswick, NJ: Transaction Publishers.
Timberlake, W., & Allison, J. (1974). Response deprivation: An empirical approach to instrumental performance. Psychological Review, 81, 146–164.
Timberlake, W., & Lucas, G. A. (1989). Behavior systems and learning: From misbehavior to general principles. In S. B. Klein, & R. R. Mowrer (Eds.), Contemporary learning theories: Instrumental conditioning theory and the impact of biological constraints on learning (pp. 237–275). Hillsdale, NJ: Erlbaum.
Trapold, M. A., & Overmier, J. B. (1972). The second learning process in instrumental learning. In A. H. Black, & W. F. Prokasy (Eds.), Classical conditioning. II. Current research and theory (pp. 427–452). New York: Appleton-Century-Crofts.
Urcuioli, P. J. (2005). Behavioral and associative effects of different outcomes in discrimination learning. Learning and Behavior, 33, 1–21.
Van Syckel, J. S. (1983). Variable-interval and variable-ratio feedback functions. Unpublished master's thesis, University of New Hampshire.
Vaughan, W. J., & Miller, H. L. J. (1984). Optimization versus response-strength accounts of behavior. Journal of the Experimental Analysis of Behavior, 42, 337–348.
Watson, J. B. (1930). Behaviorism. New York: W. W. Norton.
Williams, D. R., & Williams, H. (1969). Automaintenance in the pigeon: Sustained pecking despite contingent nonreinforcement. Journal of the Experimental Analysis of Behavior, 12, 511–520.
Zener, K. (1937). The significance of behavior accompanying conditioned salivary secretion for theories of the conditioned response. The American Journal of Psychology, 50, 384–403.

Received: January 7, 2011
Final Acceptance: September 6, 2011
