

JOURNAL OF THE EXPERIMENTAL ANALYSIS OF BEHAVIOR 1997, 67, 193–211 NUMBER 2 (MARCH)

THE S-R ISSUE: ITS STATUS IN BEHAVIOR ANALYSIS AND IN DONAHOE AND PALMER'S LEARNING AND COMPLEX BEHAVIOR

JOHN W. DONAHOE, DAVID C. PALMER, AND JOSÉ E. BURGOS

UNIVERSITY OF MASSACHUSETTS AT AMHERST, SMITH COLLEGE, AND UNIVERSIDAD CENTRAL DE VENEZUELA AND UNIVERSIDAD CATÓLICA DE VENEZUELA

The central focus of this essay is whether the effect of reinforcement is best viewed as the strengthening of responding or the strengthening of the environmental control of responding. We make the argument that adherence to Skinner's goal of achieving a moment-to-moment analysis of behavior compels acceptance of the latter view. Moreover, a thoroughgoing commitment to a moment-to-moment analysis undermines the fundamental distinction between the conditioning processes instantiated by operant and respondent contingencies while buttressing the crucially important differences in their cumulative outcomes. Computer simulations informed by experimental analyses of behavior and neuroscience are used to illustrate these points.

Key words: S-R psychology, contingencies of reinforcement, contiguity, discrimination learning, reinforcement, respondent conditioning, computer simulation

Richard Shull's thoughtful review (Shull, 1995) of Donahoe and Palmer's Learning and Complex Behavior (1994) (hereafter, LCB) prompted this essay. The review accurately summarized the general themes that informed our efforts and, more to the point for present purposes, identified an important issue (here called the stimulus–response, or S-R, issue) that was not directly addressed in our work. Clarifying the status of the S-R issue is important for the further development of behavior analysis, and we seek to make explicit some of the fundamental concerns that surround the issue, most particularly as they arise in LCB.

The simulation research reported here was supported in part by a faculty research grant from the Graduate School of the University of Massachusetts at Amherst and a grant from the National Science Foundation, BNS-8409948. The authors thank John J. B. Ayres and Vivian Dorsel for commenting on an earlier version of the manuscript. The authors express their special appreciation to two reviewers who have chosen to remain anonymous; they, at least, should know of our appreciation for their many contributions to the essay.

Correspondence and requests for reprints may be addressed to John W. Donahoe, Department of Psychology, Program in Neuroscience and Behavior, University of Massachusetts, Amherst, Massachusetts 01003 (E-mail: [email protected]); David C. Palmer, Department of Psychology, Clark Science Center, Smith College, Northampton, Massachusetts 01063 (E-mail: [email protected]); or Jose E. Burgos, Consejo de Estudios de Postgrado, Facultad de Humanidades y Educacion, Universidad Central de Venezuela (UCV), Caracas, Venezuela (E-mail: [email protected]).

To provide a context in which to consider the S-R issue, it is helpful to summarize briefly the central themes of the book: (a) Behavior analysis is an independent selectionist science that has a fundamental conceptual kinship with other historical sciences, notably evolutionary biology. (b) Complex behavior, including human behavior, is best understood as the cumulative product of the action over time of relatively simple biobehavioral processes, especially selection by reinforcement. (c) These fundamental processes are characterized through experimental analyses of behavior and, if subbehavioral processes are to be included, of neuroscience. (This contrasts with normative psychology in which subbehavioral processes are inferred from the very behavior they seek to explain, thereby inviting circular reasoning.) (d) Complex human behavior typically occurs under circumstances that preclude experimental analysis. In such cases, understanding is achieved through scientific interpretations that are constrained by experimental analyses of behavior and neuroscience. The most compelling interpretations promise to be those that trace the cumulative effects of reinforcement through formal techniques, such as adaptive neural networks, as a supplement to purely verbal accounts.

It is in the section of the review entitled "Principle of Selection (Reinforcement)" (Shull, 1995, p. 353) that the S-R issue is raised.


The following statement in LCB is cited:

The outcome of selection by reinforcement is a change in the environmental guidance of behavior. That is, what is selected is always an environment–behavior relation, never a response alone. (LCB, p. 68)

Of this statement, Shull comments, "In this respect, then, [LCB's] conception of reinforcement is very much in the tradition of S-R theory . . . [in which] . . . what was selected was the ability of a particular stimulus pattern to evoke a particular response pattern" (Shull, 1995, p. 353).

The question is then considered of whether this view is consistent with the behavior-analytic conception of operant behavior in which "operant behavior occurs in a stimulus context, but there is often no identifiable stimulus change that precedes each occurrence of the response" (Shull, 1995, p. 354). This leads to the related concern of whether adaptive neural networks are suitable to interpret operant behavior because networks are constructed from elementary connections "intended as analogues of stimulus–response relations" (Shull, 1995, p. 354).

In what follows, we seek to demonstrate not only that LCB's view of operant behavior and its interpretation via adaptive neural networks is consistent with behavior-analytic formulations (which we share), but also that this view enriches our understanding of what it means to say that operants are emitted rather than elicited. We agree that the behavior-analytic view of operants should be regarded as liberating because ". . . fundamental relationships could be established in procedures that allowed responses to occur repeatedly over long periods of time without the constraints of trial onset and offset" (Shull, 1995, p. 354). Instead of departing from behavior-analytic thinking, the view that reinforcers select environment–behavior relations fosters more parsimonious treatments of stimulus control and conditioning, and represents a continuation of Skinner's efforts to provide a compelling moment-to-moment account of behavior (Skinner, 1976).

Further, we concentrate on the rationale behind this view of selection by reinforcement as it is interpreted by biobehaviorally constrained neural networks. Because material relevant to the S-R issue is scattered throughout the book and some of the more technical details are not elaborated, the need for clarification is understandable. We consider first the interpretation of responding in a stable stimulus context and then proceed to a more general examination of the core of the S-R issue. No effort is made to discuss all of its ramifications; the phrase connotes a considerable set of interrelated distinctions that vary somewhat among different theorists (cf. Lieberman, 1993, p. 190; B. Williams, 1986; Zuriff, 1985). Also, no effort is made to provide a historical overview of the S-R issue, although such information is clearly required for a complete treatment of the topic (see Coleman, 1981, 1984; Dinsmoor, 1995; Gormezano & Kehoe, 1981).

    BEHAVING IN A STABLE CONTEXT

The central distinction between S-R psychology and the view introduced by Skinner is how one accounts for variability in behavior. The defining feature of S-R psychology is that it explains variability in behavior by reference to variability in antecedents: When a response occurs there must have been some discrete antecedent, or complex of antecedents, overt or covert, that evoked the response. If the response varies in frequency, it is because antecedent events have varied in frequency. On this view, there will always be a nonzero correlation between antecedent events and behavior. Further, frequency of response (or frequency per unit time, i.e., rate) cannot serve as a fundamental dependent variable because response rate is, at root, a function of the rate of stimulus presentation. In contrast, Skinner held that, even when there is no identifiable variability in antecedents, variability in behavior remains lawful: Behavior undergoes orderly change because of its consequences. In fact, at the level of behavioral observations, one can find lawful relationships between the occurrence of a response and the contingencies of reinforcement in a stable context. Skinner did not merely assert the central role of control by consequences; he persuasively demonstrated it experimentally. Once such control is accepted as an empirical fact and not simply as a theoretical preference, the S-R position becomes untenable.


Fig. 1. The simulation by a neural network of acquisition (ACQ), extinction (EXT), and reacquisition (REACQ) with an operant contingency. The simulated environmental context activated the input units of the neural network at a constant level of 1 throughout all phases of the simulation. In accordance with an operant contingency, the input unit for the reinforcing stimulus was activated during ACQ and REACQ only when the activation level of the output unit simulating the operant (R) was greater than zero. During EXT, the input unit for the reinforcing stimulus was never activated. (Activation levels of units could vary between 0 and 1.) The activation level of the output unit simulating the conditioned response (CR), which also changed during the conditioning process, is also shown.

We also accept control by consequences as an empirical fact, and our networks simulate some of its orderly effects without appealing to correlated antecedent changes in the environment.

Consider the neural network simulation of the reacquisition of an extinguished response that is discussed in LCB (pp. 92–95). In the first phase of the simulation a response was followed by a reinforcer, in the second phase extinction was scheduled for the response, and in the third phase the response was again reinforced. The sensory inputs to the network were held constant throughout the simulation. (Note that in a simulation the stimulus context may be held strictly constant, unaffected by moment-to-moment variations in stimulation that inevitably occur in actual experiments.) In the simulation, the strength of the response varied widely even though the context remained constant: Responding increased in strength during acquisition, weakened during extinction, and then increased again during reacquisition, and did so more rapidly than during original acquisition (see Figure 1). Moreover, the changes in response strength were not monotonic, but showed irregularities during the transitions in response strength. None of these changes can be understood by reference to the stimulus context; it remained constant throughout the simulation. Instead, the changes can only be interpreted by reference to the effects of the contingencies of reinforcement on the network and to the history of reinforcement in that context.
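The logic of this simulation can be sketched in a few lines of code. The Python fragment below is a deliberately minimal illustration, not the network of LCB (whose learning algorithm is given in Donahoe, Burgos, & Palmer, 1993); the single connection weight and the learning and decay rates are assumptions made only for exposition. What it preserves is the essential point: the input is constant across all three phases, so every change in response strength must be traced to the contingencies and to the weight's history.

    import random

    random.seed(1)  # reproducible illustration

    w = 0.05                    # context-to-operant connection (assumed starting value)
    LEARN, DECAY = 0.2, 0.02    # illustrative learning and extinction rates

    def trial(reinforce_responses):
        """One trial: the constant context drives the operant unit; the
        consequence, not the input, determines how the weight changes."""
        global w
        context = 1.0                        # input held constant in every phase
        p_response = min(1.0, w * context)   # response strength at this moment
        if random.random() < p_response:     # a response occurred
            if reinforce_responses:
                w += LEARN * (1.0 - w)       # reinforcement strengthens the relation
            else:
                w -= DECAY * w               # extinction weakens, but does not erase, it
        return p_response

    for phase, reinforce, n in (("ACQ", True, 40), ("EXT", False, 80), ("REACQ", True, 40)):
        strengths = [trial(reinforce) for _ in range(n)]
        print("%6s start %.2f end %.2f" % (phase, strengths[0], strengths[-1]))

Because extinction leaves the weight above its initial value, REACQ begins from a higher response strength than ACQ did, reproducing in miniature the faster reacquisition described above.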


BEHAVIORAL AND NEURAL LEVELS OF ANALYSIS

How do we square the foregoing account with the claim that "what is selected is always an environment–behavior relation, never a response alone" (LCB, p. 68)? The apparent incongruity arises from a confusion of levels of analysis. We have attempted to uncover relationships between two independent sciences: behavior analysis and neuroscience. Specifically, we have made use of physiological mechanisms that we believe are consistent with behavioral laws. S-R psychology and radical behaviorism are both paradigms of a science of behavior; neither includes the underlying physiology in its purview. In a stable context, control by consequences (as opposed to antecedents) stands as a behavioral law, but we propose (at another level of analysis) that the effects of those consequences are implemented by changes in synaptic efficacies. This idea is not new, of course; Watson thought as much (Watson, 1924, p. 209).

Consider how the network accomplishes the simulation discussed above: Changes in the strength of a response occur because of changes in the strengths of connections (simulating changes in synaptic efficacies) along pathways from upstream elements. That is, there are changing gradients of control by the constant context as a function of the contingencies of reinforcement. From this perspective, variation in behavior is due to varying consequences, but antecedent events are necessary for the behavior to occur. It is this latter feature of our proposal that encourages the misperception that we are endorsing S-R psychology, because the strength with which an operant unit is activated depends (among other things) on the activation of the inputs of the network by the simulated environment. However, the distinction between S-R psychology and behavior analysis is at the level of behavior, not at the level of biological mechanism. Our networks are intended to simulate aspects of free-operant behavior exhibited by an organism in an experimental chamber and the functioning of the network (i.e., its input–output relations) in conformity with behavioral laws. Thus, we argue that the effects of the consequences of a response are influenced by the context. The analogy with a black box is exact: We can eliminate the organism as a variable in our functional relationships, not because the organism is unnecessary, but because it can be ignored in laws of behavior; we treat it as given and as a constant, not as a variable. Similarly, when the context is held constant, it, too, can be ignored, but this does not mean that the context is unnecessary any more than the organism is unnecessary.

In discrimination procedures the context reemerges in our behavioral laws, because it is now a variable. There is a difference between claiming that control by context need not be considered in some situations and claiming that control by context does not exist in those situations. Indeed, Skinner took the first position, not the second (Skinner, 1937). Consider the following: In our simulation of reacquisition, the response gained strength after fewer reinforcers than during original learning because some of the effects of prior reinforcers on the strength of connections within the network had not been completely undone by the intervening period of extinction. The constancy of the context during acquisition and reacquisition played a crucial role in this result because the enduring context permitted some of the same pathways to be activated during both acquisition and reacquisition (cf. LCB, p. 94; Kehoe, 1988). With the simulation, as with a living organism, context sets the occasion for responding, although its influence may not be apparent until the context is changed, in which case "generalization decrement" is said to occur. This necessarily implies control by context.

One can interpret observations at the physiological level in ways that more transparently parallel behavioral laws than the accounts we have offered. For example, consider an important finding from Stein's group (e.g., Stein & Belluzzi, 1988, 1989; Stein, Xue, & Belluzzi, 1993, 1994) that is described in LCB (p. 56). It was found that the frequency of firing of a neuron could be increased by introducing a neuromodulator, such as dopamine, into the synapse following a burst of firing. These findings have been interpreted to mean that neuromodulators increase the bursting activity of neurons in a manner analogous to the strengthening of emitted behavior by contingent reinforcers. An alternative


whether such regularities are sui generis (i.e., understandable only at the level at which they appear). Skinner clearly favored moment-to-moment analyses (e.g., Ferster & Skinner, 1957). Consider the following statements in "Farewell, my lovely!" in which Skinner (1976) poignantly lamented the decline of cumulative records in the pages of this journal.

What has happened to experiments where rate changed from moment to moment in interesting ways, where a cumulative record told more in a glance than could be described in a page? . . . [Such records] . . . suggested a really extraordinary degree of control over an individual organism as it lived its life from moment to moment. . . . These molecular changes in probability of responding are most immediately relevant to our own daily lives. (Skinner, 1976, p. 218)

Skinner's unwavering commitment to a moment-to-moment analysis of behavior (cf. Skinner, 1983, p. 73) has profound, and underappreciated, implications for the resolution of the S-R issue as well as for other central distinctions in behavior analysis, including the distinction between operant and respondent conditioning itself.

Stimulus Control of Behavior

In LCB, an organism is described as "immersed in a continuous succession of environmental stimuli . . . in whose presence a continuous succession of responses . . . is occurring. . . . When a [reinforcing] stimulus is introduced into this stream of events, then . . . selection occurs (cf. Schoenfeld & Farmer, 1970)" (p. 49). At the moment when the reinforcer occurs (what Skinner more casually referred to as "the moment of Truth") some stimulus necessarily precedes the reinforced response in both differential and nondifferential conditioning. That is, at the moment of reinforcement (Ferster & Skinner, 1957, pp. 2–3), there is no environmental basis by which to distinguish between the two contingencies. Therefore, no basis exists by which different processes could be initiated for nondifferential as contrasted with differential conditioning (i.e., response strengthening in the first instance and stimulus control of strengthening in the second). If control by contextual stimuli does not occur in nondifferential conditioning, then discrimination becomes an anomaly and requires ad hoc principles that differ from those that accommodate nondifferential conditioning. In such a formulation, the environment would become empowered to control behavior when there were differential consequences, but not otherwise. But, is it credible that reinforcers should strengthen behavior relative to a stimulus with one procedure and not with the other? And, if so, what events present at the moment of reinforcement are available to differentiate a reinforced response in a discrimination procedure from a reinforced response in a nondiscrimination procedure? The conclusion that no such events exist led Dinsmoor (1995, p. 52) to make much the same point in citing Skinner's statement that it is the nature of [operant] behavior that ". . . discriminative stimuli are practically inevitable" (Skinner, 1937, p. 273; see also Catania & Keller, 1981, p. 163).

During differential operant conditioning, stimuli are sensed in whose presence a response is followed by a reinforcer. But environment–behavior–reinforcer sequences necessarily occur in a nondiscrimination procedure as well. The two procedures differ with respect to the reliability with which particular stimuli are present prior to the reinforced response, but that difference cannot be appreciated on a single occasion. The essence of reliability is repeatability. The distinction emerges as a cumulative product of the occurrence of reinforcers over repeated individual occasions. In laboratory procedures that implement nondifferential conditioning, it is not that no stimuli are sensed prior to the response–reinforcer sequence, but that no stimuli specifiable by the experimenter are reliably sensed prior to the sequence.
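The claim that the two procedures are indistinguishable at any single moment of reinforcement, and separate only cumulatively, can be made concrete. In the sketch below (our illustration, with hypothetical stimulus names), both procedures yield an identical stimulus-response-reinforcer conjunction on any one occasion; only a tally kept over many occasions reveals whether some particular stimulus reliably preceded the reinforced response.

    import random

    random.seed(2)

    def stimulus_at_reinforcement(differential):
        """Return the stimulus sensed just before a reinforced response.
        In a differential procedure S+ is always that stimulus; in a
        nondifferential procedure some incidental stimulus happens to be."""
        if differential:
            return "S+"
        return random.choice(("noise", "wall", "floor", "key"))  # hypothetical incidental stimuli

    for label, differential in (("differential", True), ("nondifferential", False)):
        tally = {}
        for _ in range(100):                 # the distinction requires repetition
            s = stimulus_at_reinforcement(differential)
            tally[s] = tally.get(s, 0) + 1
        print(label, tally)

    # On any single occasion both procedures produce one (stimulus, response,
    # reinforcer) conjunction; only the reliability visible in the tally differs.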

Conditioning of Behavior

Paradoxically, by strictly parallel reasoning, an acceptance of Skinner's commitment to a moment-to-moment analysis of behavior compels a rejection of a fundamental distinction between the conditioning processes instantiated by respondent and operant procedures. Instead, a moment-to-moment analysis calls for a unified theoretical treatment of the conditioning process, with the environmental control of responding as the cumulative outcome of both procedures.


If an organism is continuously immersed in an environment and is continuously behaving in that environment, then both stimulus and response events necessarily precede and, hence, are potentially affected by the occurrence of a reinforcer regardless of the contingency according to which the reinforcer occurs. In a respondent procedure a specified stimulus, the conditioned stimulus (CS), occurs before the unconditioned stimulus (US). The CS is likely to become a constituent of the selected environment–behavior relation because of its temporal relation to the US. The behavioral constituent of the selected relation includes the response elicited by the US, the unconditioned response (UR). However, because organisms are always behaving, other responses may also precede the US (e.g., orienting responses to the CS; Holland, 1977), although these responses may vary somewhat from moment to moment. As an example of a respondent procedure, if a tone precedes the introduction of food into the mouth, then the tone may continue to guide turning the head toward the source of the tone and come to guide salivating elicited by food. In the operant procedure, the contingency ensures that a specific behavior (the operant) occurs before the reinforcer. Because of its proximity to the reinforcer, the operant is then also likely to become a part of the selected environment–behavior relation. However, because behavior always takes place in an environment, some stimulus must precede the reinforcer although the particular stimulus may vary from moment to moment. For example, a rat may see or touch the lever prior to pressing it and receiving food. From this perspective, respondent and operant conditioning are two different procedural arrangements (i.e., contingencies) that differ with respect to the environmental and behavioral events that are reliably contiguous with the reinforcer. But, this procedural difference need not imply different conditioning processes (LCB, pp. 49–50; cf. Donahoe, Burgos, & Palmer, 1993; Donahoe, Crowley, Millard, & Stickney, 1982, pp. 19–23).

The view that reinforcers select environment–behavior relations whatever the procedure, and that various procedures differ among themselves in the stimuli and responses that are likely to be present at the moment of selection, is consistent with central aspects of Skinner's thinking. As noted in LCB,

Although Skinner's treatment of respondent and operant conditioning emphasized the differences between the two procedures and their outcomes, the present treatment is consistent with his emphasis on the ubiquity of what he called the three-term contingency (Skinner, 1938, 1953). That is, the reinforcement process always involves three elements: a stimulus, a response, and a reinforcer. There is nothing in a unified treatment of classical and operant conditioning that minimizes the crucially important differences between the outcomes of the two procedures for the interpretation of complex behavior. However, a unified principle does deeply question the view that classical and operant procedures produce two different kinds of learning or require fundamentally different theoretical treatments. Both procedures select environment–behavior relations but, because of the differences in the events that reliably occur in the vicinity of the reinforcer, the constituents of the selected relations are different. (LCB, p. 65, emphasis added)

Acknowledging that the organism is always behaving in the presence of some environment refines the conceptual treatment of respondents and operants by grounding the distinction on the reliability with which specific stimulus and response events are affected by the two contingencies (cf. Palmer & Donahoe, 1992). On a single occasion, there is no basis by which to distinguish a respondent from an operant procedure (cf. Hilgard & Marquis, 1940; Hineline, 1986, p. 63). Others, such as Catania, have appreciated this point:

It is not clear what differential contingencies could be the basis for discrimination of the contingencies themselves. If we argue that some properties of the contingencies must be learned, to what contingencies can we appeal as the basis for that learning? (Catania & Keller, 1981, p. 163)

The difference in procedures produces crucial differences in their ultimate outcomes, but those different outcomes emerge cumulatively over successive iterations of the same reinforcement process acting in accordance with the specific contiguities instantiated by the procedures.


A commitment to a moment-to-moment analysis unavoidably commits one to the view that reinforcers select environment–behavior relations, not behavior alone. At the moment of Truth (whether in a respondent or an operant procedure, or in a discrimination or nondiscrimination procedure) the reinforcing stimulus accompanies both environmental and behavioral events. Hence, even if fundamentally different conditioning processes existed for the various procedures, there would be no environmental basis by which one or the other could be appropriately invoked (cf. Donahoe et al., 1982, 1993, pp. 21–22).

In short, we have been misled into searching for different processes to account for respondent and operant conditioning and for nondifferential and differential conditioning, as well as for more complex discrimination procedures (cf. Sidman, 1986), by the language of contingency. Contingency, as the term is conventionally used in behavior analysis, refers to relations between events that are defined over repeated instances of the constituent events. We describe our experimental procedures in terms of the manipulation of contingencies, but, by changing the contingencies, we change the contiguities. In our search for the controlling variables, we have confused the experimenter's description of the contingencies with the organism's contact with the contiguities instantiated by those contingencies. And, of course, it is the organism's contact with events, not the experimenter's description of them, that must be the basis for selection by reinforcement.

Contingency is the language of procedure; contiguity is the language of process. We have not thoroughly researched Skinner's use of the term contingency, but he employed it, at least sometimes, in a manner that is synonymous with contiguity. For example, "there appears to be no way of preventing the acquisition of non-advantageous behavior through accident. . . . It is only because organisms have reached the point at which a single contingency makes a substantial change that they are vulnerable to coincidences" (Skinner, 1953, pp. 86–87, emphasis added; cf. Catania & Keller, 1981, p. 128). (The meaning of contingency as a coincidental relation between events is, in fact, the primary meaning in many dictionaries, although in behavior analysis it much more often denotes reliable relations.)

    Relation of Momentary Processes to Molar Regularities

Skinner was resolutely committed to a moment-to-moment account at the behavioral level of analysis, although he did not acknowledge that this view would call for a reassessment of the conceptual distinction between operant and respondent conditioning (but not the crucial differences between these procedures and their corresponding outcomes). His early adherence to a moment-to-moment analysis is apparent in the experimental observation that, under properly controlled circumstances, even a single occurrence of a lever press followed by food changes behavior (Skinner, 1938). Skinner's discussions of superstitious conditioning echo the same theme: Momentary temporal relations may promote conditioning (see also Pear, 1985; Skinner, 1953, pp. 86–87; for alternative interpretations, cf. Staddon & Simmelhag, 1971; Timberlake & Lucas, 1985):

A stimulus present when a response is reinforced may acquire discriminative control over the response even though its presence at reinforcement is adventitious. (Morse & Skinner, 1957, p. 308)

And,

to say that a reinforcement is contingent upon a response may mean nothing more than that it follows the response. . . . conditioning takes place because of the temporal relation only, expressed in terms of the order and proximity of response and reinforcement. (Skinner, 1948, p. 168)

The centrality of momentary temporal relations has also been affirmed by students of respondent conditioning. Gormezano and Kehoe, speaking within the associationist tradition, state,

A single instance of contiguity between A and B may establish an association; repeated instances of contiguity were necessary to establish a cause-effect relation. (p. 3)

Any relationship of pairing or correlation can be seen to be an abstraction of the record. (Gormezano & Kehoe, 1981, p. 31)

Moment-to-moment accounts of the conditioning process are also consistent with observations at the neural level.


For example, Stein's work indicates that the reinforcing effect of the neuromodulator dopamine occurs only when it is introduced into the synapse within 200 ms of a burst of firing in the postsynaptic neuron (Stein & Belluzzi, 1989). Behavior analysis and neuroscience are independent disciplines, but their principles cannot be inconsistent with one another's findings. The two sciences are dealing with different aspects of the same organism (LCB, pp. 275–277; Skinner, 1938).

Although conditioning processes are instantiated in moment-to-moment relations between events, compelling regularities sometimes appear in the relation between independent and dependent variables defined over more extended periods of time (e.g., between average rate of reinforcement and average rate of responding; Baum, 1973; Herrnstein, 1970). What is the place of molar regularities in a science if its fundamental processes operate on a moment-to-moment basis? Nevin's answer to this question seems very much on the mark: "The possibility that molar relations . . . may prove to be derivative from more local processes does nothing to diminish their value as ways to summarize and integrate data" (Nevin, 1984, p. 431; see also Herrnstein, 1970, p. 253). The conceptual relation between moment-to-moment processes and molar regularities in behavior analysis parallels the distinction between selection for and selection of in the paradigmatic selectionist science of evolutionary biology (Sober, 1984). Insofar as the notions of cause and effect have meaning in the context of the complex interchange between an organism and its environment: "Selection for describes the causes, while selection of describes the effects" (Sober, 1993, p. 82). In evolutionary biology, selection for genes affecting reproductive fitness leads to selection of altruistic behavior (Hamilton, 1964). As the distinction applies in behavior analysis, reinforcers cause certain environment–behavior relations to be strengthened; this has the effect, under some circumstances, of producing molar regularities. Selection by reinforcement for momentary environment–behavior relations produces selection of molar regularities.

One can demonstrate that what reinforcers select are momentary relations between environmental and behavioral events, not the molar regularities that are their cumulative products. This can be done by arranging contingencies of reinforcement that pit moment-to-moment processes against molar regularities. Under these circumstances, the variation in behavior typically tracks moment-to-moment relations, not relations between events defined over more extended periods of time. For example, with positive reinforcers, differential reinforcement of responses that occur at different times following the previous response (i.e., differential reinforcement of interresponse times, or IRTs) changes the overall rate of responding even though the overall rate of reinforcement is unchanged (Platt, 1979). As conjectured by Shimp (1974, p. 498), there may be no such thing as an asymptotic mean rate of [responding] that is ". . . independent of reinforced IRTs" (cf. Anger, 1956). Similarly, in avoidance learning, when the delay between the response and shock is varied but the overall rate of shock is held constant, the rate of avoidance responding is sensitive to the momentary delay between the response and shock, not the overall rate of shock (Hineline, 1970; see also Benedict, 1975; Bolles & Popp, 1964). Research with respondent procedures has led in the same direction: Molar regularities are the cumulative products of moment-to-moment relations. For example, whereas at one time it was held that behavior was sensitive to the overall correlation between conditioned and unconditioned stimuli (Rescorla, 1967), later experiments (Ayres, Benedict, & Witcher, 1975; Benedict & Ayres, 1972; Keller, Ayres, & Mahoney, 1977; cf. Quinsey, 1971) and theoretical work (Rescorla & Wagner, 1972) demonstrated that molar regularities could be understood as the cumulative products of molecular relations between CS and US. In summary, research with both operant and respondent procedures has increasingly shown that molar regularities are the cumulative products of moment-to-moment conditioning processes. (For initial work of this sort, see Neuringer, 1967, and Shimp, 1966, 1969, 1974. For more recent efforts, see Herrnstein, 1982; Herrnstein & Vaughan, 1980; Hinson & Staddon, 1983a, 1983b; Moore, 1984; Silberberg, Hamilton, Ziriax, & Casey, 1978; Silberberg & Ziriax, 1982.)



It must be acknowledged, however, that not all molar regularities can yet be understood as products of molecular processes (e.g., behavior maintained by some schedules or by long reinforcer delays; Heyman, 1979; Hineline, 1981; Lattal & Gleeson, 1990; Nevin, 1969; B. Williams, 1985). Refractory findings continue to challenge moment-to-moment accounts, and a completely integrated theoretical treatment of molar regularities in terms of molecular processes still eludes us (cf. B. Williams, 1990). Difficulties in providing moment-to-moment accounts of molar regularities in complex situations are not peculiar to behavior analysis. Physics continues to struggle with many-body problems in mechanics, even though all of the relevant fundamental processes are presumably known. Nevertheless, it is now clear that behavior analysis is not forced to choose between molar and moment-to-moment accounts (e.g., Meazzini & Ricci, 1986, p. 37). The two accounts are not inconsistent if the former are regarded as the cumulative product of the latter.

Indeed, the two accounts may be even more intimately intertwined: In the evolutionary history of organisms, natural selection may have favored genes whose expression yielded moment-to-moment processes that implemented certain molar regularities as their cumulative product (LCB, pp. 112–114; Donahoe, in press-b; cf. Skinner, 1983, p. 362; Staddon & Hinson, 1983). Natural selection for some molar regularity (e.g., maximizing, optimizing, matching) may have led to selection of moment-to-moment processes whose product was the molar regularity. In that way, natural selection for the molar regularity could lead to selection of momentary processes. Once those moment-to-moment processes had been naturally selected, selection by reinforcement for momentary environment–behavior relations could, in turn, cause selection of the molar regularity. Note, however, to formulate the reinforcement process in terms of the molar regularities it produces, rather than the moment-to-moment processes that implement it, is to conflate natural selection with selection by reinforcement. The selecting effect of the temporally extended environment is the province of natural selection; that of the moment-to-moment environment is the province of selection by reinforcement. Of course, many momentary environments make up the temporally extended environment, but selection by reinforcement is for the former environments, whereas natural selection is for the latter.

Additional experimental work is needed to determine how moment-to-moment processes may lead to molar regularities, but the effort will undoubtedly also require interpretation (Donahoe & Palmer, 1989, 1994, pp. 125–129). In the final section of this essay, interpretation by means of adaptive neural networks is used to clarify the contribution of momentary processes to the central issue: the S-R issue.

NEURAL NETWORK INTERPRETATIONS OF CONDITIONING

We turn finally to the question of whether biobehaviorally constrained neural networks can faithfully interpret salient aspects of the stimulus control of operants. The full answer to this question obviously lies in the future; however, preliminary results are encouraging (e.g., Donahoe et al., 1993; Donahoe & Dorsel, in press; Donahoe & Palmer, 1994). Our concern here is whether, in principle, networks constructed from elementary connections that are said to be analogues of stimulus–response relations can accommodate the view that "operant behavior occurs in a stimulus context, but there is often no identifiable stimulus change that precedes each occurrence of the response" (Shull, 1995, p. 354). This view of operants is rightly regarded as liberating because it empowers the study of complex reinforcement contingencies in the laboratory and because it frees applied behavior analysis from the need to identify the precise controlling stimuli for dysfunctional behavior before instituting remedial interventions. Indeed, it can be argued that pragmatic considerations motivated the operant-respondent distinction more than principled distinctions about the role of the environment in emitted and elicited behavior.

The present inquiry into neural network interpretations of operants can be separated into two parts. The first, and narrower, question is: Do neural networks implement analogues of stimulus–response relations? The second is: Are neural networks capable of simulating the effects of nondifferential as well as differential operant contingencies?


Interpreting Environment–Behavior Relations

A neural network consists of (a) a layer of input units whose activation levels simulate the occurrence of environmental events, (b) one or more layers of "hidden" or interior units whose activation levels simulate the states of interneurons, and (c) a layer of output units whose activation levels simulate the effectors that produce behavioral events (cf. Donahoe & Palmer, 1989). If a stimulus–response relation denotes a relation that is mediated by direct connections going from input to output units, then such relations are not, in general, characteristic of neural networks. Although a simple network consisting of only such input–output connections (a so-called perceptron architecture; Rosenblatt, 1962) can mediate a surprising range of input–output relations, some relations that are demonstrable in living organisms are beyond the capabilities of these networks (Minsky & Papert, 1969). In contrast, networks with nonlinear interior units, which more closely simulate the networks of neurons in the nervous system, are typical of modern neural network architectures. Such multilayered networks have already demonstrated their ability to mediate a substantial range of complex environment–behavior relations that are observed with living organisms (e.g., Kehoe, 1988, 1989; cf. McClelland, Rumelhart, & the PDP Research Group, 1986; Rumelhart, McClelland, & the PDP Research Group, 1986). Thus, neither neuroscience nor neural network research endorses formulations in which stimuli guide behavior by means of direct connections akin to monosynaptic reflexes. (We would also note that not even traditional S-R learning theorists, e.g., Guthrie, 1933; Hull, 1934, 1937; Osgood, 1953, held such a simple view of the means whereby the environment guided behavior. In many of their proposals, inferred processes, such as the rg–sg mechanism, intervened between the environment and behavior.)
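The classic concrete case of a relation beyond a perceptron's reach is exclusive-or: respond to either of two stimuli alone but not to both together (the behavioral analogue, negative patterning, is demonstrable in conditioning experiments). No single layer of direct input-output weights can mediate it, because the relation is not linearly separable, but two interior units suffice, as the sketch below (our example, not a network from LCB) shows.

    def step(x, threshold):
        """A simple nonlinear (threshold) unit."""
        return 1.0 if x >= threshold else 0.0

    def xor_net(s1, s2):
        """Two interior units make the relation expressible: respond to
        either stimulus alone, but not to both."""
        h1 = step(s1 + s2, 0.5)   # interior unit active if either input is on
        h2 = step(s1 + s2, 1.5)   # interior unit active only if both are on
        return step(h1 - h2, 0.5) # output: "either" and not "both"

    for s1 in (0, 1):
        for s2 in (0, 1):
            print(s1, s2, "->", xor_net(s1, s2))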

The neural network research of potential interest to behavior analysts is distantly related to what was earlier called S-O-R psychology (where O stood for organism). However, acknowledging a role for the organism in no way endorses an autonomous contribution of the organism: All contributions must be traceable to the environment, that is, to histories of selection by the ancestral environment as understood through natural selection and by the individual environment as understood through selection by reinforcement. Also, to be congenial with behavior analysis, all intraorganismic events must be the product of independent biobehavioral research; they cannot be inferences from behavior alone. For instance, the organismic counterparts of hidden units are not merely inferences from a behavioral level of observation but are observed entities from a neural level.

In the case of our neural network research, when input units are stimulated by the simulated occurrence of environmental stimuli, the interior units to which those input units are connected are probabilistically activated in the following moment. If a reinforcing signal is present at that moment, then connections are strengthened between input units and all recently activated interior units to which they are connected. The process of strengthening the connections between coactive pre- and postsynaptic units is carried out simultaneously throughout the network at each moment until the end of the simulated time period. The activation levels of units decay over time unless they were reactivated during the preceding moment. Simulations in which the strengths of connections are changed from moment to moment are known as real-time simulations, and the successive moments at which the strengths of connections are changed (or updated) are called time steps. Stated more generally, real-time neural network simulations implement a dynamical systems approach to the interpretation of behavior (cf. Galbicka, 1992). In a fully realized simulation, the simulated processes that change the strengths of connections, or connection weights, and the durations of time steps are tightly constrained by independent experimental analyses of neuroscience and behavior (e.g., Buonomano & Merzenich, 1995) and, at a minimum, are consistent with what is known about such processes. Skinner's dictum (1931) that behavior should be understood at the level at which orderly relations emerge applies with equal force to the neural level.
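A schematic rendering of one such real-time time step follows. It is a paraphrase of the preceding paragraph, not the authors' published algorithm (Donahoe et al., 1993, give the actual equations); the decay and learning constants and the probabilistic activation rule are stand-in assumptions.

    import random

    random.seed(5)

    DECAY, LEARN = 0.5, 0.2   # stand-in constants; the real values are
                              # constrained by behavioral and neural data

    def time_step(input_acts, interior_acts, weights, reinforcer_present):
        """One moment of simulated time: activations propagate
        probabilistically, decay unless renewed, and every connection
        between recently coactive units is strengthened if, and only if,
        the reinforcing signal is present at this moment."""
        for j in range(len(interior_acts)):
            drive = sum(a * weights[i][j] for i, a in enumerate(input_acts))
            fired = 1.0 if random.random() < drive else 0.0
            interior_acts[j] = min(1.0, DECAY * interior_acts[j] + fired)
        if reinforcer_present:
            for i, a in enumerate(input_acts):
                for j, b in enumerate(interior_acts):
                    weights[i][j] += LEARN * a * b * (1.0 - weights[i][j])

    # A constant two-unit context and two interior units, updated for 200
    # time steps with the reinforcer contingent on activity of interior unit 0.
    inputs, interior = [1.0, 1.0], [0.0, 0.0]
    weights = [[0.05, 0.05], [0.05, 0.05]]
    for t in range(200):
        time_step(inputs, interior, weights, reinforcer_present=interior[0] > 0.5)
    print("connection weights after selection:", weights)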


Although connection weights are updated on a moment-to-moment basis, the functioning of a network cannot be understood solely by reference to the environment of the moment: Connection weights at any given moment are a function of the entire selection history of the network to that point. Networks, like organisms, are historic systems whose current performance cannot be understood by reference to the environment of the moment alone (Staddon, 1993; cf. Donahoe, 1993).

    Interpreting Behavior in Nondiscrimination Procedures

Before we describe computer simulations that illustrate interpretations of the conditioning of operants, the view that some operants may be uninfluenced by antecedent stimuli requires closer examination. Upon inspection, experimental situations that meet the definition of a nondiscrimination procedure typically contain implicit three-term contingencies. For example, consider a situation in which a pigeon is presented with a response key of a constant green color and key pecking is reinforced with food on some schedule of intermittent reinforcement. Because no other conditions are manipulated by the experimenter, the arrangement is appropriately described as a nondiscrimination procedure. Note, however, that pecking is more likely to be reinforced if the pigeon's head is oriented toward the green key than if it is oriented toward some other stimulus in the situation; pigeons tend to look at what they peck (Jenkins & Sainsbury, 1969). Thus, the observing response of orienting toward the green key is reinforced as a component of a behavioral chain whose terminal response is pecking the green key. Stated more generally, observing responses are often implicitly differentially reinforced in nondiscrimination procedures, and the stimuli produced by such responses are therefore more likely to be sensed prior to the reinforced response. As a result, such stimuli come to control the response (Dinsmoor, 1985; cf. Heinemann & Rudolph, 1963).

Moreover, a schedule of reinforcement that is implemented in an environment in which the experimenter has not programmed a relation between features of the environment and the response–reinforcer contingency may nonetheless contain stimuli in whose presence the reinforced response differentially occurs. This relation obtains when nonrandom environment–behavior relations arise by virtue of the organism's interaction with the environment. And, such interactions generally occur because all responses are not equally likely in the presence of all stimuli. Rats are more apt to make forelimb movements approximating lever pressing (e.g., climbing movements) in environments that contain protruding horizontal surfaces than in environments that are devoid of such features. Behavior is directed toward objects and features of objects, not thin air. When an external environment includes stimuli that make certain behavior more probable, that environment was said to provide "means-end-readinesses" by Tolman (1932) and "affordances" by Gibson (1979).

In addition to stimuli provided by the environment, the organism's own prior behavior produces stimuli that become available to guide further responding. As an example of behaviorally generated stimuli, on ratio schedules a response is more apt to be reinforced following sensory feedback from a burst of prior responses than following feedback from a single prior response (Morse, 1966; D. Williams, 1968). Ferster and Skinner's seminal work, Schedules of Reinforcement (1957), is replete with proposals for stimuli that could function as discriminative stimuli in nondiscrimination procedures (see also Blough, 1963; Hinson & Staddon, 1983a, 1983b).

    Interpreting Context in Simulations of Operant Conditioning

In the simulation of acquisition, extinction, and reacquisition in a stable environment, the role of context could safely be ignored. However, for reasons noted earlier, control by elements of the context may occur, and that control can be simulated by selection networks, the type of adaptive neural network proposed in LCB. Selection networks consist of groups of input units, of interior units simulating neurons in sensory association cortex whose connection strengths are modified by hippocampal efferents, of interior units simulating neurons in motor association cortex whose connection strengths are modified by ventral-tegmental efferents, and of output units.

Figure 2 provides an example of the architecture of a simple selection network (for details, see Donahoe et al., 1993; LCB, pp. 237–239).


Fig. 2. A minimal architecture of a selection network for simulating operant conditioning. Environmental events stimulate primary sensory input units (S1, S2, and S3) that give rise to connections that activate units in sensory association areas and, ultimately, units in motor association and primary motor areas. One primary motor output unit simulates the operant response (R). When the R unit is activated, the response–reinforcer contingency implemented by the simulation stimulates the SR input unit, simulating the reinforcing stimulus. Stimulating the SR unit activates the subcortical dopaminergic system of the ventral tegmental area (VTA) and the CR/UR output unit simulating the reinforcer-elicited response (i.e., the unconditioned response; UR). Subsequent to conditioning, environmental events acting on the input units permit activation of the R and CR/UR units simulating the operant and conditioned response (CR), respectively. The VTA system modifies connection weights to units in motor association and primary motor areas and modulates the output of the hippocampal system. The output of the hippocampal system modifies connection weights to units in sensory association areas. Connection weights are changed as a function of moment-to-moment changes in (a) the coactivity of pre- and postsynaptic units and (b) the discrepancies in diffusely projecting systems from the hippocampus (d1) and the VTA (d2). The arrowheads point toward those synapses that are affected by activity in the diffusely projecting systems. Finer lines indicate pathways whose connection weights are modified by the diffusely projecting systems. Heavier lines indicate pathways that are functional from the outset of the simulation due to natural selection. (For additional information, see Donahoe et al., 1993; Donahoe & Dorsel, in press; Donahoe & Palmer, 1994.)
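For readers who scan structure more easily than prose, the wiring described in the caption can be summarized as data. The record below merely restates the caption; the field names are ours, not the authors'.

    # Structural summary of the minimal selection network of Figure 2
    # (our field names; the functional claims restate the caption).
    SELECTION_NETWORK = {
        "input units": ("S1", "S2", "S3", "SR"),   # SR is driven by the contingency
        "sensory association units": "weights modified by the hippocampal signal (d1)",
        "motor association / primary motor units": "weights modified by the VTA signal (d2)",
        "output units": ("R", "CR/UR"),            # operant and reinforcer-elicited response
        "diffuse systems": {
            "hippocampus": "d1 discrepancy -> sensory association weights",
            "VTA": "d2 discrepancy -> motor association and primary motor "
                   "weights; also modulates the hippocampal system",
        },
    }
    print(SELECTION_NETWORK["diffuse systems"]["VTA"])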

A stable context may be simulated using a network with three input units (S1, S2, and S3). In the first simulation, S1 was continuously activated with a strength of .75, simulating a salient feature of the environment (e.g., the wavelength on a key for a pigeon). S2 and S3 were continuously activated with strengths of .50, simulating less salient features of the environment (e.g., the masking noise in the chamber, stimuli from the chamber wall adjacent to the key, etc.). (No simulation can fully capture the complexity and richness of even the relatively impoverished environment of a test chamber and the relatively simple contingencies programmed therein; Donahoe, in press-a.) Whenever the output unit simulating the operant became activated, a reinforcing stimulus was presented and all connections between recently coactive units were slightly strengthened. After training in which the full context set the occasion for the operant, probe tests were conducted in which each of the three input units making up the context was activated separately and in various combinations. (Note, again, that simulation permits an assessment of conditions that cannot be completely realized experimentally.)
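The probe logic, reduced to a sketch: after training, present each context element and each combination and read off the operant unit's activation. The acquired strengths and the logistic combination below are our illustrative stand-ins (the published values come from the full selection network), but they preserve the qualitative result that guidance is context dependent.

    import math
    from itertools import combinations

    # Illustrative post-training drives (salience x acquired strength);
    # these numbers are ours, not the published simulation's.
    drive = {"S1": 0.75 * 0.40, "S2": 0.50 * 0.20, "S3": 0.50 * 0.20}

    def operant_activation(components, theta=0.35, tau=0.04):
        """Nonlinear (logistic) combination: the operant unit is appreciably
        activated only when enough of the training context is present."""
        x = sum(drive[c] for c in components)
        return 1.0 / (1.0 + math.exp(-(x - theta) / tau))

    for r in (1, 2, 3):
        for combo in combinations(("S1", "S2", "S3"), r):
            print("+".join(combo), "-> %.2f" % operant_activation(combo))

With these stand-in values the full context activates the unit strongly, S1 plus one other element activates it somewhat less strongly, and the remaining probes hardly at all, matching the ordering reported for the upper panel of Figure 3.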


Fig. 3. Simulation results showing the mean activation levels of the operant output unit (R) after conditioning in a stable context consisting of three stimuli (S1, S2, and S3). In the upper panel, S1 was more salient than the other stimuli and activated the S1 input unit at a level of .75 rather than .50 for the S2 and S3 units. In the lower panel, S1 was only slightly more salient than the other stimuli and activated the S1 input unit at a level of .60. The height of each bar represents the mean activation of R by the various stimuli and combinations of stimuli making up the context, including the full context of S1, S2, and S3 used in training (TRAIN).

As shown in the upper panel of Figure 3, by the end of 100 simulated reinforcers following nonzero activation of the operant unit, the output unit was strongly and reliably activated by the full context in which training had taken place (see leftmost bar). However, when even the most salient stimulus, S1, was presented alone and out of context, the operant unit was activated only at a level slightly above .25. As noted in LCB, "the environment–behavior relation selected by the reinforcer depends on the context in which the guiding stimulus appears" (p. 139). And, "a stimulus that has been sensed and discriminated may fail to guide behavior when it occurs outside the context in which the discrimination was acquired" (p. 154). The less salient components of the context, S2 and S3, activated the operant unit hardly at all, whether they occurred by themselves or in combination. It was only when S1 was presented in the partial context of either S2 or S3 that the operant unit was strongly activated, although still not as strongly as in the full context.

The lower panel of Figure 3 shows that the effect of context may be even more subtly expressed when no aspect of the context is especially salient. In this simulation, the S1 component of the context was activated at a level of .60 (instead of .75 as in the first simulation), and the S2 and S3 components were activated at a level of .50 as before. Now, when probe tests were simulated, the operant output unit was appreciably activated only by the full context and not by the components, either singly or in combination.

As simulated by selection networks, the environmental guidance of behavior, whether by a specified discriminative stimulus or by components of a variable context, is described in LCB as follows:

Since there are generally a number of possible paths between the relevant input and output units, and since the active pathways mediating the selected input-output relation are likely to vary over time, the selected pathways include a number of alternative paths between the input and output units. Within the network (and that portion of the nervous system the network is intended to simulate) an input unit evokes activity in a class of pathways between the input and output units. At the end of selection, the discriminative [or contextual] stimulus that activates the input units does not so much elicit the response as permit the response to be mediated by one or more of the selected pathways in the network. The . . . stimulus does not elicit the response; it permits the response to be emitted by the organism. (LCB, p. 148)


On the level of the nervous system, this is the counterpart of Skinner's distinction between elicited responses (respondents) and emitted responses (operants; Skinner, 1937). (LCB, p. 151)

Because, in general, behavior is not the result of the environment activating an invariant and rigidly circumscribed set of pathways, LCB prefers to speak of behavior as being "guided" rather than "controlled" by the environment. (As an aside, the phrase "environmental guidance of behavior" has also been found to have certain tactical advantages over "stimulus control of behavior" when seeking a fair hearing for behavior-analytic interpretations of human behavior.)

The foregoing simulations illustrate the context dependence of the conditioning process when an operant is acquired in the stable environment of a nondiscrimination procedure. Our previous simulation research has demonstrated that an operant may be brought under more precise stimulus control: When a discrimination procedure was simulated, the controlling stimuli were restricted to those that most reliably preceded the reinforced response (cf. Donahoe et al., 1993; LCB, p. 78). Thus, the same learning algorithm that modifies the strengths of connections in the same selection-network architecture can simulate important conditioning phenomena as its cumulative effect with either a nondiscrimination or a discrimination procedure.

Interpreting the Requirements for Operant Conditioning

Simulation techniques can be applied to the problem of identifying the necessary and sufficient conditions for learning in selection networks. What are the contributions of the stimulus, the two-term response–reinforcer contingency, and the three-term stimulus–response–reinforcer contingency to operant conditioning? And, what role, if any, is played by intranetwork variables that affect the spontaneous activity of units?

Consider the question: What is the baseline activation level of the operant unit (i.e., its operant level) when stimuli are applied to input units but without consequences for activity induced in any other units in the network? In living organisms, this condition is imperfectly realized because stimulus presentations by themselves have effects (e.g., habituation, sensitization, or latent inhibition) even when responding has no programmed consequences. However, in a simulation the input units can be stimulated when the algorithms that modify connection weights are disabled. In the present case, when the S1, S2, and S3 input units were stimulated as in the first simulation of context conditioning but with no change in connection weights, the mean activation of the operant output unit during 200 trials was only .09. Thus, stimuli did not evoke activity in the operant unit to any appreciable degree; that is, responding was not elicited.
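In a simulation, this test amounts to freezing the connection weights and averaging the output unit's activation across stimulated trials. The minimal sketch below does just that; the untrained weight range, threshold, dispersion, and noise level are hypothetical values chosen so that the mean comes out near the value reported above.

```python
import math
import random

random.seed(1)
THETA, SIGMA = 0.28, 0.10         # illustrative threshold and dispersion
ACTIVATIONS = (0.75, 0.50, 0.50)  # S1, S2, S3, as in the first simulation

def logistic(net):
    return 1.0 / (1.0 + math.exp(-(net - THETA) / SIGMA))

# Weak, untrained weights; crucially, no weight-change rule is applied.
weights = [random.uniform(0.0, 0.05) for _ in ACTIVATIONS]

activations = []
for _ in range(200):  # 200 stimulated probe trials
    net = sum(w * a for w, a in zip(weights, ACTIVATIONS))
    net += random.gauss(0.0, 0.02)     # spontaneous fluctuation in input
    activations.append(logistic(net))  # the weights never change
print(round(sum(activations) / len(activations), 2))  # ~.09: not elicited
```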

Turn now to the question: Does conditioning occur if activity of the operant unit is followed by a putative reinforcing stimulus when there is no environmental context (not merely no measured or experimenter-manipulated context)? To answer this question, a simulation was conducted under circumstances that were otherwise identical to the first simulation except that the input units of the network were not activated. Any connection strengths that were modified were between units that were activated as the result of spontaneous coactivity between interior and operant units. Under such circumstances, activation of the operant unit is emitted in the purest sense; that is, its activation is solely the product of endogenous intranetwork variables. Simulation indicated that even after as many as 1,000 operant–reinforcer pairings using identical values for all other parameters, conditioning did not occur. Thus, in the absence of an environment, a two-term response–reinforcer contingency was insufficient to produce conditioning in a selection network.

The ineffectiveness of a two-term contingency between an activated output unit and the occurrence of a putative reinforcer is a consequence of our biologically based learning algorithm (Donahoe et al., 1993, p. 40, Equation 5). The learning algorithm simulates modification of synaptic efficacies between neurons, and is informed by experimental analyses of the conditions that produce long-term potentiation (LTP). Experimental analyses of LTP indicate that synaptic efficacies increase when a neuromodulator (that occurs as a result of the reinforcing stimulus) is introduced into synapses between coactive pre- and postsynaptic neurons (Frey, in press; Frey et al., 1993; see also Beninger, 1983; Hoebel, 1988; Wise, 1989). Under the conditions of the simulation, the presynaptic units and the output unit were very unlikely to be coactive spontaneously. Without stimuli acting on input units to increase the likelihood of coactive units, the simulated reinforcer was ineffective.
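The logic of such a reinforcement-gated rule can be conveyed with a deliberately simplified stand-in for Equation 5 of Donahoe et al. (1993), whose actual form is more elaborate. In the sketch, the weight change is the product of a learning rate, a diffuse reinforcement signal, and the coactivation of the pre- and postsynaptic units; all names and values are hypothetical.

```python
ALPHA = 0.2  # hypothetical learning rate

def weight_change(reinforcer, a_pre, a_post, alpha=ALPHA):
    """Simplified reinforcement-gated Hebbian update (a stand-in, not
    Equation 5 itself): a connection strengthens only when the reinforcement
    signal arrives while BOTH the pre- and postsynaptic units are active."""
    return alpha * reinforcer * a_pre * a_post

# With the environment present, stimulated input units are strongly active:
print(round(weight_change(reinforcer=1.0, a_pre=0.75, a_post=0.80), 4))
# 0.12: an appreciable increment per reinforcer

# With no environmental input, units are coactive only spontaneously (~.001),
# so even 1,000 reinforced pairings leave the connection essentially unchanged:
print(round(1000 * weight_change(reinforcer=1.0, a_pre=0.001, a_post=0.001), 6))
# 0.0002
```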


Fig. 4. Simulation results showing changes in the activation level of the R unit during conditioning for different levels of spontaneous activity of units in the selection network. The level of spontaneous activity was varied by manipulating the standard deviation (σ) of the logistic function, which determined the activation of a unit as a function of excitation from inputs to that unit. (See text for additional information.)

Is, then, a three-term contingency sufficient to simulate conditioning in a selection network? The curve in Figure 4 designated by σ = .1 shows the acquisition function for the first context-conditioning example. After some 75 reinforcers, the operant output unit became increasingly strongly activated. The parameter σ is the standard deviation of the logistic function (see Donahoe et al., 1993, Equation 4), a nonlinear function relating the activation of a postsynaptic unit to the net excitation from its presynaptic inputs. This parameter determines the level of spontaneous activity of a unit. (Neurons in the central nervous system typically have baseline frequencies of firing that are substantially above zero due to local intracellular and extracellular events.)
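The relation between σ and spontaneous activity can be made concrete with one hedged calculation. If a unit's activation is written as a logistic function of net excitation, a(net) = 1/(1 + e^-(net - θ)/σ), with some threshold θ (the exact parameterization of Equation 4 in Donahoe et al., 1993, may differ in detail), then baseline activity is simply a(0): the larger σ, the more active a unit is even with no excitation at all.

```python
import math

THETA = 0.55  # hypothetical threshold; chosen so sigma = .08 gives ~.001

def activation(net, sigma, theta=THETA):
    """Logistic activation; sigma sets how 'noisy' the unit is."""
    return 1.0 / (1.0 + math.exp(-(net - theta) / sigma))

for sigma in (0.10, 0.09, 0.08):
    print(sigma, round(activation(0.0, sigma), 4))
# sigma = .10 -> baseline ~.0041
# sigma = .09 -> baseline ~.0022
# sigma = .08 -> baseline ~.0010  (cf. the ~.001 reported below)
```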

As shown by the other acquisition functions in Figure 4, reductions in the level of spontaneous activity markedly retarded the simulated acquisition of operant conditioning. With σ = .09, acquisition did not begin until after 125 reinforcers. Most strikingly, when σ was .08 or less, acquisition failed to occur altogether, even after as many as 200 simulated three-term contingencies. (The level of spontaneous activation of individual units was approximately .001 with σ = .08.) Thus, in the absence of spontaneous unit activity, even a three-term contingency was insufficient to produce conditioning. From this perspective, the spontaneous activity of neurons is not an impediment to the efficient functioning of the nervous system or to its scientific interpretation by means of neural networks, but is an essential requirement for its operation and understanding.
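Why spontaneous activity matters can be seen by combining the preceding sketches into a toy acquisition loop. The operant unit fires probabilistically according to the logistic function; each firing is followed by a simulated reinforcer, which strengthens the connections from the stimulated input units through the gated Hebbian rule. With ample spontaneous activity, an occasional unprompted firing gives the three-term contingency something to select; with very little, the contingency is in force but idle. All parameter values are hypothetical, and the network is far simpler than those of Donahoe et al. (1993).

```python
import math
import random

THETA, ALPHA = 0.55, 0.15
INPUTS = (0.75, 0.50, 0.50)  # S1, S2, S3 activation levels

def run(sigma, trials=200, seed=0):
    """One simulated acquisition run; returns the number of reinforcers."""
    rng = random.Random(seed)
    weights = [0.0, 0.0, 0.0]
    reinforcers = 0
    for _ in range(trials):
        net = sum(w * a for w, a in zip(weights, INPUTS))
        p = 1.0 / (1.0 + math.exp(-(net - THETA) / sigma))
        if rng.random() < p:          # the unit fires; early in training
            reinforcers += 1          # this can happen only spontaneously
            for i, a in enumerate(INPUTS):
                weights[i] += ALPHA * a * 1.0  # gated Hebbian: a_post = 1
    return reinforcers

for sigma in (0.10, 0.08):
    counts = [run(sigma, seed=s) for s in range(200)]
    print(sigma, round(sum(counts) / len(counts), 1))
# With sigma = .10, early spontaneous firings are reinforced, p rises, and
# reinforcers accumulate in many runs; with sigma = .08 (baseline p ~ .001)
# firings almost never occur, so the three-term contingency has nothing to
# select and conditioning fails.
```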

In conclusion, the effects of a three-term contingency, together with spontaneous unit activity, are necessary and sufficient for the simulation of operant conditioning in selection networks. The interpretation of the selection process by neural networks leads to a deeper understanding of what it means to describe operants as emitted. In the moment-to-moment account provided by neural networks, the statements that "what is selected is always an environment–behavior relation, never a response alone" (LCB, p. 68) and that "operant behavior occurs in a stimulus context, but there is often no identifiable stimulus change that precedes each occurrence of the response" (Shull, 1995, p. 354) are not inconsistent. To the contrary, the statements are complementary: An environment is necessary for reinforcers to select behavior, but without spontaneous intranetwork activity environment–behavior–reinforcer sequences are insufficient. In a moment-to-moment account, as favored by Skinner and implemented by selection networks, environment–behavior relations are neither purely emitted nor purely dependent on particular environmental stimuli. Within the range of environment–behavior relations that are conventionally designated as operant, relations are simultaneously guided by the environment and emitted by the organism.

REFERENCES

Anger, D. (1956). The dependence of interresponse times upon the relative reinforcement of different interresponse times. Journal of Experimental Psychology, 52, 145–161.
Ayres, J. J. B., Benedict, J. O., & Witcher, E. S. (1975). Systematic manipulation of individual events in a truly random control with rats. Journal of Comparative and Physiological Psychology, 88, 97–103.
Baum, W. M. (1973). The correlation-based law of effect. Journal of the Experimental Analysis of Behavior, 20, 137–154.
Benedict, J. O. (1975). Response-shock delay as a reinforcer in avoidance behavior. Journal of the Experimental Analysis of Behavior, 24, 323–332.
Benedict, J. O., & Ayres, J. J. B. (1972). Factors affecting conditioning in the truly random control procedure in the rat. Journal of Comparative and Physiological Psychology, 78, 323–330.
Beninger, R. J. (1983). The role of dopamine activity in locomotor activity and learning. Brain Research Reviews, 6, 173–196.
Blough, D. S. (1963). Interresponse time as a function of a continuous variable: A new method and some data. Journal of the Experimental Analysis of Behavior, 6, 237–246.
Bolles, R. C., & Popp, R. J., Jr. (1964). Parameters affecting the acquisition of Sidman avoidance. Journal of the Experimental Analysis of Behavior, 7, 315–321.
Buonomano, D. V., & Merzenich, M. M. (1995). Temporal information transformed into a spatial code by a neural network with realistic properties. Science, 267, 1026–1028.
Catania, A. C., & Keller, K. J. (1981). Contingency, contiguity, correlation, and the concept of causality. In P. Harzem & M. D. Zeiler (Eds.), Predictability, correlation, and contiguity (pp. 125–167). New York: Wiley.
Coleman, S. R. (1981). Historical context and systematic functions of the concept of the operant. Behaviorism, 9, 207–226.
Coleman, S. (1984). Background and change in B. F. Skinner's metatheory from 1930 to 1938. Journal of Mind and Behavior, 5, 471–500.
Dinsmoor, J. A. (1985). The role of observing and attention in establishing stimulus control. Journal of the Experimental Analysis of Behavior, 43, 365–381.
Dinsmoor, J. A. (1995). Stimulus control: Part I. The Behavior Analyst, 18, 51–68.
Donahoe, J. W. (1993). The unconventional wisdom of B. F. Skinner: The analysis-interpretation distinction. Journal of the Experimental Analysis of Behavior, 60, 453–456.
Donahoe, J. W. (in press-a). The necessity of neural networks. In J. W. Donahoe & V. P. Dorsel (Eds.), Neural-network models of cognition: Biobehavioral foundations. Amsterdam: Elsevier.
Donahoe, J. W. (in press-b). Positive reinforcement: The selection of behavior. In W. O'Donohue (Ed.), Learning and behavior therapy. Boston: Allyn & Bacon.
Donahoe, J. W., Burgos, J. E., & Palmer, D. C. (1993). A selectionist approach to reinforcement. Journal of the Experimental Analysis of Behavior, 60, 17–40.
Donahoe, J. W., Crowley, M. A., Millard, W. J., & Stickney, K. A. (1982). A unified principle of reinforcement. In M. L. Commons, R. J. Herrnstein, & H. Rachlin (Eds.), Quantitative analyses of behavior (Vol. 2, pp. 493–521). Cambridge, MA: Ballinger.
Donahoe, J. W., & Dorsel, V. P. (Eds.). (in press). Neural-network models of cognition: Biobehavioral foundations. Amsterdam: Elsevier.
Donahoe, J. W., & Palmer, D. C. (1989). The interpretation of complex human behavior: Some reactions to Parallel Distributed Processing. Journal of the Experimental Analysis of Behavior, 51, 399–416.
Donahoe, J. W., & Palmer, D. C. (1994). Learning and complex behavior. Boston: Allyn & Bacon.
Ferster, C. B., & Skinner, B. F. (1957). Schedules of reinforcement. New York: Appleton-Century-Crofts.
Frey, U. (in press). Cellular mechanisms of long-term potentiation: Late maintenance. In J. W. Donahoe & V. P. Dorsel (Eds.), Neural-network models of cognition: Biobehavioral foundations. Amsterdam: Elsevier.
Frey, U., Huang, Y.-Y., & Kandel, E. R. (1993). Effects of cAMP simulate a late stage of LTP in hippocampal CA1 neurons. Science, 260, 1661–1664.
Galbicka, G. (1992). The dynamics of behavior. Journal of the Experimental Analysis of Behavior, 57, 243–248.
Gibson, J. J. (1979). The ecological approach to visual perception. Boston: Houghton Mifflin.
Gormezano, I., & Kehoe, E. J. (1981). Classical conditioning and the law of contiguity. In P. Harzem & M. D. Zeiler (Eds.), Predictability, correlation, and contiguity (pp. 1–45). New York: Wiley.
Guthrie, E. R. (1933). Association as a function of time interval. Psychological Review, 40, 355–367.
Hamilton, W. (1964). The genetical evolution of social behaviour. I, II. Journal of Theoretical Biology, 7, 1–52.
Heinemann, E. G., & Rudolph, R. L. (1963). The effect of discrimination training on the gradient of stimulus generalization. American Journal of Psychology, 76, 653–656.
Herrnstein, R. J. (1970). On the law of effect. Journal of the Experimental Analysis of Behavior, 13, 243–266.
Herrnstein, R. J. (1982). Melioration as behavioral dynamism. In M. L. Commons, R. J. Herrnstein, & H. Rachlin (Eds.), Quantitative analyses of behavior: Vol. 2. Matching and maximizing accounts (pp. 433–458). Cambridge, MA: Ballinger.
Herrnstein, R. J., & Vaughan, W., Jr. (1980). Melioration and behavioral allocation. In J. E. R. Staddon (Ed.), Limits to action: The allocation of individual behavior (pp. 143–176). New York: Academic Press.
Heyman, G. M. (1979). A Markov model description of changeover probabilities on concurrent variable-interval schedules. Journal of the Experimental Analysis of Behavior, 31, 41–51.
Hilgard, E. R., & Marquis, D. G. (1940). Conditioning and learning. New York: Appleton-Century-Crofts.
Hineline, P. N. (1970). Negative reinforcement without shock reduction. Journal of the Experimental Analysis of Behavior, 14, 259–268.
Hineline, P. N. (1981). The several roles of stimuli in negative reinforcement. In P. Harzem & M. D. Zeiler (Eds.), Predictability, correlation, and contiguity (pp. 203–246). New York: Wiley.
Hineline, P. N. (1986). Re-tuning the operant-respondent distinction. In T. Thompson & M. D. Zeiler (Eds.), Analysis and integration of behavioral units (pp. 55–79). Hillsdale, NJ: Erlbaum.
Hinson, J. M., & Staddon, J. E. R. (1983a). Hill-climbing by pigeons. Journal of the Experimental Analysis of Behavior, 39, 25–47.
Hinson, J. M., & Staddon, J. E. R. (1983b). Matching, maximizing, and hill-climbing. Journal of the Experimental Analysis of Behavior, 40, 321–331.
Hoebel, B. G. (1988). Neuroscience and motivation: Pathways and peptides that define motivational systems. In R. A. Atkinson (Ed.), Stevens' handbook of experimental psychology (Vol. 1, pp. 547–625). New York: Wiley.
Holland, P. C. (1977). Conditioned stimulus as a determinant of the form of the Pavlovian conditioned response. Journal of Experimental Psychology: Animal Behavior Processes, 3, 77–104.
Hull, C. L. (1934). The concept of the habit-family hierarchy and maze learning. Psychological Review, 41, 33–54.
Hull, C. L. (1937). Mind, mechanism, and adaptive behavior. Psychological Review, 44, 1–32.
Jenkins, H. M., & Sainsbury, R. S. (1969). The development of stimulus control through differential reinforcement. In N. J. Mackintosh & W. K. Honig (Eds.), Fundamental issues in associative learning (pp. 123–161). Halifax, Nova Scotia: Dalhousie University Press.
Kehoe, E. J. (1988). A layered network model of associative learning: Learning to learn and configuration. Psychological Review, 95, 411–433.
Kehoe, E. J. (1989). Connectionist models of conditioning: A tutorial. Journal of the Experimental Analysis of Behavior, 52, 427–440.
Keller, R. J., Ayres, J. J. B., & Mahoney, W. J. (1977). Brief versus extended exposure to truly random control procedures. Journal of Experimental Psychology: Animal Behavior Processes, 3, 53–65.
Lattal, K. A., & Gleeson, S. (1990). Response acquisition with delayed reinforcement. Journal of Experimental Psychology: Animal Behavior Processes, 16, 27–39.
Lieberman, P. A. (1993). Learning: Behavior and cognition. Pacific Grove, CA: Brooks/Cole.
McClelland, J. L., Rumelhart, D. E., & The PDP Research Group. (Eds.). (1986). Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 2). Cambridge, MA: MIT Press.
Meazzini, P., & Ricci, C. (1986). Molar vs. molecular units of analysis. In T. Thompson & M. D. Zeiler (Eds.), Analysis and integration of behavioral units (pp. 19–43). Hillsdale, NJ: Erlbaum.
Minsky, M. L., & Papert, S. A. (1969). Perceptrons. Cambridge, MA: MIT Press.
Moore, J. (1984). Choice and transformed interreinforcement intervals. Journal of the Experimental Analysis of Behavior, 42, 321–335.
Morse, W. H. (1966). Intermittent reinforcement. In W. K. Honig (Ed.), Operant behavior: Areas of research and application (pp. 52–108). New York: Appleton-Century-Crofts.
Morse, W. H., & Skinner, B. F. (1957). A second type of superstition in the pigeon. American Journal of Psychology, 70, 308–311.
Neuringer, A. J. (1967). Choice and rate of responding in the pigeon. Unpublished doctoral dissertation, Harvard University.
Nevin, J. A. (1969). Interval reinforcement of choice behavior in discrete trials. Journal of the Experimental Analysis of Behavior, 12, 875–885.
Nevin, J. A. (1984). Quantitative analysis. Journal of the Experimental Analysis of Behavior, 42, 421–434.
Osgood, C. E. (1953). Method and theory in experimental psychology. New York: Oxford University Press.
Palmer, D. C., & Donahoe, J. W. (1992). Essentialism and selectionism in cognitive science and behavior analysis. American Psychologist, 47, 1344–1358.
Pear, J. J. (1985). Spatiotemporal patterns of behavior produced by variable-interval schedules of reinforcement. Journal of the Experimental Analysis of Behavior, 44, 217–231.
Platt, J. R. (1979). Interresponse-time shaping by variable-interval-like interresponse-time reinforcement contingencies. Journal of the Experimental Analysis of Behavior, 31, 3–14.
Quinsey, V. L. (1971). Conditioned suppression with no CS-US contingency in the rat. Canadian Journal of Psychology, 25, 69–82.
Rescorla, R. A. (1967). Pavlovian conditioning and its proper control procedures. Psychological Review, 74, 71–80.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York: Appleton-Century-Crofts.
Rosenblatt, F. (1962). Principles of neurodynamics. Washington, DC: Spartan.
Rumelhart, D. E., McClelland, J. L., & The PDP Research Group. (Eds.). (1986). Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 1). Cambridge, MA: MIT Press.
Schoenfeld, W. N., & Farmer, J. (1970). Reinforcement schedules and the behavior stream. In W. N. Schoenfeld (Ed.), The theory of reinforcement schedules (pp. 215–245). New York: Appleton-Century-Crofts.
Shimp, C. P. (1966). Probabilistically reinforced choice behavior in pigeons. Journal of the Experimental Analysis of Behavior, 9, 443–455.
Shimp, C. P. (1969). Optimal behavior in free-operant experiments. Psychological Review, 76, 97–112.
Shimp, C. P. (1974). Time allocation and response rate. Journal of the Experimental Analysis of Behavior, 21, 491–499.
Shull, R. L. (1995). Interpreting cognitive phenomena: Review of Donahoe and Palmer's Learning and Complex Behavior. Journal of the Experimental Analysis of Behavior, 63, 347–358.
Sidman, M. (1986). Functional analysis of emergent verbal classes. In T. Thompson & M. D. Zeiler (Eds.), Analysis and integration of behavioral units (pp. 213–245). Hillsdale, NJ: Erlbaum.
Silberberg, A., Hamilton, B., Ziriax, J. M., & Casey, J. (1978). The structure of choice. Journal of Experimental Psychology: Animal Behavior Processes, 4, 368–398.
Silberberg, A., & Ziriax, J. M. (1982). The interchangeover time as a molecular dependent variable in concurrent schedules. In M. L. Commons, R. J. Herrnstein, & H. Rachlin (Eds.), Quantitative analyses of behavior: Vol. 2. Matching and maximizing accounts (pp. 111–130). Cambridge, MA: Ballinger.
Skinner, B. F. (1931). The concept of the reflex in the study of behavior. Journal of General Psychology, 5, 427–458.
Skinner, B. F. (1937). Two types of conditioned reflex: A reply to Konorski and Miller. Journal of General Psychology, 16, 272–279.
Skinner, B. F. (1938). The behavior of organisms. New York: Appleton-Century-Crofts.
Skinner, B. F. (1948). "Superstition" in the pigeon. Journal of Experimental Psychology, 38, 168–172.
Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.
Skinner, B. F. (1976). Farewell, my lovely! Journal of the Experimental Analysis of Behavior, 25, 218.
Skinner, B. F. (1983). A matter of consequences. New York: Knopf.
Sober, E. (1984). The nature of selection. Cambridge, MA: MIT Press.
Sober, E. (1993). Philosophy of biology. Boulder, CO: Westview Press.
Staddon, J. E. R. (1993). The conventional wisdom of behavior analysis. Journal of the Experimental Analysis of Behavior, 60, 439–447.
Staddon, J. E. R., & Hinson, J. M. (1983). Optimization: A result or a mechanism? Science, 221, 976–977.
Staddon, J. E. R., & Simmelhag, V. L. (1971). The "superstition" experiment: A reexamination of its implications for the principles of adaptive behavior. Psychological Review, 78, 3–43.
Stein, L., & Belluzzi, J. D. (1988). Operant conditioning of individual neurons. In M. L. Commons, R. M. Church, J. R. Stellar, & A. R. Wagner (Eds.), Quantitative analyses of behavior (Vol. 7, pp. 249–264). Hillsdale, NJ: Erlbaum.
Stein, L., & Belluzzi, J. D. (1989). Cellular investigations of behavioral reinforcement. Neuroscience and Biobehavioral Reviews, 13, 69–80.
Stein, L., Xue, B. G., & Belluzzi, J. D. (1993). A cellular analogue of operant conditioning. Journal of the Experimental Analysis of Behavior, 60, 41–53.
Stein, L., Xue, B. G., & Belluzzi, J. D. (1994). In vitro reinforcement of hippocampal bursting: A search for Skinner's atom of behavior. Journal of the Experimental Analysis of Behavior, 61, 155–168.
Timberlake, W., & Lucas, G. A. (1985). The basis of superstitious behavior: Chance contingency, stimulus substitution, or appetitive behavior? Journal of the Experimental Analysis of Behavior, 44, 279–299.
Tolman, E. C. (1932). Purposive behavior in animals and men. New York: Appleton-Century-Crofts.
Watson, J. B. (1924). Behaviorism. New York: Norton.
Williams, B. A. (1985). Choice behavior in a discrete-trial concurrent VI-VR: A test of maximizing theories of matching. Learning and Motivation, 16, 423–443.
Williams, B. A. (1986). Identifying behaviorism's prototype: A review of Behaviorism: A Conceptual Reconstruction by G. E. Zuriff. The Behavior Analyst, 9, 117–122.
Williams, B. A. (1990). Enduring problems for molecular accounts of operant behavior. Journal of Experimental Psychology: Animal Behavior Processes, 16, 213–216.
Williams, D. R. (1968). The structure of response rate. Journal of the Experimental Analysis of Behavior, 11, 251–258.
Wise, R. A. (1989). The brain and reward. In J. M. Liebman & S. J. Cooper (Eds.), The neuropharmacological basis of reward (pp. 377–424). New York: Oxford University Press.
Zuriff, G. E. (1985). Behaviorism: A conceptual reconstruction. New York: Columbia University Press.