
Animal Learning & Cognition: An Introduction, Third Edition

    John M. Pearce

    Cardiff University


Published in 2008 by Psychology Press, 27 Church Road, Hove, East Sussex, BN3 2FA

Simultaneously published in the USA and Canada by Psychology Press, 270 Madison Ave, New York, NY 10016

    www.psypress.com

Psychology Press is an imprint of the Taylor & Francis Group, an informa business

© 2008 Psychology Press

All rights reserved. No part of this book may be reprinted or reproduced or utilized in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data

Pearce, John M.
Animal learning and cognition: an introduction / John M. Pearce.
p. cm.
Includes bibliographical references and index.
ISBN 9781841696553
ISBN 9781841696560
1. Animal intelligence. I. Title.
QL785.P32 2008
591.5'13 dc22
2007034019

ISBN: 9781841696553 (hbk)
ISBN: 9781841696560 (pbk)

Typeset by Newgen Imaging Systems (P) Ltd, Chennai, India
Printed and bound in Slovenia


    For Victoria


Contents

Preface ix

1 The study of animal intelligence 2
   The distribution of intelligence 4
   Defining animal intelligence 12
   Why study animal intelligence? 16
   Methods for studying animal intelligence 20
   Historical background 22

2 Associative learning 34
   Conditioning techniques 36
   The nature of associative learning 42
   Stimulus–stimulus learning 49
   The nature of US representations 52
   The conditioned response 55
   Concluding comment: the reflexive nature of the conditioned response 60

3 The conditions for learning: Surprise and attention 62
   Part 1: Surprise and conditioning 64
   Conditioning with a single CS 64
   Conditioning with a compound CS 68
   Evaluation of the Rescorla–Wagner model 72
   Part 2: Attention and conditioning 74
   Wagner's theory 76
   Stimulus significance 80
   The Pearce–Hall theory 86
   Concluding comments 91

4 Instrumental conditioning 92
   The nature of instrumental learning 93
   The conditions of learning 97
   The performance of instrumental behavior 106
   The Law of Effect and problem solving 111

5 Extinction 122
   Extinction as generalization decrement 123
   The conditions for extinction 125
   Associative changes during extinction 134
   Are trials important for Pavlovian extinction? 142

6 Discrimination learning 148
   Theories of discrimination learning 149
   Connectionist models of discrimination learning 161
   Metacognition and discrimination learning 166

7 Category formation 170
   Examples of categorization 171
   Theories of categorization 173
   Abstract categories 179
   Relationships as categories 180
   The representation of knowledge 188

8 Short-term retention 190
   Methods of study 191
   Forgetting 199
   Theoretical interpretation 202
   Serial position effects 206
   Metamemory 207

9 Long-term retention 212
   Capacity 214
   Durability 215
   Theoretical interpretation 218
   Episodic memory 225

10 Time, number, and serial order 232
   Time 233
   Number 243
   Serial order 253
   Transitive inference 259
   Concluding comments 262

11 Navigation 264
   Part 1: Short-distance travel 265
   Methods of navigation 265
   Part 2: Long-distance travel 283
   Navigational cues 284
   Homing 286
   Migration 289
   Concluding comments 293

12 Social learning 296
   Diet selection and foraging 298
   Choosing a mate 301
   Fear of predators 301
   Copying behavior: mimicry 302
   Copying behavior: imitation 304
   Theory of mind 312
   Self-recognition 319
   Concluding comments 324

13 Animal communication and language 326
   Animal communication 327
   Communication and language 336
   Can an ape create a sentence? 339
   Language training with other species 350
   The requirements for learning a language 356

14 The distribution of intelligence 360
   Intelligence and brain size 361
   The null hypothesis 364
   Intelligence and evolution 369

References 373
Author index 403
Subject index 411

    Preface

In preparing the third edition of this book, my aim, as it was for the previous editions, has been to provide an overview of what has been learned by pursuing one particular approach to the study of animal intelligence. It is my belief that the intelligence of animals is the product of a number of mental processes. I think the best way of understanding these processes is by studying the behavior of animals in an experimental setting. This book, therefore, presents what is known about animal intelligence by considering experimental findings from the laboratory and from more naturalistic settings.

I do not attach any great importance to the distinction between animal learning and animal cognition. Research in both areas has the common goal of elucidating the mechanisms of animal intelligence and, very often, this research is conducted using similar procedures. If there is any significance to the distinction, then it is that the fields of animal learning and animal cognition are concerned with different aspects of intelligence. Chapters 2 to 6 are concerned predominantly with issues that fall under the traditional heading of animal learning theory. My main concern in these chapters is to show how it is possible with a few simple principles of associative learning to explain a surprisingly wide range of experimental findings. Readers familiar with the previous edition will notice that apart from a new chapter devoted to extinction, there are relatively few changes to this part of the book. This lack of change does not mean that researchers are no longer actively investigating the basic learning processes in animals. Rather, it means that the fundamental principles of learning are now reasonably well established and that current research is directed towards issues that are too advanced to be considered in an introductory text book.

The second half of the book covers material that is generally treated under the heading of animal cognition. My overall aim in these chapters is to examine what has been learned from studying animal behavior about such topics as memory, the representation of knowledge, navigation, social learning, communication, and language. I also hope to show that the principles developed in the earlier chapters are of relevance to understanding research that is reviewed in the later chapters. It is in this part of the book that the most changes have been made. Research on animal cognition during the last 10 years has headed in many new directions. I have tried to present a clear summary of this research, as well as a balanced evaluation of its theoretical implications.

Those who wish to study the intelligence of animals face a daunting task. Not only are there numerous different species to study, but there is also an array of intellectual skills to be explored, each posing a unique set of challenging theoretical problems. As a result, many of the topics that I discuss are still in their infancy. Some readers may therefore be disappointed to discover that we are still trying to answer many of the interesting questions that can be asked about the intelligence of animals. On the other hand, it is just this lack of knowledge that makes the study of animal learning and cognition so exciting. Many fascinating discoveries remain to be made once the appropriate experiments have been conducted.

One of the rewards for writing a book is the opportunity it provides to thank the many friends and colleagues who have been so generous with the help they have given me. The way in which this book is organized and much of the material it contains have been greatly influenced by numerous discussions with A. Dickinson, G. Hall, N. J. Mackintosh, and E. M. Macphail. Different chapters have benefited greatly from the critical comments on earlier versions by A. Aydin, N. Clayton, M. Haselgrove, C. Heyes, V. LoLordo, A. McGregor, E. Redhead, and P. Wilson. A special word of thanks is due to Dave Lieberman, whose thoughtful comments on an earlier draft of the present edition identified numerous errors and helped to clarify the manner in which much of the material is presented. The present edition has also greatly benefited from the detailed comments on the two previous editions by N. J. Mackintosh.

I should also like to express my gratitude to the staff at Psychology Press. Without the cajoling and encouragement of the Assistant Editor, Tara Stebnicky, it is unlikely that I would have embarked on this revision. I am particularly grateful to the Production Editor, Veronica Lyons, who, with generous amounts of enthusiasm and imagination, has done a wonderful job in trying to transform a sow's ear into a silk purse. Thanks are also due to the colleagues who were kind enough to send me photographs of their subjects while they were being tested. Finally, there is the pleasure of expressing gratitude to Victoria, my wife, who once again patiently tolerated the demands made on her while this edition was being prepared. In previous editions I offered similar thanks to my children, but there is no need on this occasion now that they have left home. Even so, Jess, Alex, and Tim would never forgive me if I neglected to mention their names.

While preparing for this revision I read a little about Darwin's visit to the Galapagos Islands. I was so intrigued by the influence they had on him that I felt compelled to visit the islands myself. During the final stages of preparing this edition, Veronica and Tania, somewhat reluctantly, allowed me a two-week break to travel to the Galapagos Islands. The holiday was one of the highlights of my life.


The sheer number of animals, and their absolute indifference to the presence of humans, was overwhelming. The picture on the previous page shows me trying unsuccessfully to engage a giant tortoise in conversation. This, and many other photographs, were taken without any elaborate equipment and thus reveal how the animals allowed me to approach as close as I wished in order to photograph them. I came away from the islands having discovered little that is new about the intelligence of animals, but with a deeper appreciation of how the environment shapes not only their form, but also their behavior.

John M. Pearce
October, 2007

CHAPTER 4

CONTENTS

The nature of instrumental learning 93
The conditions of learning 97
The performance of instrumental behavior 106
The Law of Effect and problem solving 111

4 Instrumental conditioning

Behavior is affected by its consequences. Responses that lead to reward are repeated, whereas those that lead to punishment are withheld. Instrumental conditioning refers to the method of using reward and punishment in order to modify an animal's behavior. The first laboratory demonstration of instrumental conditioning was provided by Thorndike (1898) who, as we saw in Chapter 1, trained cats to make a response in order to escape from a puzzle box and earn a small amount of fish. Since this pioneering work, there have been many thousands of successful demonstrations of instrumental conditioning, employing a wide range of species, and a variety of experimental designs. Skinner, for example, taught two pigeons, by means of instrumental conditioning, to play ping-pong with each other.

From the point of view of understanding the mechanisms of animal intelligence, three important issues are raised by a successful demonstration of instrumental conditioning. We need to know what information an animal acquires as a result of its training. Pavlovian conditioning was shown to promote the growth of stimulus–stimulus associations, but what sort of associations develop when a response is followed by a reward or punishment? Once the nature of the associations formed during instrumental conditioning has been identified, we then need to specify the conditions that promote their growth. Surprise, for example, is important for successful Pavlovian conditioning, but what are the necessary ingredients to ensure the success of instrumental conditioning? Finally, we need to understand the factors that determine when, and how vigorously, an instrumental response will be performed.

Before turning to a detailed discussion of these issues, we must be clear what is meant by the term reinforcer. This term refers to the events that result in the strengthening of an instrumental response. The events are classified as either positive reinforcers, when they consist of the delivery of a stimulus, or negative reinforcers, when they involve the removal of a stimulus.

THE NATURE OF INSTRUMENTAL LEARNING

Historical background

Thorndike (1898) was the first to propose that instrumental conditioning is based on learning about responses. According to his Law of Effect, when a response is followed by a reinforcer, then a stimulus–response (S–R) connection is strengthened. In the case of a rat that must press a lever for food, the stimulus might be the lever itself and the response would be the action of pressing the lever. Each successful lever press would thus serve to strengthen a connection between the sight of the lever and the response of pressing it. As a result, whenever the rat came across the lever in the future, it would be likely to press it and thus gain reward. This analysis of instrumental conditioning has formed the basis of a number of extremely influential theories of learning (e.g. Hull, 1943).

KEY TERM

Reinforcer: An event that increases the probability of a response when presented after it. If the event is the occurrence of a stimulus, such as food, it is referred to as a positive reinforcer; but if the event is the removal of a stimulus, such as shock, it is referred to as a negative reinforcer.


A feature of the Law of Effect that has proved unacceptable to the intuitions of many psychologists is that it fails to allow the animal to anticipate the goal for which it is responding. The only knowledge that an S–R connection permits an animal to possess is the knowledge that it must make a particular response in the presence of a given stimulus. The delivery of food after the response will, according to the Law of Effect, effectively come as a complete surprise to the animal. In addition to sounding implausible, this proposal has for many years conflicted with a variety of experimental findings.

One early finding is reported by Tinkelpaugh (1928), who required monkeys to select one of two food wells to obtain reward. On some trials the reward was a banana, which was greatly preferred to the other reward, a lettuce leaf. Once the animals had been trained they were occasionally presented with a lettuce leaf when they should have received a banana. The following quote, which is cited in Mackintosh (1974), provides a clear indication that the monkey expected a more attractive reward for making the correct response (Tinkelpaugh, 1928, p. 224):

She extends her hand to seize the food. But her hand drops to the floor without touching it. She looks at the lettuce but (unless very hungry) does not touch it. She looks around the cup and behind the board. She stands up and looks under and around her. She picks the cup up and examines it thoroughly inside and out. She had on occasion turned toward the observers present in the room and shrieked at them in apparent anger.

A rather different type of finding that shows animals anticipate the rewards for which they are responding can be found in experiments in which rats ran down an alley, or through a maze, for food. If a rat is trained first with one reward which is then changed in attractiveness, there is a remarkably rapid change in its performance on subsequent trials. Elliott (1928) found that the number of errors in a multiple-unit maze increased dramatically when the quality of reward in the goal box was reduced. Indeed, the animals were so dejected by this change that they made more errors than a control group that had been trained throughout with the less attractive reward (Figure 4.1). According to S–R theory, the change in performance by the experimental group should have taken place more slowly, and should not have resulted in less accurate responding than that shown by the control group. As an alternative explanation, these findings imply that the animals had some expectancy of the reward they would receive in the goal that allowed them to detect when it was made less attractive.

Tolman (1932) argued that findings such as these indicate that rats form R–unconditioned stimulus (R–US) associations as a result of instrumental conditioning. They are assumed to learn that a response will be followed by a particular outcome. There is no doubt that the results are consistent with this proposal, but they do not force us to accept it. Several S–R theorists have pointed out that the anticipation of reward could have been based on conditioned stimulus (CS)–US, rather than R–US, associations.

FIGURE 4.1 The mean number of errors (percent errors across trials) made by two groups of rats in a multiple-unit maze. For the first nine trials the reward for the control group was more attractive than for the experimental group, but for the remaining trials both groups received the same reward (adapted from Elliott, 1928).

In Elliott's (1928) experiment, for example, the animal consumed the reward in the goal box. It is possible that the stimuli created by this part of the apparatus served as a CS that became associated with food. After a number of training trials, therefore, the sight of the goal box would activate a representation of the reward and thereby permit the animal to detect when its value was changed. Both Hull (1943) and Spence (1956) seized on this possibility and proposed that the strength of instrumental responding is influenced by the Pavlovian properties of the context in which the response is performed.

The debate between S–R theorists and what might be called the expectancy (R–US) theorists continued until the 1970s (see, for example, Bolles, 1972). In the last 20 years or so, however, experiments have provided new insights into the nature of the associations that are formed during instrumental conditioning. To anticipate the following discussion, these experiments show that both the S–R and the expectancy theorists were correct. The experiments also show that these theorists underestimated the complexity of the information that animals can acquire in even quite simple instrumental conditioning tasks.

Evidence for R–US associations

To demonstrate support for an expectancy theory of instrumental conditioning, Colwill and Rescorla (1985) adopted a reinforcer devaluation design (see also Adams & Dickinson, 1981). A single group of rats was trained in the manner summarized in Table 4.1. In the first (training) stage of the experiment subjects were able to make one response (R1) to earn one reinforcer (US1) and another response (R2) to earn a different reinforcer (US2). The two responses were lever pressing or pulling a small chain that was suspended from the ceiling, and the two reinforcers were food pellets or sucrose solution. After a number of sessions of this training, an aversion was formed to US1 by allowing subjects free access to it and then injecting them with a mild poison (lithium chloride; LiCl). This treatment was so effective that subjects completely rejected US1 when it was subsequently presented to them. For the test trials subjects were again allowed to make either of the two responses, but this time neither response led to the delivery of a reinforcer. The results from the experiment are shown in Figure 4.2, which indicates that R2 was performed more vigorously than R1. The figure also shows a gradual decline in the strength of R2, which reflects the fact that neither response was followed by reward. This pattern of results can be most readily explained by assuming that during their training rats formed R1–US1 and R2–US2 associations. They would then be reluctant to perform R1 in the test phase because of their knowledge that this response produced a reinforcer that was no longer attractive.

TABLE 4.1 Summary of the training given to a single group of rats in an experiment by Colwill and Rescorla (1985)

Training            Devaluation    Test
R1 → US1            US1 → LiCl     R1 versus R2
R2 → US2

LiCl, lithium chloride; R, response; US, unconditioned stimulus.

KEY TERM

Reinforcer devaluation: A technique in which the positive reinforcer for an instrumental response is subsequently devalued, normally by pairing its consumption with illness.


Evidence for S–R associations

The evidence that instrumental conditioning results in the development of S–R associations is perhaps less convincing than that concerning the development of R–US associations. A re-examination of Figure 4.2 reveals that after the devaluation treatment there remained a tendency to perform R1. This tendency was sustained even though the response never resulted in the delivery of a reinforcer and, more importantly, it was sustained even though the devaluation training resulted in a complete rejection of US1. The fact that an animal is willing to make a response, even though it will reject the reinforcer that normally follows the response, is just what would be expected if the original training resulted in the growth of an S–R connection. In other words, because an S–R connection does not allow an animal to anticipate the reward it will receive for its responses, once such a connection has formed the animal will respond for the reward even if it is no longer attractive. Thus the results of the experiment by Colwill and Rescorla (1985) indicate that during the course of their training rats acquired both R–US and S–R associations.

Readers who are struck by the rather low rate at which R1 was performed might conclude that the S–R connection is normally of little importance in determining responding. Note, however, that for the test trials there was the opportunity of performing either R1 or R2. Even a slight preference for R2 would then have a suppressive effect on the performance of R1. On the basis of the present results, therefore, it is difficult to draw precise conclusions concerning the relative contribution of S–R and R–US associations to instrumental responding.

To complicate matters even further, it seems that the relative contribution of S–R and R–US associations to instrumental behavior is influenced by the training given. Adams and Dickinson (1981) conducted a series of experiments in which rats had to press a lever for food. An aversion to the food was then conditioned using a technique similar to that adopted by Colwill and Rescorla (1985). If a small amount of instrumental training had been given initially, then subjects showed a marked reluctance to press the lever in a final test session. But if extensive instrumental training had been given initially, there was little evidence of any effect at all of the devaluation treatment. Adams and Dickinson (1981) were thus led to conclude that R–US associations underlie the acquisition and early stages of instrumental training, but with extended practice this learning is transformed into an S–R habit. There is some debate about the reasons for this change in influence of the two associations, or whether it always takes place (see Dickinson & Balleine, 1994).

Evidence for S–(R–US) associations

Animals can thus learn to perform a particular response in the presence of a given stimulus (S–R learning); they can also learn that a certain reinforcer will follow a response (R–US learning). The next question to ask is whether this information can be integrated to provide the knowledge that in the presence of a certain stimulus a certain response will be followed by a certain outcome. Table 4.2 summarizes the design of an experiment by Rescorla (1991) that was conducted to test this possibility.

FIGURE 4.2 The mean rates (responses per minute, across blocks of four minutes) at which a single group of rats performed two responses, R1 and R2, that had previously been associated with two different rewards. Before the test sessions, the reward for R1, but not R2, had been devalued. No rewards were presented in the test session (adapted from Rescorla, 1991).

A group of rats first received discrimination training in which a light or a noise (S1 or S2) was presented for 30 seconds at a time. During each stimulus the rats were trained to perform two responses (pulling a chain or pressing a lever), which each resulted in a different reinforcer (food pellets or sucrose solution). The design of conditioning experiments is rarely simple and, in this case, it was made more difficult by reversing the response–reinforcer relationships for the two stimuli. Thus in S1, R1 led to US1 and R2 led to US2; but in S2, R1 led to US2 and R2 led to US1. For the second stage of the experiment, the reinforcer devaluation technique was used to condition an aversion to US2. Finally, test trials were conducted in extinction in which subjects were provided with the opportunity of performing the two responses in the presence of each stimulus. The result from these test trials was quite clear. There was a marked preference to perform R1, rather than R2, in the presence of S1; but in the presence of S2 there was a preference to perform R2 rather than R1. These findings cannot be explained by assuming that the only associations acquired during the first stage were S–R, otherwise the devaluation technique would have been ineffective. Nor can the results be explained by assuming that only R–US associations developed, otherwise devaluation treatment should have weakened R1 and R2 to the same extent in both stimuli. Instead, the results can be most readily explained by assuming that the subjects were sensitive to the fact that the devalued reinforcer followed R2 in S1, and followed R1 in S2. Rescorla (1991) has argued that this conclusion indicates the development of a hierarchical associative structure that he characterizes as S–(R–US). Animals are first believed to acquire an R–US association, and this association in its entirety is then assumed to enter into a new association with S. Whether it is useful to propose that an association can itself enter into an association remains to be seen. There are certainly problems with this type of suggestion (see, for example, Holland, 1992). In addition, as Dickinson (1994) points out, there are alternative ways of explaining the findings of Rescorla (1991). Despite these words of caution, the experiment demonstrates clearly that animals are able to anticipate the reward they will receive for making a certain response in the presence of a given stimulus.
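The logic of this design can be captured in a short sketch (Python). The dictionary encoding is purely an illustrative device of this presentation, not part of Rescorla's analysis, but it shows how integrated S–(R–US) knowledge predicts the observed preferences:

    # The contingencies of Table 4.2, written as a lookup table. If rats
    # encode which outcome follows each response in each stimulus, then
    # devaluing US2 should suppress R2 in S1 but R1 in S2.
    contingencies = {("S1", "R1"): "US1", ("S1", "R2"): "US2",
                     ("S2", "R1"): "US2", ("S2", "R2"): "US1"}
    devalued = "US2"

    for stimulus in ("S1", "S2"):
        for response in ("R1", "R2"):
            outcome = contingencies[(stimulus, response)]
            status = "suppressed" if outcome == devalued else "preferred"
            print(stimulus, response, "->", outcome, ":", status)

Neither a pure S–R store nor a pure R–US store could produce this stimulus-dependent pattern; the lookup must be keyed on the stimulus and the response jointly, which is exactly the hierarchical structure Rescorla proposed.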

THE CONDITIONS OF LEARNING

There is, therefore, abundant evidence to show that animals are capable of learning about the consequences of their actions. We turn now to consider the conditions that enable this learning to take place.

TABLE 4.2 Summary of the training given to a single group of rats in an experiment by Rescorla (1991)

Discrimination training           Devaluation    Test
S1: R1 → US1 and R2 → US2         US2 → LiCl     S1: R1 versus R2
S2: R1 → US2 and R2 → US1                        S2: R2 versus R1

LiCl, lithium chloride; R, response; S, stimulus; US, unconditioned stimulus.


Contiguity

A fundamental principle of the early theories of learning was that instrumental conditioning is most effective when the response is contiguous with or, in other words, followed immediately by the reinforcer. An early demonstration of this influence of contiguity on instrumental conditioning was made by Logan (1960), who trained rats to run down an alley for food. He found that the speed of running was substantially faster if the rats received food as soon as they reached the goal box, as opposed to waiting in the goal box before food was made available. This disruptive effect of waiting was found with delays of as little as 3 seconds. Moreover, the speed of running down the alley was directly related to the duration of the delay in the goal box. This effect, which is referred to as the gradient of delay, has been reported on numerous occasions (e.g. Dickinson, Watt, & Griffiths, 1992).

It is apparent from Logan's (1960) study that even relatively short delays between a response and a reinforcer disrupt instrumental conditioning. Once this finding has been established, it then becomes pertinent to consider by how much the reinforcer can be delayed before instrumental conditioning is no longer possible. The precise answer to this question is not yet known, but a study by Lattal and Gleeson (1990) indicates that it may be greater than 30 seconds. Rats were required to press a lever for food, which was delivered 30 seconds after the response. If another response was made before food was delivered then the timer was reset and the rat had to wait another 30 seconds before receiving food. This schedule ensured that the delay between any response and food was at least 30 seconds. Despite being exposed to such a demanding method of training, each of the three rats in the experiment showed an increase in the rate of lever pressing as training progressed. The results from one rat are shown in Figure 4.3. The remarkable finding from this experiment is that rats with no prior experience of lever pressing can increase the rate of performing this response when the only response-produced stimulus change occurs 30 seconds after a response has been made.

It should be emphasized that the rate of lever pressing by the three rats was relatively slow, and would have been considerably faster if food had been presented immediately after the response. Temporal contiguity is thus important for instrumental conditioning, but such conditioning is still effective, albeit to a lesser extent, when there is a gap between the response and the delivery of reward.
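The resetting feature of the schedule is easier to see in a short simulation sketch (Python). The per-second response probability and the session length are invented for illustration; only the 30-second resetting delay comes from Lattal and Gleeson (1990):

    import random

    def resetting_delay_session(p_response=0.01, delay=30, seconds=3600, seed=1):
        # Count food deliveries when every lever press restarts a 30-s timer
        # and food arrives only after 30 s free of further responses.
        rng = random.Random(seed)
        timer = None                       # seconds left until food, or None
        deliveries = 0
        for _ in range(seconds):
            if rng.random() < p_response:  # the rat presses the lever
                timer = delay              # any response resets the timer
            if timer is not None:
                timer -= 1
                if timer == 0:             # a full delay passed without a press
                    deliveries += 1
                    timer = None
        return deliveries

    print(resetting_delay_session())

The sketch makes the key constraint explicit: no matter when the rat responds, at least 30 seconds always separate a response from the food it eventually produces.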

FIGURE 4.3 The mean rate of pressing a lever (responses per minute, across sessions) by a single rat when food was presented 30 seconds after a response (adapted from Lattal & Gleeson, 1990).

Temporal contiguity is an important factor in the effectiveness of instrumental conditioning. This golden retriever's obedience training will be much more effective if the owner rewards his dog with a treat straight after the desired response.

KEY TERM

Gradient of delay: The progressive weakening of an instrumental response as a result of increasing the delay between the completion of the response and the delivery of the reinforcer.


Response–reinforcer contingency

We saw in Chapter 3 that the CS–US contingency is important for Pavlovian conditioning because learning is more effective when the US occurs only in the presence of the CS than when the US occurs both in the presence and absence of the CS. An experiment by Hammond (1980) makes a similar point for instrumental behavior, by demonstrating the importance of the response–reinforcer contingency for effective conditioning. The training schedule was quite complex and required that the experimental session was divided into 1-second intervals. If a response occurred in any interval then, for three groups of thirsty rats, water was delivered at the end of the interval with a probability of 0.12. The results from a group that received only this training, and no water in the absence of lever pressing (Group 0), are shown in the left-hand histogram of Figure 4.4. By the end of training this group was responding at more than 50 responses a minute. For the remaining two groups, water was delivered after some of the 1-second intervals in which a response did not occur. For Group 0.08, the probability of one of these intervals being followed by water was 0.08, whereas for Group 0.12 this probability was 0.12. The remaining two histograms show the final response rates for these two groups. Both groups responded more slowly than Group 0, but responding was weakest in the group for which water was just as likely to be delivered whether or not a response had been made. The contingency between response and reinforcer thus influences the rate at which the response will be performed. We now need to ask why this should be the case. In fact, there are two answers to this question.
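Hammond's schedule can be summarized with a short simulation sketch (Python). The probability of a response in each interval is an invented illustration; the reinforcement probabilities are the ones described above:

    import random

    def hammond_session(p_free, p_earned=0.12, p_respond=0.3,
                        intervals=3600, seed=1):
        # One session of 1-second intervals: an interval containing a
        # response earns water with probability p_earned; an interval with
        # no response delivers free water with probability p_free.
        rng = random.Random(seed)
        earned = free = 0
        for _ in range(intervals):
            if rng.random() < p_respond:
                if rng.random() < p_earned:
                    earned += 1
            elif rng.random() < p_free:
                free += 1
        return earned, free

    for p_free in (0.0, 0.08, 0.12):      # Groups 0, 0.08, and 0.12
        print(p_free, hammond_session(p_free))

With p_free equal to 0.12, water is just as likely whether or not the animal responds, which is the zero-contingency condition that produced the weakest responding.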

One answer is based on a quite different view of instrumental conditioning to that considered thus far. According to this account, instrumental conditioning will be effective whenever a response results in an increase in the rate of reinforcement (e.g. Baum, 1973). Thus there is no need for a response to be followed closely by reward for successful conditioning; all that is necessary is for the overall probability of reward being delivered to increase. In other words, the contingency between a response and reward is regarded as the critical determinant of the outcome of instrumental conditioning. This position is referred to as a molar theory of reinforcement because animals are assumed to compute the rate at which they make a response over a substantial period of time and, at the same time, compute the rate at which reward is delivered over the same period. If they should detect that an increase in the rate of responding is correlated with an increase in the rate of reward delivery, then the response will be performed more vigorously in the future. Moreover, the closer the correlation between the two rates, the more rapidly will the response be performed. Group 0 of Hammond's (1980) experiment demonstrated a high correlation between the rate at which the lever was pressed and the rate at which reward was delivered, and this molar analysis correctly predicts that rats will learn to respond rapidly on the lever. In the case of Group 0.12, however, the rate of lever pressing had some influence on the rate at which reward was delivered, but this influence was slight because the reward would be delivered even if a rat refused to press the lever. In these circumstances, responding is predicted to be slow and again the theory is supported by the findings.

FIGURE 4.4 The mean rates of lever pressing for water (responses per minute) by three groups of thirsty rats in their final session of training. The groups differed in the probability with which free water was delivered during the intervals between responses: Group 0 received no water during these intervals; Group 0.08 and Group 0.12 received water with a probability of 0.08 and 0.12, respectively, at the end of each period of 1 second in which a response did not occur (adapted from Hammond, 1980).

KEY TERMS

Response–reinforcer contingency: The degree to which the occurrence of the reinforcer depends on the instrumental response. Positive contingency: the frequency of the reinforcer is increased by making the response. Negative contingency: the frequency of the reinforcer is reduced by making the response. Zero contingency: the frequency of the reinforcer is unaffected by making the response.

Molar theory of reinforcement: The assumption that the rate of instrumental responding is determined by the response–reinforcer contingency.



The molar analysis of instrumental behavior has received a considerable amount of attention and generated a considerable body of experimental research, but there are good reasons for believing that it may be incorrect. In an experiment by Thomas (1981), rats in a test chamber containing a lever were given a free pellet of food once every 20 seconds, even if they did nothing. At the same time, if they pressed the lever during any 20-second interval then the pellet was delivered immediately, and the pellet at the end of the interval was cancelled. Subsequent responses during the remainder of the interval were without effect. This treatment ensured that rats received three pellets of food a minute whether or not they pressed the lever. Thus lever pressing in this experiment did not result in an increase in the rate of food delivery and, according to the molar point of view, the rate of making this response should not increase. The mean rate of responding during successive sessions is shown for one rat in Figure 4.5. Although the rat took some time to press the lever, it eventually pressed at a reasonably high rate. A similar pattern of results was observed with the other rats in the experiment, which clearly contradicts the prediction drawn from a molar analysis of instrumental behavior.
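The crucial property of this schedule, namely that the pellet rate is fixed at three per minute whatever the animal does, can be checked with a short simulation sketch (Python). The per-second response probability is an invented illustration, not a value from Thomas (1981):

    import random

    def thomas_session(p_respond=0.02, interval=20, seconds=3600, seed=1):
        rng = random.Random(seed)
        immediate = end_of_interval = 0
        for _ in range(seconds // interval):
            # A press anywhere in the 20-s interval delivers the pellet at
            # once and cancels the free pellet due at the end; further
            # presses in the same interval have no effect.
            if any(rng.random() < p_respond for _ in range(interval)):
                immediate += 1
            else:
                end_of_interval += 1
        total = immediate + end_of_interval
        return total, 60.0 * total / seconds

    print(thomas_session())   # 180 pellets, 3.0 per minute, however much
                              # or little the simulated rat presses

Pressing changes only when, within an interval, the pellet arrives, not how many pellets arrive, which is why the increase in responding that Thomas observed is so damaging to a molar account.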

Thomas (1981) reports a second experiment, the design of which was much the same as for the first experiment, except that lever pressing not only resulted in the occasional, immediate delivery of food but also in an overall reduction of food by postponing the start of the next 20-second interval by 20 seconds. On this occasion, the effect of lever pressing was to reduce the rate at which food was delivered, yet each of six new rats demonstrated an increase in the rate of lever pressing as training progressed. The result is opposite to that predicted by a molar analysis of instrumental conditioning.

Although molar theories of instrumental behavior (e.g. Baum, 1973) are ideally suited to explaining results such as those reported by Hammond (1980), it is difficult to see how they can overcome the problem posed by the findings of Thomas (1981). It is therefore appropriate to seek an alternative explanation for the influence of the response–reinforcer contingency on instrumental conditioning. One alternative, which by now should be familiar, is that instrumental conditioning depends on the formation of associations. This position is referred to as a molecular theory of reinforcement because it assumes that the effectiveness of instrumental conditioning depends on specific episodes of the response being paired with a reinforcer. The results from Thomas's (1981) experiments can be readily explained by a molecular analysis of instrumental conditioning, because contiguity between a response and a reinforcer is regarded as the important condition for successful conditioning. Each lever press that resulted in food would allow an association involving the response to gain in strength, which would then encourage more vigorous responding as training progressed.

At first glance, Hammond's (1980) results appear to contradict a molecular analysis because the response was paired with reward in all three groups and they would therefore be expected to respond at a similar rate, which was not the case. It is, however, possible to reconcile these results with a molecular analysis of instrumental conditioning by appealing to the effects of associative competition, as the following section shows.

FIGURE 4.5 The total number of lever presses recorded in each session for a rat in the experiment by Thomas (1981).

KEY TERM

Molecular theory of reinforcement: The assumption that the rate of instrumental responding is determined by response–reinforcer contiguity.


Associative competition

If two stimuli are presented together for Pavlovian conditioning, the strength of the conditioned response (CR) that each elicits when tested individually is often weaker than if they are presented for conditioning separately. This overshadowing effect is explained by assuming the two stimuli are in competition for associative strength, so that the more strength acquired by one the less is available for the other (Rescorla & Wagner, 1972). Overshadowing is normally assumed to take place between stimuli but, if it is accepted that overshadowing can also occur between stimuli and responses that signal the same reinforcer, then it is possible for molecular theories of instrumental behavior to explain the contingency effects reported by Hammond (1980). In Group 0 of his study, each delivery of water would strengthen a lever-press–water association and result eventually in rapid lever pressing. In the other groups, however, the delivery of free water would allow the context to enter into an association with this reinforcer. The delivery of water after a response will then mean that it is signaled by both the context and the response, and theories of associative learning predict that the context–water association will restrict, through overshadowing, the growth of the response–water association. As the strength of the response–water association determines the rate at which the response is performed, responding will be slower when some free reinforcers accompany the instrumental training than when all the reinforcers are earned. Furthermore, the more often that water is delivered free, the stronger will be the context–water association and the weaker will be the response–water association. Thus the pattern of results shown in Figure 4.4 can be explained by a molecular analysis of instrumental conditioning, providing it is assumed that responses and stimuli compete with each other for their associative strength. The results from two different experiments lend support to this assumption.
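If the response and the context are treated as elements sharing a pooled error term, this account can be set out in a few lines (Python). This is a sketch under assumptions: the learning rate, asymptote, and alternating trial sequence are invented for illustration; only the shared error term comes from the Rescorla–Wagner (1972) model:

    def competition(n_trials=100, free_water=False, alpha=0.1, lam=1.0):
        # Pooled-error updates for a response-water and a context-water
        # association; free deliveries reinforce the context alone.
        v_resp = v_ctx = 0.0
        for trial in range(n_trials):
            if free_water and trial % 2:        # alternate free deliveries
                v_ctx += alpha * (lam - v_ctx)
            else:                               # earned delivery: response
                error = lam - (v_resp + v_ctx)  # and context share the error
                v_resp += alpha * error
                v_ctx += alpha * error
        return v_resp, v_ctx

    print(competition(free_water=False))  # response association is strong
    print(competition(free_water=True))   # context overshadows the response

Because the free deliveries drive the context–water association toward the asymptote on its own, the shared error left over for the response shrinks, so the response–water association, and with it responding, stays weak.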

The first experiment directly supports the claim that overshadowing is possible between stimuli and responses. Pearce and Hall (1978) required rats to press a lever for food on a variable interval schedule, in which only a few responses were followed by reward. For an experimental group, each rewarded response was followed by a brief burst of white noise before the food was delivered. The noise, which accompanied only rewarded responses, resulted in a substantially lower rate of lever pressing by the experimental group than by control groups that received either similar exposure to the noise (but after nonrewarded responses) or no exposure to the noise at all (Figure 4.6). Geoffrey Hall and I argued that the most plausible explanation for these findings is that instrumental learning involves the formation of R–US associations and that these were weakened through overshadowing by a noise–food association that developed in the experimental group.

The second source of support for a molecular analysis of the effect of contingency on instrumental responding can be found in contingency experiments in which a brief stimulus signals the delivery of each free reinforcer. The brief stimulus should itself enter into an association with the reinforcer and thus overshadow the development of an association between the context and the reinforcer.

FIGURE 4.6 The mean rates of lever pressing (responses per minute) by three groups of rats that received a burst of noise after each rewarded response (Corr), after some nonrewarded responses (Uncorr), or no noise at all (Food alone) (adapted from Pearce & Hall, 1978).

Whenever a response is followed by the reinforcer it will now be able to enter into a normal R–US association, because of the lack of competition from the context. Responding in these conditions should thus be more vigorous than if the free US is not signaled. In support of this argument, both Hammond and Weinberg (1984) and Dickinson and Charnock (1985) have shown that free reinforcers disrupt instrumental responding to a greater extent when they are unsignaled than when they are signaled. These findings make a particularly convincing case for the belief that competition for associative strength is an important influence on the strength of an instrumental response. They also indicate that this competition is responsible for the influence of the response–reinforcer contingency on the rate of instrumental responding.

The nature of the reinforcer

Perhaps the most important requirement for successful instrumental conditioning is that the response is followed by a reinforcer. But what makes a reinforcer? In nearly all the experiments that have been described thus far, the reinforcer has been food for a hungry animal, or water for a thirsty animal. As these stimuli are of obvious biological importance, it is hardly surprising to discover that animals are prepared to engage in an activity such as lever pressing in order to earn them. However, this does not mean that a reinforcer is necessarily a stimulus that is of biological significance to the animal. As Schwartz (1989) notes, animals will press a lever to turn on a light, and it is difficult to imagine the biological need that is satisfied on these occasions.

Thorndike (1911) was the first to appreciate the need to identify the defining characteristics of a reinforcer, and his solution was contained within the Law of Effect. He maintained that a reinforcer was a stimulus that resulted in a satisfying state of affairs. A satisfying state of affairs was then defined as ". . . one which the animal does nothing to avoid, often doing things which maintain or renew it" (Thorndike, 1913, p. 2). In other words, Thorndike effectively proposed that a stimulus would serve as a reinforcer (increase the likelihood of a response) if animals were willing to respond in order to receive that stimulus. The circularity in this definition should be obvious and has served as a valid source of criticism of the Law of Effect on more than one occasion (e.g. Meehl, 1950). Thorndike was not alone in providing a circular definition of a reinforcer. Skinner has been perhaps the most blatant in this respect, as the following quotation reveals (Skinner, 1953, pp. 72–73):

The only way to tell whether or not a given event is reinforcing to a given organism under given conditions is to make a direct test. We observe the frequency of a selected response, then make an event contingent upon it and observe any change in frequency. If there is a change, we classify the event as reinforcing.

To be fair, for practical purposes this definition is quite adequate. It provides a useful and unambiguous terminology. At the same time, once we have decided that a stimulus, such as food, is a positive reinforcer, then we can turn to a study of a number of issues that are important to the analysis of instrumental learning. For instance, we have been able to study the role of the reinforcer in the associations that are formed during instrumental learning, without worrying unduly about what it is that makes a stimulus a reinforcer.


But the definitions offered by Thorndike and Skinner are not very helpful if a general statement is being sought about the characteristics of a stimulus that dictate whether or not it will function as a reinforcer. And the absence of such a general statement makes our understanding of the conditions that promote instrumental learning incomplete.

A particularly elegant solution to the problem of deciding whether a stimulus will function as a reinforcer is provided by the work of Premack (1959, 1962, 1965), who put forward what is now called the Premack principle. He proposed that reinforcers were not stimuli but opportunities to engage in behavior. Thus the activity of eating, not the stimulus of food, should be regarded as the reinforcer when an animal has been trained to lever press for food. To determine if one activity will serve as the reinforcer for another activity, Premack proposed that the animal should be allowed to engage freely in both activities. For example, a rat might be placed into a chamber containing a lever and some food pellets. If it shows a greater willingness to eat the food than to press the lever, then we can conclude that the opportunity to eat will reinforce lever pressing, but the opportunity to lever press will not reinforce eating.

It is perhaps natural to think of the properties of a reinforcer as being absolute. That is, if eating is an effective reinforcer for one response, such as lever pressing, then it might be expected to serve as a reinforcer for any response. But Premack (1965) has argued this assumption is unjustified. An activity will only be reinforcing if subjects would rather engage in it than in the activity that is to be reinforced. To demonstrate this relative property of a reinforcer, Premack (1971a) placed rats into a running wheel, similar to the one sketched in Figure 4.7, for 15 minutes a day.

When the rats were thirsty, they preferred to drink rather than to run in the wheel, but when they were not thirsty, they preferred to run rather than to drink. For the test phase of the experiment, the wheel was locked and the rats had to lick the drinking tube to free it and so gain the opportunity to run for 5 seconds. Running is not normally regarded as a reinforcing activity but, because rats that are not thirsty prefer to run rather than drink, it follows from Premack's (1965) argument that they should increase the amount they drink in the wheel in order to earn the opportunity to run. Conversely, running would not be expected to reinforce drinking for thirsty rats, because in this state of deprivation they prefer drinking to running. In clear support of this analysis, Premack (1971a) found that running could serve as a reinforcer for drinking, but only with rats that were not thirsty.
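Premack's test reduces to comparing baseline probabilities, which can be expressed in a few lines (Python). The baseline durations are invented for illustration; the prediction rule itself is the Premack principle:

    def premack_predictions(baseline):
        # Rank activities by time spent in free (baseline) engagement; a
        # more probable activity should reinforce any less probable one.
        ranked = sorted(baseline, key=baseline.get, reverse=True)
        return [(hi, lo) for i, hi in enumerate(ranked)
                for lo in ranked[i + 1:]]

    # Rats that are not thirsty spend more time running than drinking, so
    # running is predicted to reinforce drinking, and not vice versa.
    print(premack_predictions({"running": 400, "drinking": 150}))

Reversing the two baseline values, as happens when the rats are made thirsty, reverses the prediction, which captures the relative character of reinforcement that Premack emphasized.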

As Allison (1989) has pointed out, Premack's proposals can be expressed succinctly by paraphrasing Thorndike's Law of Effect. For instrumental conditioning to be effective it is necessary for a response to be followed not by a satisfying state of affairs, but by a preferred response. Despite the improvement this change affords with respect to the problem of defining a reinforcer, experiments have shown that it does not account adequately for all the circumstances where one activity will serve as a reinforcer for another.

Consider an experiment by Allison and Timberlake (1974) in which rats were first allowed to drink from two spouts that provided different concentrations of saccharin solution.

FIGURE 4.7 A sketch of the apparatus, showing the drinking tube and the pneumatic cylinders, used by Premack (1971a) to determine if being given the opportunity to run could serve as a reinforcer for drinking in rats that were not thirsty (adapted from Premack, 1971a).

KEY TERM

Premack principle: The proposal that activity A will reinforce activity B, if activity A is more probable than activity B.

This baseline test session revealed a preference for the sweeter solution. According to Premack's proposals, therefore, rats should be willing to increase their consumption of the weaker solution if drinking it is the only means by which they can gain access to the sweeter solution. By contrast, rats should not be willing to increase their consumption of the sweeter solution to gain access to the weaker one. To test this second prediction, rats were allowed to drink from the spout supplying the sweeter solution and, after every 10 licks, they were permitted one lick at the spout offering the less-sweet solution. This 10:1 ratio meant that, relative to the amount of sweet solution consumed, the rats received less of the weaker solution than they chose to consume in the baseline test session. As a consequence of this constraint imposed by the experiment, Allison and Timberlake (1974) found that rats increased their consumption of the stronger solution. It is important to emphasize that this increase occurred in order to allow the rats to gain access to the less preferred solution, which, according to Premack's theory, should not have taken place.

Timberlake and Allison (1974) explained their results in terms of an equilibrium theory of behavior. They argued that when an animal is able to engage in a variety of activities, it will have a natural tendency to allocate more time to some than others. The ideal amount of time that would be devoted to an activity is referred to as its bliss point, and each activity is assumed to have its own bliss point. By preventing an animal from engaging in even its least preferred activity, it will be displaced from the bliss point and do its best to restore responding to this point.

In the experiment by Allison and Timberlake (1974), therefore, forcing the subjects to drink much more of the strong than the weak solution meant that they were effectively deprived of the weak solution. As the only way to overcome this deficit was to drink more of the sweet solution, this is what they did. Of course, as the rats approached their bliss point for the consumption of the weak solution, they would go beyond their bliss point for the consumption of the sweet solution. To cope with this type of conflict, animals are believed to seek a compromise, or state of equilibrium, in which the amount of each activity they perform will lead them as close as possible to the bliss points for all activities. Thus the rats completed the experiment by drinking rather more than they would prefer of the strong solution, and rather less than they would prefer of the weak solution.
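The compromise can be worked out explicitly. Suppose, purely for illustration, bliss points of 1000 licks of the sweet solution and 300 of the weak one, with the 10:1 schedule forcing the weak licks to equal one-tenth of the sweet licks. One simple formalization, a sketch rather than Timberlake and Allison's own mathematics, is to pick the point on the schedule line that minimizes the squared distance to the bliss point (Python):

    def equilibrium(bliss_sweet, bliss_weak, ratio=10):
        # Minimize (s - bliss_sweet)**2 + (s / ratio - bliss_weak)**2 over
        # sweet licks s; setting the derivative to zero gives s directly.
        s = (bliss_sweet + bliss_weak / ratio) / (1 + 1 / ratio ** 2)
        return s, s / ratio

    sweet, weak = equilibrium(1000, 300)
    print(round(sweet), round(weak))   # about 1020 sweet licks (bliss 1000)
                                       # and 102 weak licks (bliss 300)

Under these invented values the animal ends up slightly above its bliss point for the sweet solution and well below its bliss point for the weak one, which is the qualitative pattern described in the text.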

By referring to bliss points, we can thus predict when the opportunity to engage in one activity will serve as a reinforcer for another activity. But this does not mean that we have now identified completely the circumstances in which the delivery of a particular event will function as a reinforcer. Some reinforcers do not elicit responses that can be analyzed usefully by equilibrium theory. Rats will press a lever to receive stimulation to certain regions of the brain, or to turn on a light, or to turn off an electric shock to the feet. I find it difficult to envisage how any measure of baseline activity in the presence of these events would reveal that they will serve as reinforcers for lever pressing. In the next section we will find that a stimulus that has been paired with food can reinforce lever pressing in hungry rats. Again, simply by observing an animal's behavior in the presence of the stimulus, it is hard to imagine how one could predict that the stimulus will function as a reinforcer. Our understanding of the nature of a reinforcer has advanced considerably since Thorndike proposed the Law of Effect. However, if we wish to determine with confidence if a certain event will act as a reinforcer for a particular response, at times there will be no better alternative than to adopt Skinner's suggestion of testing for this property directly.


Conditioned reinforcement

The discussion has been concerned thus far with primary reinforcers, that is, with stimuli that do not need to be paired with another stimulus to function as reinforcers for instrumental conditioning. There are, in addition, numerous studies that have shown that even a neutral stimulus may serve as an instrumental reinforcer by virtue of being paired with a primary reinforcer. An experiment by Hyde (1976) provides a good example of a stimulus acting in this capacity as a conditioned reinforcer. In the first stage of the experiment, an experimental group of hungry rats had a number of sessions in which the occasional delivery of food was signaled by a brief tone. A control group was treated in much the same way except that the tone and food were presented randomly in respect to each other. Both groups were then given the opportunity to press the lever to present the tone. The results from the eight sessions of this testing are displayed in Figure 4.8. Even though no food was presented in this test phase, the experimental group initially showed a considerable willingness to press the lever. The superior rate of pressing by the experimental compared to the control group strongly suggests that pairing the tone with food resulted in it becoming a conditioned reinforcer.

In the previous experiment, the effect of the conditioned reinforcer was relatively short lasting, which should not be surprising because it will lose its properties by virtue of being presented in the absence of food. The effects of conditioned reinforcers can be considerably more robust if their relationship with the primary reinforcer is maintained, albeit intermittently. Experiments using token reinforcers provide a particularly forceful demonstration of how the influence of a conditioned reinforcer may be sustained in this way. Token reinforcers are typically small plastic discs that are earned by performing some response, and once earned they can be exchanged for food. In an experiment by Kelleher (1958), chimpanzees had to press a key 125 times to receive a single token. When they had collected 50 tokens they were allowed to push them all into a slot to receive food. In this experiment, therefore, the effect of the token reinforcers was sufficiently strong that they were able to reinforce a sequence of more than 6000 responses.
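To spell out the arithmetic behind that figure: 125 key presses per token, multiplied by the 50 tokens required for an exchange, comes to 125 × 50 = 6250 responses for every delivery of food.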

FIGURE 4.8 The mean rates of lever pressing for a brief tone by two groups of rats. For the experimental group the tone had previously been paired with food, whereas for the control group the tone and food had been presented randomly in respect to each other (adapted from Hyde, 1976). [Graph: mean responses per session, 0-150, across the eight test sessions, with separate curves for the Experimental and Control groups.]

KEY TERMS

Conditioned reinforcer: An originally neutral stimulus that serves as a reinforcer through training, usually by being paired with a positive reinforcer.

Token reinforcer: A conditioned reinforcer in the form of a plastic chip that can be held by the subject.


A straightforward explanation for the results of the experiment by Hyde (1976) is that the tone became an appetitive Pavlovian CS and thus effectively served as a substitute for food. The results from experiments such as that by Kelleher (1958) have led Schwartz (1989) to argue that there are additional ways in which conditioned reinforcers can be effective (see also Golub, 1977):

They provide feedback that the correct response has been made. Delivering a token after the completion of 125 responses would provide a useful signal that the subject is engaged in the correct activity.

Conditioned reinforcers might act as a cue for the next response to be performed. Kelleher (1958) observed that his chimpanzees often waited for several hours before making their first response in a session. This delay was virtually eliminated by giving the subject some tokens at the start of the session, thus indicating that the tokens acted as a cue for key pressing.

Conditioned reinforcers may be effective because they help to counteract the disruptive effects of imposing a long delay between a response and the delivery of a primary reinforcer. Interestingly, as far as tokens are concerned, this property of the token is seen only when the chimpanzee is allowed to hold it during the delay.

Taken together, these proposals imply that the properties of a conditioned reinforcer are considerably more complex than would be expected if they were based solely on its Pavlovian properties.

THE PERFORMANCE OF INSTRUMENTAL BEHAVIOR

The experiments considered so far have been concerned with revealing the knowledge that is acquired during the course of instrumental conditioning. They have also indicated some of the factors that influence the acquisition of this knowledge. We turn our attention now to examining the factors that determine the vigor with which an animal will perform an instrumental response. We have already seen that certain devaluation treatments can influence instrumental responding, and so too can manipulations designed to modify the strength of the instrumental association. But there remain a number of other factors that influence instrumental behavior. In the discussion that follows we shall consider two of these influences in some detail: deprivation state and the presence of Pavlovian CSs.

Deprivation

The level of food deprivation has been shown, up to a point, to be directly related to the vigor with which an animal responds for food. This is true when the response is running down an alley (Cotton, 1953) or pressing a lever (Clark, 1958). To explain this relationship, Hull (1943) suggested that motivational effects are mediated by activity in a drive center. Drive is a central state that is excited by needs and energizes behavior. It was proposed that the greater the level of drive, the more vigorous will be the response that the animal is currently performing. Thus, if a rat is pressing a lever for food, then hunger will excite drive, which, in turn, will invigorate this activity.
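Hull captured this idea with a multiplicative rule: the excitatory potential of a response is, roughly, the product of its learned habit strength and the prevailing drive, E = H × D. On this formulation even a well-learned response (high H) will be performed sluggishly when drive is low, and the same habit will be energized when drive is high.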


A serious shortcoming of Hull's (1943) account is the claim that drive is nonspecific, so that it can be enhanced by an increase in any need of the animal. A number of curious predictions follow from this basic aspect of his theorizing. For example, the pain produced by electric shock is assumed to increase drive, so that if animals are given shocks while lever pressing for food, they should respond more rapidly than in the absence of shock. By far the most frequent finding is that this manipulation has the opposite effect of decreasing appetitive instrumental responding (e.g. Boe & Church, 1967). Conversely, the theory predicts that enhancing drive by making animals hungrier should facilitate the rate at which they press a lever to escape or avoid shock. Again, it should not be surprising to discover that generally this prediction is not confirmed. Increases in deprivation have been found, in this respect, to be either without effect (Misanin & Campbell, 1969) or to reduce the rate of such behavior (Meyer, Adams, & Worthen, 1969; Leander, 1973).

In response to this problem, more recent theorists have proposed that animals possess two drive centers: One is concerned with energizing behavior that leads to reward, the other is responsible for invigorating activity that minimizes contact with aversive stimuli. These can be referred to, respectively, as the positive and negative motivational systems. A number of such dual-system theories of motivation have been proposed (Konorski, 1967; Rescorla & Solomon, 1967; Estes, 1969).

The assumption that there are two motivational systems rather than a single drive center allows these theories to overcome many of the problems encountered by Hull's (1943) theory. For example, it is believed that deprivation states like hunger and thirst will increase activity only in the positive system, so that a change in deprivation should not influence the vigor of behavior that minimizes contact with aversive stimuli such as shock. Conversely, electric shock should not invigorate responding for food as it will excite only the negative system.

But even this characterization of the way in which deprivation states influence behavior may be too simple. Suppose that an animal that has been trained to lever press for food when it is hungry is satiated by being granted unrestricted access to food before it is returned to the conditioning chamber. The account that has just been developed predicts that satiating the animal will reduce the motivational support for lever pressing by lowering the activity in the positive system. The animal would thus be expected to respond less vigorously than one that was still hungry. There is some evidence to support this prediction (e.g. Balleine, Garner, Gonzalez, & Dickinson, 1995), but additional findings by Balleine (1992) demonstrate that dual-system theories of motivation are in need of elaboration if they are to provide a complete account of the way in which deprivation states influence responding.

In one experiment by Balleine (1992), two groups of rats were trained to press a bar for food while they were hungry (H). For reasons that will be made evident shortly, it is important to note that the food pellets used as the instrumental reinforcer were different to the food that was presented at all other times in this experiment. Group HS was then satiated (S) by being allowed unrestricted access to their normal food for 24 hours, whereas Group HH remained on the deprivation schedule. Finally, both groups were again given the opportunity to press the bar, but responding never resulted in the delivery of the reinforcer. Because of their different deprivation states, dual-system theories of motivation, as well as our intuitions, predict that Group HH should respond more vigorously than Group HS in this test session. But it seems that our intuitions are wrong on this occasion. The mean number of responses made by each group in the test session are shown in the two gray histograms on the left-hand side of Figure 4.9.

KEY TERM

Dual-system theories of motivation: Theories that assume that behavior is motivated by activity in a positive system, which energizes approach to an object, and a negative system, which energizes withdrawal from an object.


These histograms reveal that both groups responded quite vigorously, and at a similar rate.

The equivalent histograms on the right-hand side of Figure 4.9 show the results of two further groups from this study, which were trained to lever press for food while they were satiated by being fed unrestricted food in their home cages. Rats will learn to respond for food in these conditions, provided that the pellets are of a different flavor to that of the unrestricted food presented in the home cages. Group SS was then tested while satiated, whereas Group SH was tested while hungry. Once again, and contrary to our intuitions, both groups performed similarly in the test session despite their different levels of deprivation. When the results of the four groups are compared, it is evident that the groups that were trained hungry responded somewhat more on the test trials than those that were trained while they were satiated. But to labor the point, there is no indication that changing deprivation level for the test session had any influence on responding.

Balleine's (1992) explanation for these findings is that the incentive value, or attractiveness, of the reinforcer is an important determinant of how willing animals will be to press for it. If an animal consumes a reinforcer while it is hungry, then that reinforcer may well be more attractive than if it is consumed while the animal is satiated. Thus Group HH and Group HS may have responded rapidly in the test session because they anticipated a food that in the past had proved attractive, because they had only eaten it while they were hungry. By way of contrast, the slower responding by Groups SS and SH can be attributed to them anticipating food that in the past had not been particularly attractive, because they had eaten it only while they were not hungry.

This explanation was tested with two additional groups. Prior to the experiment, animals in Group Pre(S) HS were given reward pellets while they were satiated to demonstrate that the pellets are not particularly attractive in this deprivation state.

FIGURE 4.9 The mean number of responses made by six groups of rats in an extinction test session. The left-hand letter of each pair indicates the level of deprivation when subjects were trained to lever press for reward, either satiated (S) or hungry (H); the right-hand letter indicates the deprivation level during the test trials. Two of the groups were allowed to consume the reward either satiated, Pre(S), or hungry, Pre(H), prior to instrumental conditioning (adapted from Balleine, 1992). [Two panels: mean total responses, 0-200, for Groups HH, HS, and Pre(S)HS on the left, and Groups SS, SH, and Pre(H)SH on the right.]


The group was then trained to lever press while hungry and received test trials while satiated. On the test trials, the subjects should know that, because of their low level of deprivation, the reward pellets are no longer attractive, and they should therefore be reluctant to press the lever. The results, which are shown in the blue histogram on the left-hand side of Figure 4.9, confirmed this prediction. The final group to be considered, Group Pre(H) SH, was first allowed to eat reward pellets in the home cage while hungry; instrumental conditioning was then conducted while the group was satiated, and the test trials were conducted while the group was hungry. In contrast to Group SH and Group SS, this group should appreciate that the reward pellets are attractive while hungry, and so respond more rapidly than the other two groups during the test trials. Once again, the results confirmed this prediction (see the blue histogram on the right-hand side of Figure 4.9).

By now it should be evident that no simple conclusion can be drawn concerning the way in which deprivation states influence the vigor of instrumental responding. On some occasions a change in deprivation state is able to modify directly the rate of responding, as dual-system theories of motivation predict. On other occasions, this influence is more indirect, operating by modifying the attractiveness of the reinforcer. An informative account of the way in which these findings may be integrated can be found in Balleine et al. (1995).

Pavlovian-instrumental interactions

For a long time, theorists have been interested in the way in which Pavlovian CSs influence the strength of instrumental responses that are performed in their presence. One reason for this interest is that Pavlovian and instrumental conditioning are regarded as two fundamental learning processes, and it is important to appreciate the way in which they work together to determine how an animal behaves. A second reason was mentioned at the end of Chapter 2, where we saw that Pavlovian CSs tend to elicit reflexive responses that may not always be in the best interests of the animal. If a Pavlovian CS was also able to modulate the vigor of instrumental responding, then this would allow it to have a more general, and more flexible, influence on behavior than has so far been implied. For example, if a CS for food were to invigorate instrumental responses that normally lead to food, then such responses would be strongest at a time when they are most needed, that is, in a context where food is likely to occur. The experiments described in this section show that Pavlovian stimuli can modulate the strength of instrumental responding. They also show that there are at least two ways in which this influence takes place.

Motivational influences

Konorski (1967), it should be recalled from Chapter 2, believed that a CS can excite an affective representation of the US that was responsible for arousing a preparatory CR. He further believed that a component of this CR consists of a change in the level of activity in a motivational system. A CS for food, say, was said to increase activity in the positive motivational system, whereas a CS for shock should excite the negative system. If these proposals are correct, then it should be possible to alter the strength of instrumental responding by presenting the appropriate Pavlovian CS (see also Rescorla & Solomon, 1967).


An experiment by Lovibond (1983), using a Pavlovian-instrumental transfer design, provides good support for this prediction. Hungry rabbits were first trained to operate a lever with their snouts to receive a squirt of sucrose into the mouth. The levers were then withdrawn for a number of sessions of Pavlovian conditioning in which a clicker that lasted for 10 seconds signaled the delivery of sucrose. In a final test stage, subjects were again able to press the lever and, as they were doing so, the clicker was occasionally operated. The effect of this appetitive CS was to increase the rate of lever pressing both during its presence and for a short while after it was turned off. A similar effect has also been reported in a study using an aversive US. Rescorla and LoLordo (1965) found that the presentation of a CS previously paired with shock enhanced the rate at which dogs responded to avoid shock.

In addition to explaining the findings that have just been described, a further advantage of dual-system theories of motivation is that they are able to account for many of the effects of exposing animals simultaneously to both appetitive and aversive stimuli. For example, an animal may be exposed to one stimulus that signals reward and another indicating danger. In these circumstances, instead of the two systems working independently, they are assumed to be connected by mutually inhibitory links, so that activity in one will inhibit the other (Dickinson & Pearce, 1977).

To understand this relationship, consider the effect of presenting a signal for shock to a rat while it is lever pressing for food. Prior to the signal, the level of activity in the positive system will be solely responsible for the rate of pressing. When the aversive CS is presented, it will arouse the negative system. The existence of the inhibitory link will then allow the negative system to suppress activity in the positive system and weaken instrumental responding. As soon as the aversive CS is turned off, the inhibition will be removed and the original response rate restored. By assuming the existence of inhibitory links, dual-system theories can provide a very simple explanation for conditioned suppression. It occurs because the aversive CS reduces the positive motivational support for the instrumental response.
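A toy simulation makes this suppressive interaction concrete. The inhibition weight, the relaxation rule, and the input values below are illustrative assumptions, not parameters from Dickinson and Pearce (1977).

    # A toy sketch of two motivational systems joined by mutually inhibitory
    # links; every number here is an illustrative assumption.
    def system_activity(appetitive_input, aversive_input, inhibition=0.8):
        """Relax the two systems to equilibrium; each system's activity is
        its input minus the inhibition it receives from the other."""
        pos, neg = appetitive_input, aversive_input
        for _ in range(100):
            pos = max(0.0, appetitive_input - inhibition * neg)
            neg = max(0.0, aversive_input - inhibition * pos)
        return pos, neg

    # Lever pressing for food with no aversive CS: the positive system is
    # unopposed and responding has full motivational support.
    print(system_activity(1.0, 0.0))  # (1.0, 0.0)

    # An aversive CS arrives: the negative system suppresses the positive
    # one, weakening the support for lever pressing.
    print(system_activity(1.0, 1.0))  # both settle near 0.56

When the aversive input is removed, the positive system returns to full strength, mirroring the recovery of the response rate once the CS is turned off.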

Response-cueing properties of Pavlovian CRs

In addition to modulating activity in motivational systems, Pavlovian stimuli can influence instrumental responding through a response-cueing process (Trapold & Overmier, 1972). To demonstrate this point we shall consider an experiment by Colwill and Rescorla (1988), which is very similar in design to an earlier study by Kruse, Overmier, Konz, and Rokke (1983).

In the first stage of the experiment, hungry rats received Pavlovian conditioning in which US1 was occasionally delivered during a 30-second CS. Training was then given, in separate sessions, in which R1 produced US1 and R2 produced US2. The two responses were chain pulling and lever pressing, and the two reinforcers were food pellets and sucrose solution. For the test stage, animals had the opportunity for the first time to perform R1 and R2 in the presence of the CS, but neither response led to a reinforcer. As Figure 4.10 shows, R1 was performed more vigorously than R2.

The first point to note is that it is not possible to explain these findings by appealing to the motivational properties of the CS. The CS should, of course, enhance the level of activity in the positive system. But this increase in activity should then invigorate R1 to exactly the same extent as R2, because the motivational support for both responses will be provided by the same, positive, system.

In developing an alternative explanation for the findings by Colwill and Rescorla (1988), note that instrumental conditioning with the two responses was conducted in separate sessions.

KEY TERM

Pavlovian-instrumental transfer: Training in which a CS is paired with a US and then the CS is presented while the subject is performing an instrumental response.


Thus R1 was acquired against a background of presentations of US1 and, likewise, R2 was acquired against a background of US2 presentations. If we now accept that the training resulted in the development of S-R associations, it is conceivable that certain properties of the two rewards contributed towards the S component of these associations. For example, a memory of US1 might contribute to the set of stimuli that are responsible for eliciting R1. When the CS was presented for testing, it should activate a memory of US1, which in turn should elicit R1 rather than R2. In other words, the Pavlovian CS was able to invigorate the instrumental response by providing cues that had previously become associated with the instrumental response.
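The response-cueing account reduces to a short chain of lookups. The sketch below is a deliberately bare-bones rendering of the idea; the names and data structures are invented for illustration and are not drawn from Colwill and Rescorla's (1988) procedure.

    # A bare-bones sketch of response cueing: during training each
    # reinforcer's memory becomes part of the stimulus compound for its
    # response; at test, the CS reactivates that memory. Names are invented.
    s_r_associations = {"memory_of_US1": "R1", "memory_of_US2": "R2"}
    pavlovian_association = {"CS": "memory_of_US1"}  # the CS was paired with US1

    def cued_response(cs):
        """The CS activates a US memory, which elicits the response that was
        trained against a background of that US."""
        us_memory = pavlovian_association[cs]
        return s_r_associations[us_memory]

    print(cued_response("CS"))  # "R1": the CS selectively cues R1 over R2

Because selection runs through a specific US memory rather than a general motivational state, the sketch reproduces the selective invigoration of R1 that a purely motivational account cannot.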

Concluding comments

The research reviewed so far in this chapter shows that we have discovered a considerable amount about the associations that are formed during instrumental conditioning. We have also discovered a great deal about the factors that influence the strength of instrumental responding. In Chapter 2 a simple memory model was developed to show how the associations formed during Pavlovian conditioning influence responding. It would be helpful if a similar model could be developed for instrumental conditioning, but this may not be an easy task. We would need to take account of three different associations that have been shown to be involved in instrumental behavior: S-R, R-US, and S-(R-US). We would also need to take account of the motivational and response-cueing properties of any Pavlovian CS-US associations that may develop. Finally, the model would need to explain how changes in deprivation can influence responding. It hardly needs to be said that any model that is able to take account of all these factors satisfactorily will be complex and would not fit comfortably into an introductory text. The interested reader is, however, referred to Dickinson (1994), who shows how much of our knowledge about instrumental behavior can be explained by what he calls an associative-cybernetic model. In essence, this model is a more complex version of the dual-system theories of motivation that we have considered. The reader might also wish to consult Balleine (2001) for a more recent account of the influence of motivational processes on instrumental behavior.

Our discussion of the basic processes of instrumental conditioning is now complete, but there is one final topic to consider in this chapter. That is, whether the principles we have considered can provide a satisfactory account for the problem-solving abilities of animals.

THE LAW OF EFFECT AND PROBLEM SOLVING

Animals can be said to have solved a problem whenever they overcome an obstacle to attain a goal. The problem may be artificial, such as having to press a lever for reward, or it might be one that occurs naturally, such as having to locate a new source of food.

FIGURE 4.10 The mean rates of performing two responses, R1 and R2, in the presence of an established Pavlovian conditioned stimulus (CS). Prior to testing, instrumental conditioning had been given in which the reinforcer for R1 was the same as the Pavlovian unconditioned stimulus (US), and the reinforcer for R2 was different to the Pavlovian US. Testing was conducted in the absence of any reinforcers in a single session (adapted from Colwill & Rescorla, 1988). [Graph: mean responses per minute, 0-10, across five blocks of two trials, with R1 above R2.]


Early studies of problem solving in animals were conducted by means of collecting anecdotes, but this unsatisfactory method was soon replaced by experimental tests in the laboratory (see Chapter 1). As a result of his experiments, Thorndike (1911) argued that despite the range of potential problems that can confront an animal, they are all solved in the same manner. Animals are assumed to behave randomly until by trial and error the correct response is made and reward is forthcoming. To capture this idea, Thorndike (1911) proposed the Law of Effect, which stipulates that one effect of reward is to strengthen the accidentally occurring response and to make its occurrence more likely in the future. This account may explain adequately the way cats learn to escape from puzzle boxes, but is it suitable for all aspects of problem solving? A number of researchers have argued that animals are more sophisticated at solving problems than is implied by the Law of Effect. It has been suggested that they are able to solve problems through insight. It has also been suggested that animals can solve problems because they have an understanding of the causal properties of the objects in their environment or, as it is sometimes described, an understanding of folk physics. We shall consider each of these possibilities.

Insight

An early objector to Thorndike's (1911) account of problem solving was Kohler (1925). Thorndike's experiments were so restrictive, he argued, that they prevented animals from revealing their capacity to solve problems by any means other than the most simple.

Kohler spent the First World War on the Canary Islands, where he conducted a number of studies that were meant to reveal sophisticated intellectual processes in animals. He is best known for experiments that, he claimed, demonstrate the importance of insight in problem solving. Many of his findings are described in his book The mentality of apes, which documents some remarkable feats of problem solving by chimpanzees and other animals. Two examples should be sufficient to give an indication of his methodology. These examples involve Sultan (Figure 4.11), whom Kohler (1925) regarded as the brightest of his chimpanzees. On one occasion Sultan was in a cage in which there was also a small stick. Outside the cage was a longer stick, which was beyond Sultan's reach, and even further away was a reward of fruit (p. 151):

Sultan tries to reach the fruit with the smaller of the sticks. Not succeeding, he tries a piece of wire that projects from the netting in his cage, but that, too, is in vain. Then he gazes about him (there are always in the course of these tests some long pauses, during which the animal scrutinizes the whole visible area). He suddenly picks up the little stick once more, goes to the bars directly opposite to the long stick, scratches it towards him with the auxiliary, seizes it and goes with it to the point opposite the objective, which he secures. From the moment that his eyes fell upon the long stick, his procedure forms one consecutive whole.

KEY TERM

Insight: An abrupt change in behavior that leads to a problem being solved. The change in behavior is sometimes attributed to a period of thought followed by a flash of inspiration.

FIGURE 4.11 Sultan stacking boxes in an attempt to reach a banana (drawing based on Kohler, 1956).



In the other study, Kohler (1925) hung a piece of fruit from the ceiling of a cage housing six apes, including Sultan. There was a wooden box in the cage (p. 41):

All six apes vainly endeavored to reach the fruit by leaping up from the ground. Sultan soon relinquished this attempt, paced restlessly up and down, suddenly stood still in front of the box, seized it, tipped it hastily straight towards the objective, but began to climb upon it at a (horizontal) distance of 1/2 meter and, springing upwards with all his force, tore down the banana.

In both examples there is a period when the animal responds incorrectly; this is then followed by activity that, as it is reported, suggests that the solution to the problem has suddenly occurred to the subject. There is certainly no hint in these reports that the problem was solved by trial and error. Does this mean, then, that Kohler (1925) was correct in his criticism of Thorndike's (1911) theorizing?

    A problem with interpreting Kohlers (1925) findings is that all of the apes hadplayed with boxes and sticks prior to the studies just described. The absence oftrial-and-error responding may thus have been due to the previous experience of theanimals. Sultan may, by accident, have learned about the consequences of jumpingfrom boxes in earlier sessions, and he was perhaps doing no more than acting on thebasis of his previous trial-and-error learning. This criticism of Kohlers (1925) workis by no means original. Birch (1945) and Schiller (1952) have both suggested thatwithout prior experience with sticks and so forth, there is very little reason f