    Probability learning of rats in continuous-time experiments

    Abstract

    Probability matching behavior in time was shown by rats under two non-contingent reinforcement schedules (.50 and .625) with correction plus shock procedure but not with non-correction. Sequential data indicated that the maximum effect of reinforcement occurred about 6 sec. later and that only two preceding reinforcements were effective in determining response probability.

    Problem

    During the past ten years there have been many studies on probability learning in humans, other mammals, and animals lower on the phylogenetic scale (Calfee, 1963; Estes, 1964). The prime interest in much of this research has been whether, for the given species and experimental conditions, probability matching occurs. In the case of the rat, it has proved difficult to find conditions where clear evidence of probability matching is obtained for individual subjects during long-term experiments (Calfee, 1963; Uhl, 1963).

    The present paper presents results for probability learning with rats in an experimental setting not previously used for this purpose, a symmetric shuttlebox. In continuous-time experiments we consider the response-state of the organism at all times, rather than discretely emitted responses. Thus, the subject is always in a response-state, for it must be on either the left or the right side of the shuttlebox. In order to determine whether probability matching could be obtained, we varied experimental conditions and examined mean asymptotic response probabilities for individual subjects under each condition.

    In addition, the evaluation of sequential statistics was expected to provide basic information about reinforcement. Would the effects of reinforcement be clearly observable? Specifically, when does the maximum effect of a reinforcement occur in the interval following its delivery? Are the effects of reinforcement accountable simply in terms of the last reinforcement, or are there long-term effects reaching back through preceding reinforcements? Finally, if sequential dependencies are shown, are negative recency effects also obtained, as in studies of human probability learning?

    In the present paper we are primarily interested in presenting experimental results; thus no effort has been made to provide a quantitative analysis of the data from the standpoint of a mathematical theory of learning, although such analysis is now in progress and will be reported later.

    Method

    Subjects were six adult male rats on food deprivation at 80% normal weight. The apparatus was a standard shuttlebox (30 in. long x 5 in. wide x 6 1/4 in. high) with a grid floor and a low barrier (1 1/4 in.) in the center. Food cups were mounted on both ends of the box. The shuttlebox was dimly illuminated from above and was enclosed in a sound-resistant outer box, which contained a viewing panel for observation. Shock was delivered from a matched-impedance source with 150 K series resistor.

    Eileen B. Karsh, DREXEL INSTITUTE OF TECHNOLOGY
    Patrick Suppes, STANFORD UNIVERSITY

    Rats were first magazine-trained to take 45 mg food pellets from both feeders. There was no difficulty in getting animals to move freely across the box.

    Throughout the experiment, a fixed interval outcome presentation was used. Every 20 sec. a light came on briefly (1/2 sec.) from below the grid, on one side of the box. Two hundred such presentations were given during a daily session, which lasted slightly more than 1 hr. These events were programmed in random series on a tape reader. The location of the rat was recorded throughout the session, and a summary measure, the proportion of time spent on the left (A1) side, was recorded for each session.
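    As a rough illustration of the session arithmetic and the summary measure described above, the following Python sketch computes the session length implied by 200 presentations at 20-sec. intervals (4000 sec., consistent with "slightly more than 1 hr.") and the proportion of time on the left (A1) side. The record format (equally spaced location samples labeled 'L' or 'R') and all function names are assumptions made for illustration; the paper does not say how the location record was stored.

```python
# Sketch only: the sampled-record format and all names are assumptions,
# not the recording method used in the experiment.

PRESENTATIONS_PER_SESSION = 200   # from the text
INTERVAL_SEC = 20                 # from the text


def session_length_sec(n_presentations=PRESENTATIONS_PER_SESSION,
                       interval_sec=INTERVAL_SEC):
    """Total session time implied by the fixed-interval schedule."""
    return n_presentations * interval_sec


def proportion_left(samples):
    """Proportion of samples in which the rat was on the left side.

    `samples` is a hypothetical record of equally spaced location samples,
    each 'L' (left, the A1 response-state) or 'R' (right).
    """
    if not samples:
        raise ValueError("empty location record")
    return sum(1 for s in samples if s == "L") / len(samples)


if __name__ == "__main__":
    total = session_length_sec()
    print(total, "sec =", round(total / 60, 1), "min")
    record = ["L"] * 2500 + ["R"] * 1500   # made-up illustrative record
    print(proportion_left(record))
```

    A per-second record is only one possible representation; a list of crossing times would serve equally well and would give the same proportion.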

    Initially a non-correction procedure (NC) was used. If the rat was on the side where the light came on, a pellet was delivered; if the rat was not on the side where the light appeared, no reinforcement was given. Later the procedure was changed to correction plus shock (C + S). Under this procedure, if S was on the wrong side when the light came on, it received a fairly weak shock (2 sec. at 50 volts). By crossing over to the other side before the next light S could get a pellet and, by crossing rapidly, could also escape from the shock.
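    The two procedures can be summarized as a small decision rule. The Python sketch below is only a restatement of the paragraph above; the function name, the 'L'/'R' side labels, the 'NC'/'CS' procedure codes, and the returned strings are illustrative assumptions.

```python
# Sketch only: names, labels, and return strings are illustrative assumptions.

def outcome_consequence(rat_side, light_side, procedure):
    """Describe what follows when the light comes on.

    rat_side, light_side: 'L' or 'R'.
    procedure: 'NC' (non-correction) or 'CS' (correction plus shock).
    """
    if procedure not in ("NC", "CS"):
        raise ValueError("unknown procedure: %r" % (procedure,))
    if rat_side == light_side:
        # Same rule under both procedures: correct side earns a pellet.
        return "pellet"
    if procedure == "NC":
        # Non-correction: a wrong-side rat simply gets nothing.
        return "no reinforcement"
    # Correction plus shock: a weak shock (2 sec. at 50 volts in the text);
    # crossing before the next light earns a pellet, and crossing rapidly
    # also escapes the shock.
    return "shock; pellet available on crossing"
```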

    A number of different conditions were used consecutively for all Ss, so that each rat was run daily over a period of about 7 months. With the non-correction procedure the proportion of E1 events (light on the left side), π, was initially .625 for two rats and .375 for two rats. Next, the more frequent side was reversed (π = .375 or .625) and then the schedule was changed to π = .50. Two additional rats were started under this condition (π = .50 NC). Then the correction plus shock procedure was adopted, with π = .50 and shock at 50 volts. Shock was then increased to 55 volts but later changed back to 50 volts, with π = .50. Finally, the schedule was changed to π = .625 (or .375), with shock at 50 volts.

    Mean Asymptotic Results

    After 15-20 days under the first condition (π = .625 or .375 NC), Ss showed absorption, i.e., spent almost all their time during each session on the more frequently reinforced side. When the schedule was reversed to π = .375 or .625 NC, Ss absorbed on the other side after 10-15 days. When the schedule was changed to π = .50 NC (shown on the left in Fig. 1), two of the four rats (No. 151 and No. 156) continued to show decided position preferences although it was no longer gainful. The other two rats of the original group (No. 150 and No. 164) and the two later ones (No. 170 and No. 171) did not deviate much from .50, considering performance for all 25-35 days. However, there was relatively large day-to-day variability. With the institution of the more powerful correction plus shock

    Psychonomic Science, 1, 1964, pp. 361-362.

    Fig. 1. Mean proportion of time in A1 response-state during daily sessions for individual Ss under 3 conditions: π = .50 NC, π = .50 C + S, and π = .625 (.375) C + S.

    procedure, all six rats showed probability-matching for the 20-25 days (i.e., 4000-5000 outcome events), under π = .50 with shock at 50 volts. A decrease in variability was also evident, as shown in the center of Fig. 1. When shock was increased to 55 volts, two rats (No. 150 and No. 164) began to show position preferences which persisted throughout the remainder of the experiment. The other four rats continued to probability-match throughout the C + S conditions with π = .50 and also with π = .625 (.375), as shown on the right in Fig. 1. Thus all six rats showed probability-matching under the .50 schedule and four showed good probability matching under the .625 schedule for a total of about 100 consecutive days or 20,000 outcome events.

    Sequential Statistics

    Because outcomes occurred at fixed intervals of 20 sec., it was relatively easy to examine sequential statistics. Data were analyzed for all six rats for 15 days under the .50 schedule (50 volts).

    Figure 2 shows the probability of an A1 (left) response, given that an E1 outcome (light on left) occurred; the figure also shows a curve similarly conditionalized on an E2 outcome. Each curve gives the mean probability that S was in an A1 response-state as a function of seconds following the last outcome (E1 or E2).

    It is clear that the effects of reinforcement are very strong. The complete separation of the two response curves, following different outcomes, is about as great as one could expect.
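    The conditional curves just described can be computed along the following lines. This Python sketch is not the authors' analysis procedure; it assumes a hypothetical data layout in which each 20-sec. interval is stored as an (outcome, samples) pair, with one 0/1 sample per second indicating whether the rat was in the A1 response-state. The names `conditional_curves` and `peak_index` are illustrative.

```python
# Sketch only: the (outcome, samples) layout and all names are assumptions.
from collections import defaultdict


def conditional_curves(trials, interval_sec=20):
    """Mean P(A1) at each second after an outcome, split by outcome type.

    trials: iterable of (outcome, samples) pairs, where outcome is 'E1' or
    'E2' and samples holds `interval_sec` values, 1 if the rat was on the
    left (A1) at that second and 0 otherwise.
    """
    sums = defaultdict(lambda: [0] * interval_sec)
    counts = defaultdict(int)
    for outcome, samples in trials:
        if len(samples) != interval_sec:
            raise ValueError("each trial needs one sample per second")
        counts[outcome] += 1
        for t, in_a1 in enumerate(samples):
            sums[outcome][t] += in_a1
    # Divide the per-second totals by the number of trials of each type.
    return {o: [s / counts[o] for s in sums[o]] for o in counts}


def peak_index(curve):
    """0-based index of the largest value in a curve (first one on ties)."""
    return max(range(len(curve)), key=lambda t: curve[t])
```

    Applied to the E1 curve, `peak_index` gives the position of the maximum effect of reinforcement within the interval, under whatever sampling convention the record uses.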

    Fig. 2. Mean probability of an A1 response-state after an E1 or an E2 outcome, as a function of seconds following the last outcome.

    Fig. 3. Mean probability of an A1 response-state after two E1 outcomes (E1, E1) or an E1 preceded by an E2 (E2, E1) as a function of seconds following the last outcome.

    It is interesting to note that the maximum effects of reinforcement do not occur until 6 sec. after the outcome is presented. Naturally following from such an observation is the question of a possible variation in this temporal lag when additional sequential statistics are considered. In Fig. 3 the conditional response curves for two preceding outcomes are shown. In the top curve the maximum effect of reinforcement occurs after [illegible] sec. and in the second one after [illegible] sec. The results shown here are correlated with those shown in Fig. [illegible] regarding the time of maximum effect, but the essential point is that the consideration of more conditional statistics does not lead to much spread in the distribution of maxima. In fact, if we look as far back as four preceding outcomes, the range of occurrence of the maximum is between 4 and 8 sec. (e.g., it is 4 sec. for the sequence E1, E2, E2 and 8 sec. for E2, E2, E1, E2). The rather long temporal lags following the reinforcing outcome are surprising. As far as we know, this kind of measurement of the time it takes for reinforcement to show its maximum effect has not been reported previously.

    Figure 3 also shows evidence of sequential dependencies, because of the clear separation of the response curves conditionalized on E1, E1 outcomes and E2, E1 outcomes. The last outcome was the same (E1) in both cases, but the effect of the preceding outcome made a considerable difference. This effect was not observed for three preceding outcomes; in fact, the effect of the third preceding outcome was very slight indeed. Finally, considering the 16 possible combinations of four preceding outcomes, very little effect of the earliest outcome was noticeable. In particular, there was no tendency at all for a negative recency effect to appear.
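    Conditioning on longer outcome histories, as in the two- and four-outcome analyses above, can be sketched in the same way. The Python code below enumerates the 2^k ordered histories (16 for k = 4) and averages the response record separately for each history that ends in the current outcome. The data layout (a list of (outcome, samples) pairs in presentation order) and the function names are assumptions for illustration, not the authors' procedure.

```python
# Sketch only: the data layout and all names are illustrative assumptions.
from collections import defaultdict
from itertools import product


def outcome_histories(k):
    """All 2**k ordered sequences of k outcomes (earliest first)."""
    return list(product(("E1", "E2"), repeat=k))


def curves_by_history(trials, k, interval_sec=20):
    """Mean P(A1) per second, conditionalized on the last k outcomes.

    trials: list of (outcome, samples) pairs in presentation order; samples
    holds one 0/1 value per second (1 = rat in the A1 response-state).
    Each history is a tuple ending with the outcome that starts the interval.
    """
    sums = defaultdict(lambda: [0] * interval_sec)
    counts = defaultdict(int)
    outcomes = [o for o, _ in trials]
    for i in range(k - 1, len(trials)):
        history = tuple(outcomes[i - k + 1:i + 1])
        samples = trials[i][1]
        if len(samples) != interval_sec:
            raise ValueError("each trial needs one sample per second")
        counts[history] += 1
        for t, in_a1 in enumerate(samples):
            sums[history][t] += in_a1
    return {h: [s / counts[h] for s in sums[h]] for h in counts}
```

    Comparing the curves for ('E1', 'E1') and ('E2', 'E1') corresponds to the comparison shown in Fig. 3; histories that never occur in the record are simply absent from the result.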

    In conclusion, it may be remarked to those familiar with stimulus sampling models that the comparison between the E1, E1 and E2, E1 curves suggests that the estimate of the conditioning parameter c will be somewhere in the neighborhood of .8. The differences in the maxima of these two curves and the fact that both maxima are less than 1 show, of course, that c will not be precisely 1. More detailed questions about the estimation of parameters and the fit of models will be delayed for a second paper.

    References

    CALFEE, R. C. Long-term behavior of rats under probabilistic reinforcement schedules. Tech. Rep. No. 59, Institute for Mathematical Studies in the Social Sciences, Stanford Univer., 1963.

    ESTES, W. K. Probability learning. In A. W. Melton (Ed.), Categories of human learning. New York: Academic Press, 1964.

    UHL, C. N. Two-choice probability learning in the rat as a function of incentive, probability of reinforcement, and training procedure. J. exp. Psychol., 1963, 66, 443-449.

    Notes

    1. This research was supported by contract AF 49(638)-1253 and grant AF AFOSR 62384 between the Air Force Office of Scientific Research and Stanford University and by grant MH08419 from the National Institute of Mental Health to Drexel.
    2. We wish to thank Dr. Saul Sternberg for use of the apparatus. We also thank J. Seiler for circuit design and A. Kirschenstein, R. Franks and M. Deutsch for experimental assistance.