operant conditioning. do now write two classical conditioning equations. one should use counter...

Operant Conditioning

Do Now

Write two classical conditioning equations. One should use counter conditioning

S => R (lever push when light flashes) = cocaine injection

The Law of Effect

• Definition: Learning in which the consequence of a behavior affects the likelihood that the individual will engage in that behavior again

• First discussed by Thorndike (“law of effect”, 1898), advanced by Skinner (late 1930s – 1960s)

• Explains shaping process

Shaping

• Shaping must occur to get the animal to interact with the operant

• Animal is rewarded gradually for interest in the operant (such as the lever)

Operant Conditioning Terms (B.F. Skinner)

• Operant: any behavior that has some effect on the environment

• Reinforcement contingency: A consistent relationship between a behavior and the change in the environment it produces

• Reinforcer: any consequence (change in the environment) that increases the frequency of a behavior

• Punisher: any consequence (change in the environment) that decreases the frequency of a behavior

• Discriminative Stimulus: the cue that lets you know that the reinforcement contingency is present

• Shaping: successively closer approximations to the desired response are reinforced until the desired response finally occurs and can be reinforced
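The shaping definition above can be pictured as a loop that rewards anything "close enough" and then tightens the criterion. The sketch below is only a toy model: the function name `shape`, the numeric "distance from the lever" scores, and the fixed tightening step are all assumptions made for illustration, not part of Skinner's procedure.

```python
def shape(responses, target, tolerance, step):
    """Toy shaping loop.

    A response is reinforced when it falls within `tolerance` of `target`.
    After each reinforced response the tolerance shrinks by `step`, so only
    closer approximations are reinforced from then on.
    """
    reinforced = []
    for response in responses:
        hit = abs(response - target) <= tolerance
        reinforced.append(hit)
        if hit:
            tolerance = max(0.0, tolerance - step)
    return reinforced, tolerance


# Hypothetical data: each number is how far the animal is from the lever
# (0 means it pressed the lever).
responses = [8, 6, 7, 4, 5, 2, 0]
log, final_tolerance = shape(responses, target=0, tolerance=8, step=2)
for response, hit in zip(responses, log):
    print(response, "reinforced" if hit else "not reinforced")
```

Responses that would have been rewarded early on (such as 7 or 5 here) go unrewarded once the criterion has tightened, which is the point of reinforcing successive approximations.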

Positive and negative consequences

• Positive = adding something

• Negative = removing something

Positive Reinforcer = when a behavior is followed by the adding of a stimulus that increases the probability of that behavior being repeated.

Negative Reinforcer = when a behavior is followed by the removal of a stimulus and therefore increases the probability of that behavior being repeated.

Positive Punishment = when a behavior is followed by the adding of a stimulus that decreases the probability of that behavior being repeated.

Negative Punishment = when a behavior is followed by the removal of a stimulus and therefore decreases the probability of that behavior being repeated.

Appetitive Stimulus

• Add: INCREASE behavior = POSITIVE REINFORCEMENT

• Remove: DECREASE behavior = NEGATIVE PUNISHMENT

Aversive Stimulus

• Add: DECREASE behavior = POSITIVE PUNISHMENT

• Remove: INCREASE behavior = NEGATIVE REINFORCEMENT
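The four terms follow from two independent questions: was a stimulus added or removed, and did the behavior become more or less likely? A minimal sketch of that lookup is below. The function name `classify_consequence` and the everyday examples in the comments are illustrative assumptions, not taken from the slides.

```python
def classify_consequence(stimulus_change, behavior_change):
    """Name a consequence from the 2x2 grid.

    stimulus_change: "add" (positive) or "remove" (negative)
    behavior_change: "increase" (reinforcement) or "decrease" (punishment)
    """
    sign = {"add": "positive", "remove": "negative"}
    kind = {"increase": "reinforcement", "decrease": "punishment"}
    if stimulus_change not in sign or behavior_change not in kind:
        raise ValueError("expected add/remove and increase/decrease")
    return f"{sign[stimulus_change]} {kind[behavior_change]}"


# Hypothetical examples, one per cell of the grid.
examples = [
    ("add", "increase"),     # food pellet after a lever push
    ("remove", "decrease"),  # toy taken away after hitting
    ("add", "decrease"),     # shock after a lever push
    ("remove", "increase"),  # loud noise stops after a lever push
]
for stimulus_change, behavior_change in examples:
    print(stimulus_change, behavior_change, "->",
          classify_consequence(stimulus_change, behavior_change))
```

Note that "positive" and "negative" never mean good or bad here; they only record whether something was added or removed.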

Reinforcement Schedules

Three main distinctions:

- Partial vs. continuous

- Partial broken down into…

  - Interval vs. ratio

  - Fixed vs. variable

Interval is a time-based schedule

- Fixed Interval: rewarded for 1st operant after a set period of time (e.g., every 5 seconds)

  - EX: salary every 2 weeks

- Variable Interval: rewarded for 1st operant after a varying amount of time (e.g., between 1 and 9 seconds, but 5 on average)

  - EX: salary monthly or weekly at different times

Ratio is a number-of-operants-based schedule

- Fixed Ratio: rewarded for 1st operant after a set number of operants (e.g., every 5th response)

  - EX: reward for every 5 lever pulls

- Variable Ratio: rewarded for 1st operant after a varying number of operants (e.g., between 1 and 9 responses, but 5 on average)

  - EX: reward after either 5 or 9 lever pulls, random variation
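The four partial schedules differ only in what they count (responses or elapsed time) and whether the requirement is fixed or redrawn after each reward. The sketch below is one way to express that, under stated assumptions: the class names, the `respond(t)` method, and the uniform draw between 1 and 9 are illustrative choices, and the animal is assumed to respond exactly once per second.

```python
import random


class FixedRatio:
    """Reward every n-th response."""
    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self, t):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """Reward after a number of responses redrawn from [low, high] each time."""
    def __init__(self, low, high, rng):
        self.low, self.high, self.rng = low, high, rng
        self.count = 0
        self.required = rng.randint(low, high)

    def respond(self, t):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self.rng.randint(self.low, self.high)
            return True
        return False


class FixedInterval:
    """Reward the first response at least `interval` seconds after the last reward."""
    def __init__(self, interval):
        self.interval = interval
        self.last_reward = 0.0

    def respond(self, t):
        if t - self.last_reward >= self.interval:
            self.last_reward = t
            return True
        return False


class VariableInterval:
    """Like FixedInterval, but the waiting time is redrawn from [low, high]."""
    def __init__(self, low, high, rng):
        self.low, self.high, self.rng = low, high, rng
        self.last_reward = 0.0
        self.wait = rng.uniform(low, high)

    def respond(self, t):
        if t - self.last_reward >= self.wait:
            self.last_reward = t
            self.wait = self.rng.uniform(self.low, self.high)
            return True
        return False


def count_rewards(schedule, times):
    """Number of rewarded responses when responding at each time in `times`."""
    return sum(schedule.respond(t) for t in times)


times = range(1, 21)  # assumed: one response per second for 20 seconds
rng = random.Random(0)  # seeded so the demo is repeatable
print("FR-5:", count_rewards(FixedRatio(5), times))
print("VR 1-9:", count_rewards(VariableRatio(1, 9, rng), times))
print("FI-5s:", count_rewards(FixedInterval(5), times))
print("VI 1-9s:", count_rewards(VariableInterval(1, 9, rng), times))
```

On an interval schedule, responding faster does not earn more rewards once the interval has not yet elapsed, whereas on a ratio schedule every extra response moves the count forward. This is consistent with ratio schedules producing higher response rates.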

Effectiveness of different schedules of reinforcement

Schedule            Rate of response    Resistance to extinction
Continuous          Moderate            Low
Variable ratio      High                High
Fixed ratio         High                Moderate-Low
Variable interval   Moderate            High
Fixed interval      Low                 Low

Operant Conditioning Can Happen Without Conscious Awareness

• Subjects listened to music with superimposed static

– A twitch of the thumb would deactivate static

• Almost all began to respond with a thumb twitch even though none realized how they were able to shut it off

– One person claimed he was aware, saying that it involved “subtle rowing movements with both hands, infinitesimal wriggles of both ankles, a slight displacement of the jaw to the left, breathing out, and then waiting”

• This kind of learning without awareness is common when learning skills and mastering fine-tuned practices

Dopamine is the Neural Mechanism for S=>R

• Dopamine is active when the reward is provided, but once the animal learns the operant behavior, dopamine is released when the discriminative stimulus is presented


Learned Helplessness

• Benefits to operant conditioning

– If you can learn the system, you can use/abuse the system

• Negative consequences of operant conditioning

– If the system seems random…

– If you have no control…

– If punishment/negative reinforcement is inevitable…

• depression, withdrawal

– EX: dogs shocked randomly, eventually stop attempting to move

– EX: “Muselmänner” in concentration camps (starvation)

• Failure destroys sense of agency/control

– Self-perpetuating cycle

A problem for operant conditioning: Unintentional elimination of desired behaviors

• “Oversufficient justification” hypothesis

– Intrinsic interest in an activity may be undermined by providing extrinsic reward

• Rationale for the hypothesis

– The person might infer that his/her actions were motivated by the external reward, not the activity itself

Preschoolers (3-5-year-olds)

• Assess interest in drawing before the study begins, match groups for interest

• IV: Award Condition

– Expected Award

– Unexpected Award

– No Award

• DV: Interest in drawing with magic markers after award received (or not received). Operationally defined as the percent of free-choice time spent drawing with the markers.

Results

• Mean percent of free-choice time spent drawing with magic markers after receiving (or not receiving) an award

Condition           Mean Percent
Expected Award      8.6
Unexpected Award    16.7
No Award            18.1
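To make the size of the effect concrete, the means above can be compared against the No Award group. This is only arithmetic on the three reported means; the function name `relative_drop` is an assumption, and nothing here speaks to statistical significance, since the slides give no variances or sample sizes.

```python
# Mean percent of free-choice time spent drawing, from the results above.
mean_percent = {
    "Expected Award": 8.6,
    "Unexpected Award": 16.7,
    "No Award": 18.1,
}
baseline = mean_percent["No Award"]


def relative_drop(condition):
    """Percent reduction in drawing time relative to the No Award group."""
    return 100 * (baseline - mean_percent[condition]) / baseline


for condition in ("Expected Award", "Unexpected Award"):
    print(f"{condition}: {relative_drop(condition):.1f}% less than No Award")
```

The Expected Award group spent roughly half as much free-choice time drawing as the No Award group, while the Unexpected Award group was close to baseline. That pattern fits the idea that it is the anticipated reward, not the reward itself, that undermines interest.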

Implications: Should we never reward children?

– Extrinsic rewards okay for behaviors that aren’t intrinsically rewarding

– Tangible awards should be just large enough to encourage activity – intrinsic motivation is inversely related to the size of the extrinsic reward

– Intangible rewards (verbal praise) probably okay all the time

– Extension to punishment: Power assertive punishment is BAD – kids reason that they only behaved appropriately to avoid punishment

Rewards and Intelligence

• Study (Dweck et al., 2006)

– Students asked to complete a moderately difficult task

– IV: praise type (praise intelligence, praise effort, no praise) after task

– DV: willingness to try new problem

– Results: kids who were praised for intelligence refused to take the new test, while kids praised for effort tried it

– Interpretation: “you are smart now, don’t blow it”

– Interpretation: “not all praise is equal”

Token Economies

• Reward for behavior or performance

– Originated in prisons

– Grades

– Video Games

Summary

SR
