Operant Conditioning
Do Now
Write two classical conditioning equations. One should use counter conditioning
S => R: lever-push when light flashes = cocaine injection
The Law of Effect
• Definition: Learning in which the consequence of a behavior affects the likelihood that the individual will engage in that behavior again
• First discussed by Thorndike (“law of effect”, 1898), advanced by Skinner (late 1930s – 1960s)
• Explains shaping process
Shaping
• Shaping must occur to get the animal to interact with the operant
• The animal is rewarded gradually for interest in the operant (such as the lever)
Operant Conditioning Terms (B.F. Skinner)
• Operant: any behavior that has some effect on the environment
• Reinforcement contingency: A consistent relationship between a behavior and the change in the environment it produces
• Reinforcer: any consequence (change in the environment) that increases the frequency of a behavior
• Punisher: any consequence (change in the environment) that decreases the frequency of a behavior
• Discriminative Stimulus: the cue that lets you know that the reinforcement contingency is present
• Shaping: closer approximations to the desired response are reinforced until the desired response finally occurs and can be reinforced
Positive and negative consequences
• Positive = adding something
• Negative = removing something
Positive Reinforcer = when a behavior is followed by the adding of a stimulus that increases the probability of that behavior being repeated.
Negative Reinforcer = when a behavior is followed by the removal of a stimulus and therefore increases the probability of that behavior being repeated.
Positive Punishment = when a behavior is followed by the adding of a stimulus that decreases the probability of that behavior being repeated.
Negative Punishment = when a behavior is followed by the removal of a stimulus and therefore decreases the probability of that behavior being repeated.
Stimulus type         Add                                           Remove
Appetitive Stimulus   INCREASE behavior (POSITIVE REINFORCEMENT)    DECREASE behavior (NEGATIVE PUNISHMENT)
Aversive Stimulus     DECREASE behavior (POSITIVE PUNISHMENT)       INCREASE behavior (NEGATIVE REINFORCEMENT)
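The four consequence types follow mechanically from two questions: was a stimulus added or removed, and did the behavior become more or less likely? A minimal sketch of that lookup follows. The function name `classify_consequence` and its string arguments are illustrative assumptions, not part of the original slides.

```python
def classify_consequence(stimulus_change: str, behavior_change: str) -> str:
    """Name the consequence type from the two distinctions on the slides.

    stimulus_change: "add" or "remove" (positive = adding, negative = removing)
    behavior_change: "increase" or "decrease" (reinforcement vs. punishment)
    Both argument names and values are hypothetical labels chosen for this sketch.
    """
    sign = {"add": "positive", "remove": "negative"}[stimulus_change]
    kind = {"increase": "reinforcement", "decrease": "punishment"}[behavior_change]
    return f"{sign} {kind}"


# Removing an aversive stimulus so that the behavior increases:
label = classify_consequence("remove", "increase")
```

Note that the sign depends only on whether something is added or removed, never on whether the outcome is pleasant, which is why "negative reinforcement" still increases behavior.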
Reinforcement Schedules
Three main distinctions:
- Partial vs. continuous
- Partial broken down into…
  - Interval vs. ratio
  - Fixed vs. variable
Interval is a time-based schedule
- Fixed Interval: rewarded for the 1st operant after a set period of time (e.g., every 5 seconds)
  - EX: salary every 2 weeks
- Variable Interval: rewarded for the 1st operant after a varying amount of time (e.g., between 1 and 9 seconds, but 5 on average)
  - EX: salary monthly or weekly at different times
Ratio is a number-of-operants-based schedule
- Fixed Ratio: rewarded for the 1st operant after a set number of operants (e.g., every 5th response)
  - Reward for every 5 lever pulls
- Variable Ratio: rewarded for the 1st operant after a varying number of operants (e.g., between 1 and 9 responses, but 5 on average)
  - Reward after either 5 or 9 lever pulls, with random variation
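The four partial schedules can be written as small rules that decide which responses earn a reward. The sketch below is only an illustration. All function names are hypothetical, the defaults (5 responses, 5 seconds, a 1 to 9 range) are taken from the examples above, and drawing the variable requirement uniformly at random is an assumption the slides do not specify.

```python
import random


def fixed_ratio(n_responses, ratio=5):
    """Reward every `ratio`-th response (e.g., every 5th lever pull)."""
    return [i % ratio == 0 for i in range(1, n_responses + 1)]


def variable_ratio(n_responses, low=1, high=9, rng=None):
    """Reward after a varying number of responses, redrawn after each reward.

    The requirement is drawn uniformly from low..high (an assumption),
    so it averages 5 responses with the defaults.
    """
    if rng is None:
        rng = random.Random(0)
    rewarded, count, needed = [], 0, rng.randint(low, high)
    for _ in range(n_responses):
        count += 1
        if count >= needed:
            rewarded.append(True)
            count, needed = 0, rng.randint(low, high)
        else:
            rewarded.append(False)
    return rewarded


def fixed_interval(response_times, interval=5.0):
    """Reward the first response made at least `interval` seconds after the last reward."""
    rewarded, last_reward = [], 0.0
    for t in response_times:
        if t - last_reward >= interval:
            rewarded.append(True)
            last_reward = t
        else:
            rewarded.append(False)
    return rewarded


def variable_interval(response_times, low=1.0, high=9.0, rng=None):
    """Like fixed_interval, but the required wait is redrawn after each reward."""
    if rng is None:
        rng = random.Random(0)
    rewarded, last_reward, wait = [], 0.0, rng.uniform(low, high)
    for t in response_times:
        if t - last_reward >= wait:
            rewarded.append(True)
            last_reward, wait = t, rng.uniform(low, high)
        else:
            rewarded.append(False)
    return rewarded
```

Each function returns one True/False flag per response. In the ratio schedules only the count of responses matters, while in the interval schedules extra responses made before the time has elapsed earn nothing.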
Effectiveness of different schedules of reinforcement
Schedule            Rate of response   Resistance to extinction
Continuous          Moderate           Low
Variable ratio      High               High
Fixed ratio         High               Moderate-Low
Variable interval   Moderate           High
Fixed interval      Low                Low
Operant Conditioning Can Happen Without Conscious Awareness
• Subjects listened to music with superimposed static
– A twitch of the thumb would deactivate static
• Almost all began to respond with a thumb twitch even though none realized how they were able to shut it off
– One person claimed he was aware, saying that it involved “subtle rowing movements with both hands, infinitesimal wriggles of both ankles, a slight displacement of the jaw to the left, breathing out, and then waiting”
• The same process is common in learning skills and mastering fine-tuned practices
Dopamine is the Neural Mechanism for S=>R
• Dopamine is active when the reward is provided, but once the animal learns the operant behavior, dopamine is released when the discriminative stimulus is presented
Learned Helplessness
• Benefits to operant conditioning
– If you can learn the system, you can use/abuse the system
• Negative consequences of operant conditioning
– If the system seems random…
– If you have no control…
– If punishment/negative reinforcement is inevitable…
• depression, withdrawal
– EX: dogs shocked randomly, eventually stop attempting to move
– EX: "Muselmänner" in concentration camps (starvation)
• Failure destroys sense of agency/control
– Self-perpetuating cycle
A problem for operant conditioning: Unintentional elimination of desired behaviors
• “Oversufficient justification” hypothesis
– Intrinsic interest in activity may be undermined by providing extrinsic reward
• Rationale for the hypothesis
– The person might infer that his/her actions were motivated by the external reward, not the activity itself
Preschoolers (3-5-year-olds)
• Assess interest in drawing before the study begins, match groups for interest
• IV: Award Condition
– Expected Award
– Unexpected Award
– No Award
• DV: Interest in drawing with magic markers after award received (or not received). Operationally defined as the percent of free-choice time spent drawing with the markers.
Results
• Mean percent of free-choice time spent drawing with magic markers after receiving or not receiving an award
Condition          Mean Percent
Expected Award     8.6
Unexpected Award   16.7
No Award           18.1
Implications: Should we never reward children?
– Extrinsic rewards okay for behaviors that aren’t intrinsically rewarding
– Tangible awards should be just large enough to encourage the activity: intrinsic motivation is inversely related to the size of the extrinsic reward
– Intangible rewards (verbal praise) probably okay all the time
– Extension to punishment: Power-assertive punishment is BAD; kids reason that they only behaved appropriately to avoid punishment
Rewards and Intelligence
• Study (Dweck et al., 2006)
– Students asked to complete a moderately difficult task
– IV: praise type (praise intelligence, praise effort, no praise) after task
– DV: willingness to try a new problem
– Results: kids who were praised for intelligence refused to take the new test, while kids praised for effort tried it
– Interpretation: “you are smart now, don’t blow it”
– Interpretation: “not all praise is equal”
Token Economies
• Reward for behavior or performance
– Originated in prisons
– Grades
– Video Games
Summary
SR