robotica lezione 13. lecture outline what is learning? what is adaptation? why robot should...

37
Robotica Lezione 13

Upload: valerie-perry

Post on 10-Jan-2016

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Robotica

Lezione 13

Page 2: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Lecture Outline

What is learning? What is adaptation? Why robot should learn? Why is learning in robots hard? Learning types and methods Reinforcement learning

credit assignment basic model examples

Page 3: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Adaptation v. Learning

Learning produces changes within an organism that, over time, enable it to perform more effectively within its environment.

Adaptation is learning through making adjustments in order to be more tuned with its environment. different time scales: acclimatization (slow) v.

homeostatis (rapid) different levels: genotypic v. phenotypic

Page 4: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Types of Adaptation (McFarland)

Behavioral - behaviors are adjusted relative to each other

Evolutionary - descendents are based on ancestor’s performance over long time scales

Sensory - sensors become more attuned to the environment

Learning as adaptation - anything else that results in a more ecologically fit agent

Page 5: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Adaptive Control

Aström 1950s

Page 6: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Importance of Learning

Learning is more than just adaptation Learning, the ability to improve one's

performance over time, is considered the main hallmark of intelligence, and the greatest challenge of AI.

Learning is particularly difficult to achieve in physical robots, for all the reasons that make intelligent behavior in the physical world difficult.

Page 7: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

What Learning Enables

Generalizing concepts Specializing concepts Reorganizing information Introducing new knowledge (facts,

behaviors, rules) into the system Creating or discovering new concepts Creating explanations Reusing past experiences

Page 8: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Why Learn in Robots?

What is the purpose of learning in robots?

1) providing the robot with the ability to adapt to changes in its task and/or environment

2) providing the robot with the ability to improve performance

3) providing the robot designer with a way of automating the design and/or programming of the robot

Page 9: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

What Can be Done? automatically design the robot's body automatically design a physical network for

its processor automatically generate its behaviors automatically store and re-use its previous

executed plans automatically improve the way its layers

interact automatically tune the parameters within the

behaviors

Page 10: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

What Has Been Done?

Parts of robot bodies, brains (i.e., processors), and programs have been automatically generated (i.e., learned).

Robots given initial programs have used experience and trial & error to improve the programs (from parameter tuning to switching entire behaviors)

Robots programmed for a task have adapted to changes in the environment (e.g., new obstacles, new maps, heavier loads, new goals)

Page 11: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Challenges in Robots

Situatedness in the world Noise, occlusion, dynamics, hard to model

Real time constraints Slow learners die easily in the real world

Simultaneous and multi-modal Multiple goals & tasks Need to walk and talk (not just either or)

Page 12: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Challenges in Learning

Saliency: what is relevant right now? Credit assignment: who is to receive

credit/blame for the outcome? New term: when should a new

concept/representation be created? Indexing: how to organize the memory? Utility: what should be forgotten?

Page 13: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Levels of Learning Within a behavior

Suitable responsesSuitable stimulusSuitable behavioral functional mappingMagnitude of response (gain)Whole new behavior

Within a behavior assemblagesuitable set of behaviorsrelative strengthssuitable coordination function

Page 14: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Learning Methods Reinforcement learning Neural network (connectionist) learning Evolutionary learning Learning from experience

memory-basedcase-based

Inductive learning Explanation-based learning Multistrategy learning

Page 15: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Types of Learning Numeric or symbolic

numeric: manipulated numeric functions symbolic: manipulate symbolic representations

Inductive or deductive inductive: generalize from examples deductive: optimize what is known

Continuous or batch continuous: during interaction w/ world batch: after interaction, all at once

Page 16: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Some Terminology

Reward/punishment Positive/negative “feedback”

Cost Function/Performance Metric Scalar (usually) goodness measure

Induction Generating a function (a hypothesis) that

approximates the observed examples

Teacher, critic Provides feedback

Page 17: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

More Terminology

Plant/Model System/Agent that we want to train

Convergence reaching a desired (or steady) state

Credit assignment problem who should get the credit/blame? hard to tell over time hard to tell in multi-robot systems

Page 18: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Reinforcement Learning

Reinforcement learning is currently the most popular approach to learning on mobile robots.

Reinforcement learning is inspired by conditioning in psychology

Law of effect (Thorndike 1911): Applying a reward immediately after the

occurrence of a response increases its probability or reoccurring...

Page 19: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Thorndike's Law of Effect

Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur; those which are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections with that situation weakened, so that, when it recurs, they will be less likely to occur. The greater the satisfaction or discomfort, the greater the strengthening or weakening of the bond.

Page 20: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Reinforcement Learning

...while providing punishment after the response will decrease the probability.

Translated to robotics: Some combinations of stimuli (i.e., sensory

readings and/or state) and responses (i.e., actions/behaviors) are coupled with subsequent reward in order to increase their probability of future use.

Page 21: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Reinforcement Learning

Desirable responses or outcomes are positively reinforced (rewarded) and thus strengthened, while undesirable ones are negatively reinforced (punished) and thus weakened.

This very general notion can be translated into a variety of specific reinforcement learning algorithms.

Reinforcement learning is a set of learning problems, not algorithms!

Page 22: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Challenges in RL Learning from delayed rewards: the

problem is difficult if the feedback is not immediate

Credit assignment problem: when something good or bad happens, what exact state/condition-action/behavior should be rewarded or punished?

Common approach: use the expected value of exponentially weighted past/future reinforcement

Page 23: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

RL Algorithms RL algorithms prescribe exact mathematical

functions that associate states/situations, actions/behaviors, reinforcement, and various associated parameters.

Some of these algorithms (Q-learning, TD learning, etc.) have well-understood convergence properties; they are guaranteed to make the robot learn the optimal solution.

Page 24: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Optimality in Learning

Optimality depends on strong assumptions: the robot must be given infinitely many trials of each state/action combination, must know what state it is in and what action it has executed, and the world must not change too quickly

But: There is not enough time for infinite trials, outcomes of actions are uncertain, and the world can change.

=> Optimality is impractical.

Page 25: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Observability The world of a robot is partially observable;

the robot does not know exactly what state/situation it is in at all times

Learning algorithms have been developed for partially observable environments, and while they are more realistic, they require even more time to converge. In general, managing uncertainty is very hard in learning.

Page 26: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Unsupervised Learning

RL is a form of unsupervised learning RL allows a robot to learn on its own, using

its own experiences (and some built-in notions of desirable and undesirable situations, associated with reward and punishment)

The designer can also provide reinforcement (reward/punishment) directly, to influence the robot

The robot is never told what to do.

Page 27: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

RL Function and Critic

RL systems contain a reinforcement function, which determines when and how much positive or negative reinforcement to deliver to the robot

Note: properly scaled positive reinforcement is sufficient.

The critic is the part of the RL system which provides the reinforcement, I.e., executes the reinforcement function.

Page 28: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Types of Critic The critic can be

external: if the user provides the reinforcement internal: if the system itself provides the

reinforcement

In both cases the approach is unsupervised in that the answer is never given explicitly by the critic

Page 29: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

RL System Diagram

Page 30: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

What Can be RL Learned?

Policy: the function that maps the states/situations to actions/behaviors

Utility: the function that gives a value to each state

Both of the above are learned relative to a specific goal

If the goal changes, so must the policy and/or the utility function

Example: maze learning

Page 31: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Adaptive Heuristic Critic

Adaptive Heuristic Critic is an RL algorithm (Barto & Sutton)

The process of learning what action to take in what state (the policy) is separate from learning the value of each state (the utility function)

Both are based on trying different actions in different states and observing the outcomes over time

Page 32: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Q Learning Q learning is the most popular RL algorithm

(Watkins 1980’s) A single utility Q-function is learned in order

to evaluate both actions and states. Shown to be superior to AHC Q values are stored in a table, usually Updated at each step, using the following

update rule:

Page 33: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Q Learning Algorithm

Q(x,a) <- Q(x,a) + b (r + *E(y) - Q(x,a)) x is state, a is action b is learning rate r is reward is discount factor (0,1) E(y) is the utility of the state y, computed as

E(y) = max(Q(y,a)) for all actions a

Guaranteed to converge to optimal, given infinite trials

Page 34: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

RL Architectures

Page 35: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Example: Genghis’ Walking

Correlation using statistics Genghis

trailing wheel: + feedbackunderbelly contact sensors: - feedback

Learned stable tripod stance and tripod gait

Page 36: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Example: Obelix’s Pushing

Obelix used Q-learning (Connell 90) Subsumption (colony) architecture Learned to push as well as or better than

hand-tuned controller 8 ultrasonic sensors, 1 IR, 1 motor current 5 action choices only: ahead, left or right

(22) sharp left and right (45)

Page 37: Robotica Lezione 13. Lecture Outline  What is learning?  What is adaptation?  Why robot should learn?  Why is learning in robots hard?  Learning

Obelix