EE141
Motivated Learning based on Goal Creation
Janusz Starzyk
School of Electrical Engineering and Computer Science, Ohio University, USA
www.ent.ohiou.edu/~starzyk
Istituto Dalle Molle di Studi sull'Intelligenza Artificiale, 4 December 2009



Outline

- Embodied Intelligence (EI)
- Embodiment of Mind
- How to Motivate a Machine
- Goal Creation Hierarchy
- GCS Experiment
- Motivated Learning

Design principles of intelligent systems
from Rolf Pfeifer, “Understanding Intelligence”, 1999

Interaction with complex environment

- cheap design
- ecological balance
- redundancy principle
- parallel, loosely coupled processes
- asynchronous sensory-motor coordination
- value principle

[Figure: Agent. Drawing by Ciarán O’Leary, Dublin Institute of Technology]

Embodied Intelligence

Definition: Embodied Intelligence (EI) is a mechanism that learns how to survive in a hostile environment.

- Mechanism: biological, mechanical or virtual agent with embodied sensors and actuators
- EI acts on environment and perceives its actions
- Environment hostility is persistent and stimulates EI to act
- Hostility: direct aggression, pain, scarce resources, etc.
- EI learns, so it must have associative self-organizing memory
- Knowledge is acquired by EI

Embodiment of a Mind

[Figure: the embodiment contains the intelligence core, sensors, and actuators; the sensors and actuators are linked to the environment through channels]

- Embodiment is a part of environment under control of the mind
- It contains intelligence core and sensory motor interfaces to interact with environment
- It is necessary for development of intelligence
- It is not necessarily constant

Embodiment of Mind

- Changes in embodiment modify brain’s self-determination
- Brain learns its own body’s dynamics
- Self-awareness is a result of identification with own embodiment
- Embodiment can be extended by using tools and machines
- Successful operation is a function of correct perception of environment and own embodiment

How to Motivate a Machine?

A fundamental question is what motivates an agent to do anything, and in particular, to enhance its own complexity?

What drives an agent to explore the environment and learn ways to effectively interact with it?

How to Motivate a Machine?

- Pfeifer claims that an agent’s motivation should emerge from the developmental process.
  - He called this the “motivated complexity” principle.
  - Chicken and egg problem? An agent must have a motivation to develop, while its motivation comes from development?
- Steels suggested equipping an agent with self-motivation.
  - “Flow”, experienced when people perform their expert activity well, would motivate them to accomplish even more complex tasks.
  - But what is the mechanism of “flow”?
- Oudeyer proposed an intrinsic motivation system.
  - Motivation comes from a desire to minimize the prediction error.
  - Similar to “artificial curiosity” presented by Schmidhuber.

How to Motivate a Machine?

- Exploration is needed in order to learn and to model the environment.
- But is exploration the only motivation we need to develop EI?
- Can we find a more efficient mechanism for learning?
- I suggest a simpler mechanism to motivate a machine.
- Although artificial curiosity helps to explore the environment, it leads to learning without a specific purpose. It may be compared to exploration in reinforcement learning.

How to Motivate a Machine?

- I suggest that it is the hostility of the environment, in the definition of EI, that is the most effective motivational factor.
- It is the pain we receive that moves us.
- It is our intelligence, determined to reduce this pain, that motivates us to act, learn, and develop.
- Both are needed: hostility of the environment and intelligence that learns how to reduce the pain.
- Thus pain is good.
- Without pain we would not be motivated to develop.

[Figure source: englishteachermexico.wordpress.com/]

Motivated Learning

- I suggest a goal-driven mechanism to motivate a machine to act, learn, and develop.
- A simple pain-based goal creation system.
- It uses externally defined pain signals that are associated with primitive pains.
- The machine is rewarded for minimizing the primitive pain signals.

Definition: Motivated learning (ML) is learning based on the self-organizing system of goal creation in an embodied agent.

- The machine creates abstract goals based on the primitive pain signals.
- It receives internal rewards for satisfying its goals (both primitive and abstract).
- ML applies to EI working in a hostile environment.
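The reward-for-pain-reduction idea can be illustrated with a minimal Python sketch. It is not the GCS implementation from the talk: the class name `PainAgent`, the actions `eat` and `wait`, the growth rate, the threshold, and the relief amounts are all hypothetical values chosen for illustration, and only a single primitive pain is modeled (no abstract goals).

```python
from dataclasses import dataclass, field


@dataclass
class PainAgent:
    """Toy agent rewarded for lowering one externally defined primitive pain."""

    pain: float = 0.0        # primitive pain level, kept in [0, 1]
    growth: float = 0.1      # assumed pain increase per time step
    threshold: float = 0.25  # assumed level below which the agent stays idle
    # Hypothetical actions and how much pain each one removes.
    relief: dict = field(default_factory=lambda: {"eat": 0.5, "wait": 0.0})
    value: dict = field(default_factory=dict)  # last reward seen per action

    def step(self):
        """Advance one time step and return (action, internal reward)."""
        # The hostile environment raises the pain whether or not the agent acts.
        self.pain = min(1.0, self.pain + self.growth)
        if self.pain < self.threshold:
            return "idle", 0.0
        # Explore untried actions first, then repeat the most rewarding one.
        untried = [a for a in self.relief if a not in self.value]
        action = untried[0] if untried else max(self.value, key=self.value.get)
        before = self.pain
        self.pain = max(0.0, self.pain - self.relief[action])
        reward = before - self.pain  # reward is the achieved pain reduction
        self.value[action] = reward
        return action, reward
```

In this sketch the agent stays idle while the pain is low, tries each action once when the pain crosses the threshold, and afterwards repeats whichever action reduced the pain most.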

Pain-center and Goal Creation

[Figure: pain-center circuit linking a sensor and a motor through pain detection, dual pain, and memory; connections labeled pain increase, pain decrease, stimulation, activation, need, expectation, inhibition, and missing objects, with (+) and (-) signs. Legend: pain detection/goal creation center, reinforcement neuro-transmitter, sensory neuron, motor neuron]

- Simple mechanism
- Creates hierarchy of values
- Leads to formulation of complex goals
- Reinforcement
  - Pain increase
  - Pain decrease
- Forces exploration

Abstract Goal Creation for ML

- The goal is to reduce the primitive pain level
- Abstract goals are created if they satisfy the primitive goals
- “food” becomes a sensory input to abstract pain center

[Figure: goal hierarchy with Primitive Level, Level I, and Level II, arranged along a sensory pathway (perception, sense) and a motor pathway (action, reaction); labels: Stomach, Food, refrigerator, Eat, Open, Pain, Dual pain, Abstract pain (delayed memory of pain). Legend: Expectation, Association, Inhibition, Reinforcement, Connection, Planning]
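The food and refrigerator example can be sketched symbolically in Python. The slides describe a neural circuit; this version replaces it with a plain lookup table, so it is an illustration under that assumption only. The class `GoalCreator`, its method names, and the "lack of ..." naming scheme are hypothetical.

```python
class GoalCreator:
    """Toy goal hierarchy: a missing resource becomes a new abstract pain."""

    def __init__(self):
        # pain name -> (action that relieves it, resource the action needs)
        self.remedy = {"hunger": ("eat", "food")}
        self.level = {"hunger": 0}  # 0 marks the primitive level

    def notice_missing(self, pain, provider_action, provider_resource):
        """Create an abstract pain for the resource that `pain` depends on."""
        _, resource = self.remedy[pain]
        abstract = f"lack of {resource}"
        if abstract not in self.remedy:
            self.remedy[abstract] = (provider_action, provider_resource)
            self.level[abstract] = self.level[pain] + 1
        return abstract

    def plan(self, pain, available):
        """Return the action chain that relieves `pain`, or [] if none is known."""
        action, resource = self.remedy[pain]
        if resource in available:
            return [action]
        abstract = f"lack of {resource}"
        if abstract not in self.remedy:
            return []  # no known way to obtain the missing resource
        earlier = self.plan(abstract, available)
        return earlier + [action] if earlier else []
```

For example, once `notice_missing("hunger", "open", "refrigerator")` has been called, `plan("hunger", {"refrigerator"})` returns `["open", "eat"]`: the abstract goal of obtaining food is served first because it satisfies the primitive goal.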

Goal Creation Experiment

Sensory-motor pairs and their effect on the environment

PAIR #  SENSORY  MOTOR     INCREASES          DECREASES
1       Food     Eat       sugar level        food supplies
8       Grocery  Buy       food supplies      money at hand
15      Bank     Withdraw  money at hand      spending limits
22      Office   Work      spending limits    job opportunities
29      School   Study     job opportunities  -
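The sensory-motor pairs in the table can be encoded as data, which makes the resource chain explicit. The following Python sketch copies the five rows from the table; the functions `apply_pair` and `supplier_of`, the unit `amount`, and the rule that an action fails when its resource is exhausted are assumptions of this sketch, not details reported in the slides.

```python
# (sensory, motor, increases, decreases) rows copied from the table.
PAIRS = {
    1: ("Food", "Eat", "sugar level", "food supplies"),
    8: ("Grocery", "Buy", "food supplies", "money at hand"),
    15: ("Bank", "Withdraw", "money at hand", "spending limits"),
    22: ("Office", "Work", "spending limits", "job opportunities"),
    29: ("School", "Study", "job opportunities", None),
}


def apply_pair(levels, pair_id, amount=1.0):
    """Return new resource levels after one pair is executed, or None on failure."""
    _, _, increases, decreases = PAIRS[pair_id]
    if decreases is not None and levels.get(decreases, 0.0) < amount:
        return None  # assumed rule: the action fails if its resource is exhausted
    updated = dict(levels)
    updated[increases] = updated.get(increases, 0.0) + amount
    if decreases is not None:
        updated[decreases] -= amount
    return updated


def supplier_of(resource):
    """Find the pair whose action replenishes `resource`, if any."""
    for pair_id, (_, _, increases, _) in PAIRS.items():
        if increases == resource:
            return pair_id
    return None
```

Each row's DECREASES entry is the INCREASES entry of the next row, so `supplier_of` walks the chain from eating back to studying; this is the dependency the agent has to discover.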

Goal Creation Experiment in ML

Pain signals in GCS simulation

[Figure: three panels (Primitive Hunger, Lack of Food, Empty Grocery), each plotting pain against discrete time from 0 to 600]

Goal Creation Experiment in ML

Action scatters in 5 GCS simulations

[Figure: Goal Scatter Plot, goal ID (0 to 40) against discrete time (0 to 600)]

Goal Creation Experiment in ML

The average pain signals in 100 GCS simulations

[Figure: five panels (Primitive Hunger, Lack of Food, Empty Grocery, Lack of Money, Lack of Job Opportunities), each plotting pain against discrete time from 0 to 600]

Compare RL (TDF) and ML (GCS)

Mean primitive pain Pp value as a function of the number of iterations:
- green line for TDF
- blue line for GCS

Primitive pain ratio with pain threshold 0.1

Compare RL (TDF) and ML (GCS)

Comparison of execution time on log-log scale:
- green line for TD-Falcon
- blue line for GCS

Combined efficiency of GCS is 1000 times better than that of TDF.

Problem solved

Conclusion: embodied intelligence with motivated learning based on goal creation is an effective learning and decision-making system for dynamic environments.

Reinforcement Learning                      Motivated Learning
Single value function                       Multiple value functions (one for each goal)
Measurable rewards (can be optimized)       Internal rewards (cannot be optimized)
Predictable                                 Unpredictable
Objectives set by designer                  Sets its own objectives
Maximizes the reward                        Solves minimax problem
Potentially unstable                        Always stable
Learning effort increases with complexity   Learns better in complex environment than RL
Always active                               Acts when needed
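Two entries in the Motivated Learning column, "Solves minimax problem" and "Acts when needed", can be read as a selection rule over several pain signals. The Python sketch below is one possible reading, not the method used in the talk: the function names are hypothetical, the predicted effects of each action are assumed to be known, and the default threshold of 0.1 is borrowed from the pain threshold mentioned in the RL comparison.

```python
def choose_goal(pains, threshold=0.1):
    """Pick the goal tied to the largest pain, or None if all pains are low."""
    name, level = max(pains.items(), key=lambda item: item[1])
    return name if level > threshold else None


def minimax_action(pains, effects):
    """Pick the action whose predicted outcome has the smallest maximum pain.

    `effects` maps a hypothetical action name to the pain changes it is
    predicted to cause; the predictions are assumed known in this sketch.
    """
    def worst_pain_after(action):
        predicted = dict(pains)
        for name, delta in effects[action].items():
            predicted[name] = min(1.0, max(0.0, predicted.get(name, 0.0) + delta))
        return max(predicted.values())

    return min(effects, key=worst_pain_after)
```

With pains `{"hunger": 0.8, "lack of food": 0.3}`, an `eat` action that lowers hunger by 0.7 and raises lack of food by 0.2 is preferred over a `buy` action that only removes the lack of food, because it leaves the smaller worst-case pain. When every pain is at or below the threshold, `choose_goal` returns `None` and the agent does nothing.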

Sounds like science fiction

If you’re trying to look far ahead, and what you see seems like science fiction, it might be wrong.

But if it doesn’t seem like science fiction, it’s definitely wrong.

From presentation by Foresight Institute

Questions?

Resources – Evolution of Electronics
From Ray Kurzweil, The Singularity Summit at Stanford, May 13, 2006

By Gordon E. Moore

Clock Speed (doubles every 2.7 years)
From Ray Kurzweil, The Singularity Summit at Stanford, May 13, 2006

Doubling (or Halving) times

Dynamic RAM Memory “Half Pitch” Feature Size    5.4 years
Dynamic RAM Memory (bits per dollar)            1.5 years
Average Transistor Price                        1.6 years
Microprocessor Cost per Transistor Cycle        1.1 years
Total Bits Shipped                              1.1 years
Processor Performance in MIPS                   1.8 years
Transistors in Intel Microprocessors            2.0 years
Microprocessor Clock Speed                      2.7 years

From Ray Kurzweil, The Singularity Summit at Stanford, May 13, 2006
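A doubling time T converts to a growth factor of 2^(t/T) over t years, which is easier to compare across rows. The short Python sketch below applies that standard formula to the listed values; the dictionary keys are abbreviated labels and the function names are chosen here for illustration.

```python
# Doubling (or halving) times in years, copied from the list.
DOUBLING_YEARS = {
    "DRAM half pitch feature size": 5.4,
    "DRAM bits per dollar": 1.5,
    "Average transistor price": 1.6,
    "Microprocessor cost per transistor cycle": 1.1,
    "Total bits shipped": 1.1,
    "Processor performance in MIPS": 1.8,
    "Transistors in Intel microprocessors": 2.0,
    "Microprocessor clock speed": 2.7,
}


def growth_factor(doubling_time, years):
    """Factor by which a quantity grows (or shrinks, if halving) over `years`."""
    return 2.0 ** (years / doubling_time)


def annual_rate(doubling_time):
    """Equivalent fractional change per year for a given doubling time."""
    return 2.0 ** (1.0 / doubling_time) - 1.0
```

For instance, a 2.0-year doubling time for transistors in Intel microprocessors corresponds to a 32-fold increase over a decade. For the halving quantities (feature size, prices, cost per cycle) the same factor applies as a divisor.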

From Ray Kurzweil, The Singularity Summit at Stanford, May 13, 2006

From Hans Moravec, Robot, 1999

Software or hardware?

Software                              Hardware
Sequential                            Concurrent
Error prone                           Robust
Require programming                   Require design
Low cost                              Significant cost
Well developed programming methods    Hardware prototypes hard to build

Future software/hardware capabilities

[Figure: number of neurons (10^4 to 10^11) against year (2005 to 2040) for Software Simulation (PC based), Hardware approach (FPGA), and Analog VLSI, with human brain complexity marked]

Why should we care?

Source: SEMATECH

Design Productivity Gap
Low-Value Designs?

[Figure: percent of die area that must be occupied by memory to maintain SOC design productivity; % Area Memory, % Area Reused Logic, and % Area New Logic (0% to 100%) for 1999, 2002, 2005, 2008, 2011, and 2014]

Source = Japanese system-LSI industry

Self-Organizing Learning Arrays (SOLAR)

- Integrated circuits connect transistors into a system
  - millions of transistors easily assembled
  - first 50 years of microelectronic revolution
- Self-organizing arrays connect processors into a system
  - millions of processors easily assembled
  - next 50 years of microelectronic revolution

* Self-organization
* Sparse and local interconnections
* Dynamically reconfigurable
* Online data-driven learning