NEW TIES WP2: Agent and Learning Mechanisms
Post on 20-Dec-2015
Decision making and learning
Agents have a controller: a decision tree (DQT).
Input: the situation as perceived (seen/heard/interpreted). Output: an action.
Decision making = using the DQT. Learning = modifying the DQT.
Decisions also depend on inheritable "attitude genes" (learned through evolution).
Example of a DQT

[Figure: example DQT. Legend: B = bias node (each outgoing edge carries a bias, e.g. 0.2; genetic bias), T = test node (Boolean choice, YES/NO branches), A = action node (decision). The root is a bias node with two subtrees, bias 0.5 each. One subtree tests VISUAL:FRONTFOODREACHABLE, branching to PICKUP (bias 1.0) or to a bias node over TURNLEFT (0.2), MOVE (0.6) and TURNRIGHT (0.2). The other subtree tests BAG:FOOD, branching to EAT (bias 1.0) or to the same TURNLEFT/MOVE/TURNRIGHT choice.]
Interaction of evolution & individual learning

A bias node with n children carries a bias b_i on the edge to each child. Bias ≠ probability.
The bias b_i is learned and changes during lifetime (the "learned bias").
The genetic bias g_i is inherited, part of the genome, and constant.
Actual probability of choosing child i: p(b_i, g_i) = b_i + (1 - b_i) · g_i.
Learned and inherited behaviour are linked through this formula.
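The per-child formula above can be sketched in Python. Note the slides give only the per-child combination p(b, g) = b + (1 - b)·g; normalising the combined values into a distribution over all children is an assumption of this sketch:

```python
import random

def choice_probabilities(learned, genetic):
    """Combine learned biases b_i and genetic biases g_i with
    p(b, g) = b + (1 - b) * g, then normalise so the children's
    probabilities sum to 1 (the normalisation step is an assumption)."""
    raw = [b + (1 - b) * g for b, g in zip(learned, genetic)]
    total = sum(raw)
    return [r / total for r in raw]

def pick_child(learned, genetic, rng=random):
    """Sample a child index according to the combined probabilities."""
    probs = choice_probabilities(learned, genetic)
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1
```

With g_i = 0 the choice reduces to the learned bias alone; with g_i = 1 that child's raw value is 1 regardless of what has been learned, which is how the genome constrains lifetime learning.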
DQT nodes & parameters (cont'd)

Test node language: native concepts + emerging concepts.
Native: see_agent, see_mother, see_food, have_food, see_mate, …
New concepts can emerge by categorisation (the discrimination game).
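The three DQT node kinds and tree evaluation can be sketched as follows. The class names, the perception interface (a dict of concept truth values), and the sampling in bias nodes are assumptions of this sketch, not the NEW TIES implementation:

```python
import random
from dataclasses import dataclass
from typing import Union

@dataclass
class ActionNode:
    action: str                  # e.g. "MOVE", "PICKUP", "EAT"

@dataclass
class TestNode:
    concept: str                 # native or emerged concept, e.g. "see_food"
    yes: "Node"
    no: "Node"

@dataclass
class BiasNode:
    children: list               # child subtrees
    learned: list                # learned biases b_i
    genetic: list                # genetic biases g_i

Node = Union[ActionNode, TestNode, BiasNode]

def decide(node: Node, perception: dict, rng=random) -> str:
    """Walk the DQT: test nodes branch on the perceived situation, bias
    nodes choose a child stochastically via b + (1 - b) * g, and action
    nodes yield the decision."""
    if isinstance(node, ActionNode):
        return node.action
    if isinstance(node, TestNode):
        branch = node.yes if perception.get(node.concept, False) else node.no
        return decide(branch, perception, rng)
    # BiasNode: combine learned and genetic bias, sample proportionally
    raw = [b + (1 - b) * g for b, g in zip(node.learned, node.genetic)]
    r, acc = rng.random() * sum(raw), 0.0
    for child, w in zip(node.children, raw):
        acc += w
        if r < acc:
            return decide(child, perception, rng)
    return decide(node.children[-1], perception, rng)
```

For example, `decide(TestNode("see_food", yes=ActionNode("PICKUP"), no=ActionNode("MOVE")), {"see_food": True})` returns `"PICKUP"`.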
Learning: the heart of the emergence engine

Evolutionary learning: not within an agent (not during its lifetime), but over generations, by variation + selection.
Individual learning: within one agent, during its lifetime, by reinforcement learning.
Social learning: during lifetime, in interacting agents, by sending/receiving + adopting knowledge pieces.
Types of learning: properties

Evolutionary learning:
- The agent does not create new knowledge during its lifetime.
- The basic DQT + genetic biases are inheritable.
- "Knowledge creator" = crossover and mutation.

Individual learning:
- The agent does create new knowledge during its lifetime.
- The DQT + learned biases are modified.
- "Knowledge creator" = reinforcement learning (driven by rewards).
- Individually learnt knowledge dies with its host agent.

Social learning:
- The agent imports knowledge already created elsewhere (new? not new?).
- Adoption of imported knowledge ≈ crossover.
- Importing knowledge pieces can save effort for the recipient and can create novel combinations.
- Exporting knowledge helps its preservation after the death of its host.
Present status of the types of learning

Evolutionary learning: demonstrated in 2 NT scenarios. Autonomous selection/reproduction causes problems with population stability (implosion/explosion).
Individual learning: code exists, but never demonstrated in NT scenarios.
Social learning: under construction/design, based on the "telepathy" approach. Communication protocols + adoption mechanisms are needed.
Evolution: variation operators

Operators for the DQT:
- Crossover = subtree swap.
- Mutation = substitute a subtree with a random subtree; change concepts in test nodes; change the bias on an edge.

Operators for attitude genes:
- Crossover = full arithmetic crossover.
- Mutation = add Gaussian noise, or replace with a random value.
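The attitude-gene operators above can be sketched as follows; the weight alpha, the noise width sigma, and the reset probability p_reset are illustrative parameters, since the slides only name the operators:

```python
import random

def arithmetic_crossover(genes_a, genes_b, alpha=0.5):
    """Full arithmetic crossover of attitude-gene vectors: each child
    gene is a weighted average of the parents' genes."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(genes_a, genes_b)]

def mutate_genes(genes, sigma=0.1, p_reset=0.05, rng=random):
    """Mutate attitude genes: usually add Gaussian noise, occasionally
    replace a gene with a fresh random value."""
    out = []
    for g in genes:
        if rng.random() < p_reset:
            out.append(rng.random())               # replace with random value
        else:
            out.append(g + rng.gauss(0.0, sigma))  # add Gaussian noise
    return out
```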
Evolution: selection operators

Mate selection: the mate action is chosen by the DQT; one agent proposes, the other accepts; both must pass the adulthood check.
Survivor selection: dead if too old (≥ 80 years); dead if at zero energy.
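The survivor-selection rule is simple enough to state directly in code; the `Agent` fields below are assumptions of this sketch:

```python
from dataclasses import dataclass

@dataclass
class Agent:
    age: int
    energy: float

def survives(agent: Agent) -> bool:
    """An agent dies if too old (>= 80 years) or at zero energy."""
    return agent.age < 80 and agent.energy > 0

def select_survivors(population):
    return [a for a in population if survives(a)]
```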
Experiment: Simple World

Setup: environment
- World size: 200 × 200 grid cells.
- Agents and food only (no tokens, roads, etc.); both are variable in number.
- Initial distribution of agents (500): in the upper-left corner.
- Initial distribution of food (10000): 5000 each in the upper-left and lower-right corners.
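The initial distribution can be sketched as below; the extent of a "corner" region is an assumption, since the slides only name the corners:

```python
import random

GRID = 200  # 200 x 200 grid cells

def init_simple_world(rng=random):
    """Place 500 agents in the upper-left corner and 10000 food items
    split 5000 upper-left / 5000 lower-right, as in the Simple World
    setup (corner size GRID // 4 is an assumed extent)."""
    corner = GRID // 4

    def upper_left():
        return (rng.randrange(corner), rng.randrange(corner))

    def lower_right():
        return (rng.randrange(GRID - corner, GRID),
                rng.randrange(GRID - corner, GRID))

    agents = [upper_left() for _ in range(500)]
    food = ([upper_left() for _ in range(5000)]
            + [lower_right() for _ in range(5000)])
    return agents, food
```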
Experiment: Simple world
Setup: agents
- Native knowledge (concepts and DQT subtrees): navigating (random walk), eating (identify, pick up and eat plants), mating (identify mates, propose/agree).
- Random DQT branches: differ per agent, drawn from the "pool" of native concepts.
Experiment: Poisonous Food
Setup: environment
- Two types of food: poisonous (decreases energy) and edible (increases energy).
- World size: 200 × 200 grid cells.
- Agents and food only (no tokens, roads, etc.); both are variable in number.
- Initial distribution of agents (500): uniform random over the grid.
- Initial distribution of food (10000): 5000 of each type, uniform random over the same grid space as the agents.
Experiment: Poisonous Food
Setup: agents
- Native knowledge: identical to the Simple World experiment.
- Additional native knowledge: agents can distinguish poisonous from edible plants, but the relation of this distinction to eating/picking up is not built in.
- No random DQT branches.
Experiment: Poisonous Food
Measures
- Population size
- Welfare (energy)
- Number of poisonous and edible plants
- Complexity of the controller (number of nodes)
- Age
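These measures can be computed as below; the `Agent` and `Plant` fields are assumptions of this sketch, not the NEW TIES data model:

```python
from dataclasses import dataclass

@dataclass
class Agent:
    energy: float
    age: int
    dqt_nodes: int   # controller complexity = number of nodes

@dataclass
class Plant:
    poisonous: bool

def measures(population, plants):
    """Compute the per-step measures listed above over the current
    population and plant set."""
    n = max(len(population), 1)
    return {
        "population_size": len(population),
        "mean_welfare": sum(a.energy for a in population) / n,
        "edible_plants": sum(1 for p in plants if not p.poisonous),
        "poisonous_plants": sum(1 for p in plants if p.poisonous),
        "mean_controller_size": sum(a.dqt_nodes for a in population) / n,
        "mean_age": sum(a.age for a in population) / n,
    }
```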