Posted on 26-Jan-2016
NEW TIES WP2 Agent and learning mechanisms
Decision making and learning
Agents have a controller: a decision tree (DQT).
Input: the situation as perceived (seen/heard/interpreted). Output: an action.
Decision making = using the DQT; learning = modifying the DQT.
Decisions also depend on inheritable "attitude genes" (learned through evolution).
Example of a DQT
[Figure: an example DQT. Legend: T = test node (Boolean choice on a perceived concept, e.g. VISUAL:FRONTFOODREACHABLE or BAG:FOOD), A = bias node with a genetic bias on each outgoing edge (e.g. 0.6 / 0.2 / 0.2), leaves = actions (MOVE, TURNLEFT, TURNRIGHT, PICKUP, EAT).]
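The traversal implied by the example can be sketched as follows. This is a minimal illustration, not the NEW TIES implementation; the class names and the exact wiring of the example tree are my own reconstruction from the figure.

```python
import random

class ActionNode:
    """Leaf: returns a concrete action such as MOVE or EAT."""
    def __init__(self, action):
        self.action = action
    def decide(self, situation):
        return self.action

class TestNode:
    """Branches YES/NO on a perceived concept, e.g. VISUAL:FRONTFOODREACHABLE."""
    def __init__(self, concept, yes_child, no_child):
        self.concept = concept
        self.yes_child = yes_child
        self.no_child = no_child
    def decide(self, situation):
        child = self.yes_child if situation.get(self.concept, False) else self.no_child
        return child.decide(situation)

class BiasNode:
    """Chooses one child probabilistically according to the biases on its edges."""
    def __init__(self, children, biases):
        self.children = children
        self.biases = biases  # assumed to sum to 1 here
    def decide(self, situation):
        child = random.choices(self.children, weights=self.biases)[0]
        return child.decide(situation)

# A reconstruction of the slide's sub-tree: if food in front is reachable,
# pick it up; otherwise wander via a 0.6/0.2/0.2 bias node.
wander = BiasNode(
    [ActionNode("MOVE"), ActionNode("TURNLEFT"), ActionNode("TURNRIGHT")],
    [0.6, 0.2, 0.2],
)
tree = TestNode("VISUAL:FRONTFOODREACHABLE", ActionNode("PICKUP"), wander)

# With food reachable the tree deterministically picks PICKUP:
assert tree.decide({"VISUAL:FRONTFOODREACHABLE": True}) == "PICKUP"
```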
Interaction of evolution & individual learning
A bias node with n children has a bias b_i on the edge to each child i.
Bias ≠ probability.
The learned bias b_i is learned during lifetime and changes.
The genetic bias g_i is inherited, part of the genome, and constant.
Actual probability of choosing child i: p(b_i, g_i) = b_i + (1 − b_i) · g_i
Learned and inherited behaviour are linked through this formula.
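A small sketch of how the formula combines the two biases. The per-child formula is from the slide; normalising across a node's children so the probabilities sum to 1 is my assumption.

```python
import random

def choice_probabilities(learned, genetic):
    """Combine learned biases b_i and genetic biases g_i per child via
    p_i = b_i + (1 - b_i) * g_i, then normalise so the p_i sum to 1
    (normalisation is an assumption, not stated on the slide)."""
    raw = [b + (1.0 - b) * g for b, g in zip(learned, genetic)]
    total = sum(raw)
    return [p / total for p in raw]

def choose_child(children, learned, genetic, rng=random):
    """Pick one child of a bias node according to the combined probabilities."""
    return rng.choices(children, weights=choice_probabilities(learned, genetic))[0]

# Two children with equal learned biases: a strong genetic bias on the
# first child dominates the choice.
probs = choice_probabilities([0.1, 0.1], [0.8, 0.0])
assert probs[0] > probs[1]
```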
DQT nodes & parameters cont’d
Test node language: native concepts + emerging concepts
Native: see_agent, see_mother, see_food, have_food, see_mate, …
New concepts can emerge by categorisation (discrimination game)
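The slides do not spell out the categorisation mechanism, so the following is a hedged sketch of a Steels-style discrimination game over one perceptual channel: a failed discrimination splits an interval, and the new sub-intervals act as emerging concepts.

```python
class DiscriminationTree:
    """Intervals over one perceptual channel in [0, 1], refined on failure.
    Illustrative sketch only; the NEW TIES data structures may differ."""
    def __init__(self):
        self.intervals = [(0.0, 1.0)]

    def categorise(self, value):
        """Return the index of the first interval containing the value."""
        for i, (lo, hi) in enumerate(self.intervals):
            if lo <= value <= hi:
                return i

    def play(self, topic, context):
        """One discrimination game: succeed if the topic's category is not
        shared by any context object; otherwise split the topic's interval,
        creating a new (emergent) concept, and report failure."""
        cat = self.categorise(topic)
        if all(self.categorise(obj) != cat for obj in context):
            return True
        lo, hi = self.intervals[cat]
        mid = (lo + hi) / 2.0
        self.intervals[cat:cat + 1] = [(lo, mid), (mid, hi)]
        return False
```

Repeated failed games progressively refine the channel until objects that matter to the agent fall into distinct categories.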
Learning: the heart of the emergence engine
Evolutionary learning: not within an agent (not during its lifetime), but over generations, by variation + selection.
Individual learning: within one agent, during its lifetime, by reinforcement learning.
Social learning: during lifetime, between interacting agents, by sending/receiving + adopting knowledge pieces.
Types of learning: properties
Evolutionary learning:
The agent does not create new knowledge during its lifetime.
The basic DQT + genetic biases are inheritable.
"Knowledge creator" = crossover and mutation.
Individual learning:
The agent does create new knowledge during its lifetime.
The DQT + learned biases are modified.
"Knowledge creator" = reinforcement learning (driven by rewards).
Individually learnt knowledge dies with its host agent.
Social learning:
The agent imports knowledge already created elsewhere (new? not new?).
Adoption of imported knowledge ≈ crossover.
Importing knowledge pieces can save effort for the recipient and can create novel combinations.
Exporting knowledge helps its preservation after the death of its host.
Present status of the types of learning
Evolutionary learning: demonstrated in 2 NEW TIES scenarios; autonomous selection/reproduction causes problems with population stability (implosion/explosion).
Individual learning: implemented in code, but never demonstrated in NEW TIES scenarios.
Social learning: under construction/design, based on the "telepathy" approach; communication protocols + adoption mechanisms are needed.
Evolution: variation operators
Operators for the DQT:
Crossover = subtree swap.
Mutation = substitute a subtree with a random subtree, change concepts in test nodes, or change the bias on an edge.
Operators for attitude genes:
Crossover = full arithmetic crossover.
Mutation = add Gaussian noise or replace with a random value.
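The attitude-gene operators can be sketched as follows. The mutation rate, the noise width `sigma`, and the [0, 1] gene range are illustrative assumptions, not values from the slides.

```python
import random

def arithmetic_crossover(genome_a, genome_b, rng=random):
    """Full arithmetic crossover: every child gene is the same random
    convex combination of the two parents' attitude genes."""
    w = rng.random()
    return [w * a + (1.0 - w) * b for a, b in zip(genome_a, genome_b)]

def mutate(genome, sigma=0.1, p_reset=0.05, rng=random):
    """Mutation: usually add Gaussian noise, occasionally replace a gene
    with a fresh random value; genes are clamped to an assumed [0, 1] range."""
    out = []
    for g in genome:
        if rng.random() < p_reset:
            g = rng.random()          # replace with random value
        else:
            g += rng.gauss(0.0, sigma)  # add Gaussian noise
        out.append(min(1.0, max(0.0, g)))
    return out
```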
Evolution: selection operators
Mate selection: the mate action is chosen by the DQT (propose – accept proposal), provided both agents are adults.
Survivor selection: dead if too old (≥ 80 years) or at zero energy.
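A minimal sketch of the survivor-selection rule above (the agent record layout is assumed):

```python
MAX_AGE = 80  # "dead if too old" threshold from the slide

def survives(agent):
    """Survivor selection: an agent dies if it is too old (>= 80 years)
    or has run out of energy."""
    return agent["age"] < MAX_AGE and agent["energy"] > 0.0
```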
Experiment: Simple world
Setup: Environment
World size: 200 × 200 grid cells.
Agents and food only (no tokens, roads, etc.); both are variable in number.
Initial distribution of agents (500): in the upper left corner.
Initial distribution of food (10000): 5000 each in the upper left and lower right corners.
Experiment: Simple world
Setup: Agents
Native knowledge (concepts and DQT subtrees):
Navigating (random walk), eating (identify, pick up and eat plants), mating (identify mates, propose/agree).
Random DQT branches: differ per agent, built from the "pool" of native concepts.
Experiment: Simple world
The simulation was run for 3 months of real time to test stability.
Experiment: Poisonous Food
Setup: Environment
Two types of food: poisonous (decreases energy) and edible (increases energy).
World size: 200 × 200 grid cells.
Agents and food only (no tokens, roads, etc.); both are variable in number.
Initial distribution of agents (500): uniform random over the grid space.
Initial distribution of food (10000): 5000 of each type, uniform random over the same grid space as the agents.
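The environment setup above can be expressed as an initialisation sketch (the data layout is illustrative, not the NEW TIES code):

```python
import random

def init_poisonous_food_world(rng=random):
    """Poisonous-food environment as described above: a 200x200 grid,
    500 agents, and 5000 plants of each food type, all placed uniformly
    at random over the same grid space."""
    size = 200
    def random_cell():
        return (rng.randrange(size), rng.randrange(size))
    agents = [random_cell() for _ in range(500)]
    food = ([("edible", random_cell()) for _ in range(5000)]
            + [("poisonous", random_cell()) for _ in range(5000)])
    return agents, food
```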
Experiment: Poisonous Food
Setup: Agent
Native knowledge: identical to the simple world experiment.
Additional native knowledge: agents can distinguish poisonous from edible plants, but no relation to eating/picking up is given.
No random DQT branches.
Experiment: Poisonous Food
Measures
Population size, welfare (energy), number of poisonous and edible plants, complexity of the controller (number of nodes), age.
Experiment: Poisonous Food
Demo
Experiment: Poisonous Food Results
[Figure: results over 15000 timesteps, values 0 to 2500: population size, healthy plants (x10), poisonous plants (x10), average agent energy (x100).]