NEW TIES WP2: Agent and Learning Mechanisms
Post on 20-Dec-2015
Decision making and learning
Agents have a controller: a decision tree (DQT).
Input: the situation as perceived (seen/heard/interpreted). Output: an action.
Decision making = using the DQT. Learning = modifying the DQT.
Decisions also depend on inheritable "attitude genes" (learned through evolution).
Example of a DQT

[Figure: example DQT. Legend: B = bias node (each outgoing edge carries a bias, e.g. 0.2; genetic bias), T = test node (Boolean choice, YES/NO branches), A = action node (decision). The root is a bias node with two subtrees, bias 0.5 each. One subtree tests VISUAL:FRONTFOODREACHABLE, branching to PICKUP (bias 1.0) or to a bias node over TURNLEFT (0.2), MOVE (0.6) and TURNRIGHT (0.2). The other subtree tests BAG:FOOD, branching to EAT (bias 1.0) or to the same TURNLEFT/MOVE/TURNRIGHT choice.]
Interaction of evolution & individual learning

A bias node with n children carries a bias b_i on the edge to each child. Bias ≠ probability.
The bias b_i is learned and changes during lifetime (the "learned bias").
The genetic bias g_i is inherited, part of the genome, and constant.
Actual probability of choosing child i: p(b_i, g_i) = b_i + (1 - b_i) · g_i.
Learned and inherited behaviour are linked through this formula.
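The per-child formula above can be sketched in Python. Note the slides give only the per-child combination p(b, g) = b + (1 - b)·g; normalising the combined values into a distribution over all children is an assumption of this sketch:

```python
import random

def choice_probabilities(learned, genetic):
    """Combine learned biases b_i and genetic biases g_i with
    p(b, g) = b + (1 - b) * g, then normalise so the children's
    probabilities sum to 1 (the normalisation step is an assumption)."""
    raw = [b + (1 - b) * g for b, g in zip(learned, genetic)]
    total = sum(raw)
    return [r / total for r in raw]

def pick_child(learned, genetic, rng=random):
    """Sample a child index according to the combined probabilities."""
    probs = choice_probabilities(learned, genetic)
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1
```

With g_i = 0 the choice reduces to the learned bias alone; with g_i = 1 that child's raw value is 1 regardless of what has been learned, which is how the genome constrains lifetime learning.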
DQT nodes & parameters (cont'd)

Test node language: native concepts + emerging concepts.
Native: see_agent, see_mother, see_food, have_food, see_mate, …
New concepts can emerge by categorisation (the discrimination game).
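The three DQT node kinds and tree evaluation can be sketched as follows. The class names, the perception interface (a dict of concept truth values), and the sampling in bias nodes are assumptions of this sketch, not the NEW TIES implementation:

```python
import random
from dataclasses import dataclass
from typing import Union

@dataclass
class ActionNode:
    action: str                  # e.g. "MOVE", "PICKUP", "EAT"

@dataclass
class TestNode:
    concept: str                 # native or emerged concept, e.g. "see_food"
    yes: "Node"
    no: "Node"

@dataclass
class BiasNode:
    children: list               # child subtrees
    learned: list                # learned biases b_i
    genetic: list                # genetic biases g_i

Node = Union[ActionNode, TestNode, BiasNode]

def decide(node: Node, perception: dict, rng=random) -> str:
    """Walk the DQT: test nodes branch on the perceived situation, bias
    nodes choose a child stochastically via b + (1 - b) * g, and action
    nodes yield the decision."""
    if isinstance(node, ActionNode):
        return node.action
    if isinstance(node, TestNode):
        branch = node.yes if perception.get(node.concept, False) else node.no
        return decide(branch, perception, rng)
    # BiasNode: combine learned and genetic bias, sample proportionally
    raw = [b + (1 - b) * g for b, g in zip(node.learned, node.genetic)]
    r, acc = rng.random() * sum(raw), 0.0
    for child, w in zip(node.children, raw):
        acc += w
        if r < acc:
            return decide(child, perception, rng)
    return decide(node.children[-1], perception, rng)
```

For example, `decide(TestNode("see_food", yes=ActionNode("PICKUP"), no=ActionNode("MOVE")), {"see_food": True})` returns `"PICKUP"`.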
Learning: the heart of the emergence engine

Evolutionary learning: not within an agent (not during its lifetime), but over generations, by variation + selection.
Individual learning: within one agent, during its lifetime, by reinforcement learning.
Social learning: during lifetime, in interacting agents, by sending/receiving + adopting knowledge pieces.
Types of learning: properties

Evolutionary learning:
- The agent does not create new knowledge during its lifetime.
- The basic DQT + genetic biases are inheritable.
- "Knowledge creator" = crossover and mutation.

Individual learning:
- The agent does create new knowledge during its lifetime.
- The DQT + learned biases are modified.
- "Knowledge creator" = reinforcement learning (driven by rewards).
- Individually learnt knowledge dies with its host agent.

Social learning:
- The agent imports knowledge already created elsewhere (new? not new?).
- Adoption of imported knowledge ≈ crossover.
- Importing knowledge pieces can save effort for the recipient and can create novel combinations.
- Exporting knowledge helps its preservation after the death of its host.
Present status of the types of learning

Evolutionary learning: demonstrated in 2 NT scenarios. Autonomous selection/reproduction causes problems with population stability (implosion/explosion).
Individual learning: code exists, but never demonstrated in NT scenarios.
Social learning: under construction/design, based on the "telepathy" approach. Communication protocols + adoption mechanisms are needed.
Evolution: variation operators

Operators for the DQT:
- Crossover = subtree swap.
- Mutation = substitute a subtree with a random subtree; change concepts in test nodes; change the bias on an edge.

Operators for attitude genes:
- Crossover = full arithmetic crossover.
- Mutation = add Gaussian noise, or replace with a random value.
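The attitude-gene operators above can be sketched as follows; the weight alpha, the noise width sigma, and the reset probability p_reset are illustrative parameters, since the slides only name the operators:

```python
import random

def arithmetic_crossover(genes_a, genes_b, alpha=0.5):
    """Full arithmetic crossover of attitude-gene vectors: each child
    gene is a weighted average of the parents' genes."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(genes_a, genes_b)]

def mutate_genes(genes, sigma=0.1, p_reset=0.05, rng=random):
    """Mutate attitude genes: usually add Gaussian noise, occasionally
    replace a gene with a fresh random value."""
    out = []
    for g in genes:
        if rng.random() < p_reset:
            out.append(rng.random())               # replace with random value
        else:
            out.append(g + rng.gauss(0.0, sigma))  # add Gaussian noise
    return out
```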
Evolution: selection operators

Mate selection: the mate action is chosen by the DQT; one agent proposes, the other accepts; both must pass the adulthood check.
Survivor selection: dead if too old (≥ 80 years); dead if at zero energy.
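The survivor-selection rule is simple enough to state directly in code; the `Agent` fields below are assumptions of this sketch:

```python
from dataclasses import dataclass

@dataclass
class Agent:
    age: int
    energy: float

def survives(agent: Agent) -> bool:
    """An agent dies if too old (>= 80 years) or at zero energy."""
    return agent.age < 80 and agent.energy > 0

def select_survivors(population):
    return [a for a in population if survives(a)]
```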
Experiment: Simple World

Setup: environment
- World size: 200 × 200 grid cells.
- Agents and food only (no tokens, roads, etc.); both are variable in number.
- Initial distribution of agents (500): in the upper-left corner.
- Initial distribution of food (10000): 5000 each in the upper-left and lower-right corners.
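The initial distribution can be sketched as below; the extent of a "corner" region is an assumption, since the slides only name the corners:

```python
import random

GRID = 200  # 200 x 200 grid cells

def init_simple_world(rng=random):
    """Place 500 agents in the upper-left corner and 10000 food items
    split 5000 upper-left / 5000 lower-right, as in the Simple World
    setup (corner size GRID // 4 is an assumed extent)."""
    corner = GRID // 4

    def upper_left():
        return (rng.randrange(corner), rng.randrange(corner))

    def lower_right():
        return (rng.randrange(GRID - corner, GRID),
                rng.randrange(GRID - corner, GRID))

    agents = [upper_left() for _ in range(500)]
    food = ([upper_left() for _ in range(5000)]
            + [lower_right() for _ in range(5000)])
    return agents, food
```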
Experiment: Simple world
Setup: agents
- Native knowledge (concepts and DQT subtrees): navigating (random walk), eating (identify, pick up and eat plants), mating (identify mates, propose/agree).
- Random DQT branches: differ per agent, drawn from the "pool" of native concepts.
Experiment: Poisonous Food
Setup: environment
- Two types of food: poisonous (decreases energy) and edible (increases energy).
- World size: 200 × 200 grid cells.
- Agents and food only (no tokens, roads, etc.); both are variable in number.
- Initial distribution of agents (500): uniform random over the grid.
- Initial distribution of food (10000): 5000 of each type, uniform random over the same grid space as the agents.
Experiment: Poisonous Food
Setup: agents
- Native knowledge: identical to the Simple World experiment.
- Additional native knowledge: agents can distinguish poisonous from edible plants, but the relation of this distinction to eating/picking up is not built in.
- No random DQT branches.
Experiment: Poisonous Food
Measures
- Population size
- Welfare (energy)
- Number of poisonous and edible plants
- Complexity of the controller (number of nodes)
- Age
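These measures can be computed as below; the `Agent` and `Plant` fields are assumptions of this sketch, not the NEW TIES data model:

```python
from dataclasses import dataclass

@dataclass
class Agent:
    energy: float
    age: int
    dqt_nodes: int   # controller complexity = number of nodes

@dataclass
class Plant:
    poisonous: bool

def measures(population, plants):
    """Compute the per-step measures listed above over the current
    population and plant set."""
    n = max(len(population), 1)
    return {
        "population_size": len(population),
        "mean_welfare": sum(a.energy for a in population) / n,
        "edible_plants": sum(1 for p in plants if not p.poisonous),
        "poisonous_plants": sum(1 for p in plants if p.poisonous),
        "mean_controller_size": sum(a.dqt_nodes for a in population) / n,
        "mean_age": sum(a.age for a in population) / n,
    }
```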