NEW TIES WP2 Agent and learning mechanisms
Decision making and learning
Agents have a controller: a decision tree (DQT).
Input: the situation as perceived (seen / heard / interpreted). Output: an action.
Decision making = using the DQT. Learning = modifying the DQT.
Decisions also depend on inheritable “attitude genes” (learned through evolution).
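As an illustration, the controller idea can be sketched in a few lines (a minimal sketch, not the NEW TIES implementation; the class names and the encoding of a situation as a set of perceived concepts are assumptions). Decision making is a walk from the root of the tree to an action leaf:

```python
import random

class ActionNode:
    """Leaf: picks one action according to fixed weights (hypothetical class)."""
    def __init__(self, actions):              # actions: {name: weight}
        self.actions = actions
    def decide(self, situation):
        names = list(self.actions)
        weights = [self.actions[n] for n in names]
        return random.choices(names, weights=weights)[0]

class TestNode:
    """Inner node: tests one perceived concept and branches on yes/no."""
    def __init__(self, concept, yes, no):
        self.concept, self.yes, self.no = concept, yes, no
    def decide(self, situation):              # situation: set of true concepts
        branch = self.yes if self.concept in situation else self.no
        return branch.decide(situation)

# One subtree from the example DQT: if food is reachable, pick it up,
# otherwise wander (MOVE 0.6, TURNLEFT 0.2, TURNRIGHT 0.2).
dqt = TestNode("VISUAL:FRONTFOODREACHABLE",
               yes=ActionNode({"PICKUP": 1.0}),
               no=ActionNode({"MOVE": 0.6, "TURNLEFT": 0.2, "TURNRIGHT": 0.2}))

print(dqt.decide({"VISUAL:FRONTFOODREACHABLE"}))  # → PICKUP
```

Learning then amounts to editing this structure: changing weights, swapping subtrees, or replacing the concept tested in a node.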
Example of a DQT
[Figure: an example DQT. Legend: B = bias node (carries a genetic bias, e.g. 0.2), T = test node (Boolean choice, YES/NO), A = action/decision node. The root bias node splits 0.5 / 0.5 into two subtrees. The first tests VISUAL:FRONTFOODREACHABLE: YES leads to PICKUP (1.0); NO leads to an action node choosing MOVE (0.6), TURNLEFT (0.2) or TURNRIGHT (0.2). The second tests BAG:FOOD: YES leads to EAT (1.0); NO leads again to MOVE (0.6), TURNLEFT (0.2) or TURNRIGHT (0.2).]
Interaction of evolution & individual learning
A bias node has n children, child i carrying bias b_i. Bias ≠ probability.
The bias b_i is learned and changes during lifetime (the “learned bias”).
The genetic bias g_i is inherited, part of the genome, and constant.
Actual probability of choosing child i:
p(b_i, g_i) = b_i + (1 - b_i) ∙ g_i
Learned and inherited behaviour are linked through this formula.
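The formula can be checked directly. The sketch below (function names hypothetical) computes the actual child probabilities from learned and genetic biases; since the raw values over the children need not sum to one, it normalises them before choosing — the normalisation step is our assumption, not stated on the slide:

```python
import random

def actual_prob(b, g):
    """p(b, g) = b + (1 - b) * g — learned bias b and genetic bias g in [0, 1]."""
    return b + (1 - b) * g

def choose_child(children):
    """children: list of (name, learned_bias, genetic_bias) triples."""
    raw = [actual_prob(b, g) for _, b, g in children]
    total = sum(raw)                      # assumption: normalise the raw values
    return random.choices([n for n, _, _ in children],
                          weights=[r / total for r in raw])[0]

# With zero genetic bias the probability is the learned bias alone;
# with learned bias 1 the child is always taken.
print(actual_prob(0.4, 0.0))   # → 0.4
print(actual_prob(1.0, 0.7))   # → 1.0
```

Note how the formula links the two kinds of behaviour: the genetic bias g sets a floor that learning can raise (via b) but never push below.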
DQT nodes & parameters cont’d
Test node language: native concepts + emerging concepts
Native: see_agent, see_mother, see_food, have_food, see_mate, …
New concepts can emerge by categorisation (discrimination game)
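To illustrate how a discrimination game can grow new categories, here is a generic sketch in the spirit of Steels-style discrimination games — not the NEW TIES code, and all names are hypothetical. The agent refines a partition of a feature range until the topic object falls in a category no context object shares (assumes the topic value differs from every context value):

```python
def category(cuts, value):
    """Index of the interval of [0, 1] (split at `cuts`) containing value."""
    return sum(1 for c in cuts if value > c)

def discrimination_game(cuts, topic, context):
    """Refine the partition until `topic` sits in a category of its own."""
    while any(category(cuts, topic) == category(cuts, obj) for obj in context):
        # Failure: split the topic's interval at its midpoint (a new concept).
        lo = max((c for c in cuts if c < topic), default=0.0)
        hi = min((c for c in cuts if c >= topic), default=1.0)
        cuts.append((lo + hi) / 2)
        cuts.sort()
    return cuts

cuts = []                                  # start with one category: all of [0, 1]
cuts = discrimination_game(cuts, topic=0.9, context=[0.2, 0.4])
print(category(cuts, 0.9) != category(cuts, 0.2))  # → True
```

Each failed game adds one cut, i.e. one emergent concept; successful games leave the repertoire unchanged.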
Learning: the heart of the emergence engine
Evolutionary learning: not within an agent (not during its lifetime), but over generations, by variation + selection.
Individual learning: within one agent, during its lifetime, by reinforcement learning.
Social learning: during lifetime, between interacting agents, by sending/receiving + adopting knowledge pieces.
Types of learning: properties
Evolutionary learning: the agent does not create new knowledge during its lifetime. The basic DQT + genetic biases are inheritable; the “knowledge creator” = crossover and mutation.
Individual learning: the agent does create new knowledge during its lifetime. The DQT + learned biases are modified; the “knowledge creator” = reinforcement learning (driven by rewards). Individually learnt knowledge dies with its host agent.
Social learning: the agent imports knowledge already created elsewhere (new? not new?). Adoption of imported knowledge ≈ crossover. Importing knowledge pieces can save effort for the recipient and can create novel combinations. Exporting knowledge helps its preservation after the death of its host.
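The slides say individual learning modifies the learned biases by reinforcement driven by rewards, but do not give the update rule. The sketch below therefore uses a plain exponential moving-average update as an assumed stand-in — the rule itself, the learning rate `alpha`, and the function name are all our assumptions:

```python
def reinforce(learned_bias, reward, alpha=0.1):
    """Nudge the learned bias of the edge just taken toward 1 after a
    positive reward and toward 0 after a negative one (assumed rule)."""
    target = 1.0 if reward > 0 else 0.0
    b = learned_bias + alpha * (target - learned_bias)
    return min(1.0, max(0.0, b))          # keep the bias in [0, 1]

b = 0.5
for _ in range(3):                        # three rewarded uses of the edge
    b = reinforce(b, reward=+1)
print(round(b, 4))  # → 0.6355
```

Because only the learned bias b changes while the genetic bias g stays fixed, such an update respects the split between lifetime and evolutionary knowledge described above.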
Present status of types of learning
Evolutionary learning: demonstrated in 2 NT scenarios; autonomous selection/reproduction causes problems with population stability (implosion/explosion).
Individual learning: present in the code, but never demonstrated in NT scenarios.
Social learning: under construction/design, based on the “telepathy” approach; communication protocols + adoption mechanisms needed.
Evolution: variation operators
Operators for the DQT:
Crossover = subtree swap
Mutation = substitute a subtree with a random subtree; change concepts in test nodes; change the bias on an edge
Operators for attitude genes:
Crossover = full arithmetic crossover
Mutation = add Gaussian noise; replace with a random value
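For the attitude genes both operators are simple enough to sketch directly (function names are hypothetical; the blend factor `alpha`, the noise width `sigma`, the reset probability, and the clamping to [0, 1] are our choices, not given on the slide):

```python
import random

def arithmetic_crossover(mom, dad, alpha=0.5):
    """Full arithmetic crossover: each child gene is the alpha-blend
    of the two corresponding parent genes."""
    return [alpha * m + (1 - alpha) * d for m, d in zip(mom, dad)]

def mutate(genes, sigma=0.05, p_reset=0.1):
    """Mutation: add Gaussian noise, or occasionally replace a gene
    with a fresh random value; genes stay clamped to [0, 1]."""
    out = []
    for g in genes:
        if random.random() < p_reset:
            g = random.random()            # replace with a random value
        else:
            g += random.gauss(0.0, sigma)  # add Gaussian noise
        out.append(min(1.0, max(0.0, g)))
    return out

child = arithmetic_crossover([0.2, 0.8], [0.6, 0.4])
print([round(x, 3) for x in child])  # → [0.4, 0.6]
```

With alpha = 0.5 the crossover averages the parents, so attitude genes drift smoothly across generations rather than jumping between parental values.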
Evolution: selection operators
Mate selection: the mate action is chosen by the DQT; propose – accept proposal; adulthood OK (both partners must be adult).
Survivor selection: dead if too old (≥ 80 years); dead at zero energy.
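The survivor-selection rule reduces to a single predicate; a trivial sketch (the function name and the energy convention are ours, the thresholds are from the slide):

```python
MAX_AGE = 80  # years; death threshold from the slide

def is_dead(age, energy):
    """An agent dies when it is too old or has run out of energy."""
    return age >= MAX_AGE or energy <= 0

print(is_dead(80, 50))  # → True  (too old)
print(is_dead(30, 0))   # → True  (zero energy)
print(is_dead(30, 50))  # → False
```

Note that there is no explicit fitness function: selection pressure comes entirely from this environmental rule plus mate choice.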
Experiment: Simple world
Setup: Environment
World size: 200 x 200 grid cells. Agents and food only (no tokens, roads, etc.); both are variable in number.
Initial distribution of agents (500): in the upper left corner.
Initial distribution of food (10000): 5000 each in the upper left and lower right corners.
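The setup above can be written down as an initialisation sketch. The grid representation and helper names are our assumptions, and "corner" is interpreted here as a 50 x 50 region — the slides do not specify the corner size:

```python
import random

SIZE = 200                                 # 200 x 200 grid cells
N_AGENTS, N_FOOD = 500, 10000

def corner_cell(x0, y0, extent=50):
    """Random cell inside a corner region (region size is an assumption)."""
    return (x0 + random.randrange(extent), y0 + random.randrange(extent))

# 500 agents in the upper left corner.
agents = [corner_cell(0, 0) for _ in range(N_AGENTS)]

# 10000 food items: 5000 in the upper left, 5000 in the lower right corner.
food = ([corner_cell(0, 0) for _ in range(N_FOOD // 2)] +
        [corner_cell(SIZE - 50, SIZE - 50) for _ in range(N_FOOD // 2)])

print(len(agents), len(food))  # → 500 10000
```

Placing agents and half the food in opposite corners forces migration across the grid before the second food patch can be exploited.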
Experiment: Simple world
Setup: Agents
Native knowledge (concepts and DQT subtrees):
Navigating (random walk), eating (identify, pick up and eat plants), mating (identify mates, propose/agree).
Random DQT branches: differ per agent, based on the “pool” of native concepts.
Experiment: Simple world
The simulation ran for 3 months of real time to test stability.
Experiment: Poisonous Food
Setup: Environment
Two types of food: poisonous (decreases energy) and edible (increases energy).
World size: 200 x 200 grid cells. Agents and food only (no tokens, roads, etc.); both are variable in number.
Initial distribution of agents (500): uniform random over the grid.
Initial distribution of food (10000): 5000 of each type, uniform random over the same grid space as the agents.
Experiment: Poisonous Food
Setup: Agent
Native knowledge: identical to the simple world experiment.
Additional native knowledge: agents can distinguish poisonous from edible plants; the relation with eating/picking up is not present.
No random DQT branches.
Experiment: Poisonous Food
Measures
Population size; welfare (energy); number of poisonous and edible plants; complexity of the controller (number of nodes); age.
Experiment: Poisonous Food
Demo
Experiment: Poisonous Food Results
[Chart: y-axis 0–2500; x-axis: timestep, 1250 to 15000; series: population size, healthy plants (×10), poisonous plants (×10), average agent energy (×100).]