Evolution of Coordination and Communication in Groups of Embodied Agents

A PhD Thesis Presentation by Olaf Witkowski

Department of Computer Science

University of Tokyo, 19 January 2015

• Biological cells, insect swarms and bird flocks all self-organize into groups that display collective behavior.

• Individuals interacting together produce adaptive behavior, i.e. behavior that increases their chances of survival and reproduction.

Introduction

Myxobacteria form wolf packs to share digestive enzymes

Honey bees exchange information to optimize foraging

Weaver ants build bridges with their own bodies

Bigeye fish form schools to avoid predation

Research questions

• Under which conditions does collective behavior emerge in a group of autonomous agents?

• Can individuals work together more effectively when they rely on a communication system?

Significance is twofold

• This thesis is relevant to both science and technology.

• First, it helps shed light on the evolution of coordination and communication.

• Second, a better understanding of the fundamental principles of collective behavior may also lead to innovative methods in multi-agent systems, ubiquitous computing devices and swarm computation.

Outline

• Introduction & Background

• Methods

• Gene-culture coevolution (ch. 7)

• Synchronization vs. variability (ch. 6)

• 3D signal-swarming models (ch. 4)

• 3D spatial Prisoner’s Dilemma (ch. 5)

• Conclusion

(Chapters 4 to 7 above are the contributions.)

Methods

• 3-block model = Darwinian evolution + “Robots” + Environment

[Diagram: three blocks labeled Darwinian evolution, Robots and Environment]

Methods — Agent-Based Model (ABM)

• Agent-based modeling: computational models which simulate the actions and interactions of autonomous creatures in a simulated environment.

• An agent’s actions affect its survival, just as in real environments.

Example of ABM by Wischmann, Floreano & Keller (2012)

• Artificial neural network (McCulloch & Pitts 1943, Rosenblatt 1958)

Methods — Artificial Neural Networks (ANN)

[Diagram: neural network (“brain”), with a neuron and its connection weights labeled]

• Connection weights encoded in a genotype & evolved by a genetic algorithm (Fisher 1958, Holland 1995).

Methods — Genetic Algorithm (GA)

Genotype = vector of ANN connection weights = (w1, w2, …, wn)

The fitness value of each genotype is determined by the agent’s performance on a predefined task.

[Diagram: a population of genotypes (w1, w2, w3, …, wn) evaluated in the evolution environment and modified by GA operators]
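The genotype-as-weight-vector idea can be sketched as a small generational GA. This is a minimal illustration, not the thesis implementation: the operators (truncation selection, one-point crossover, Gaussian mutation), all parameter values and the `toy_fitness` task are assumptions.

```python
import random


def evolve(fitness, n_weights, pop_size=20, generations=50,
           mutation_rate=0.05, sigma=0.1, seed=0):
    """Generational GA over real-valued genotypes (vectors of ANN weights)."""
    rng = random.Random(seed)
    # Each genotype is a vector (w1, ..., wn) of connection weights.
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_weights)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Rank genotypes by their performance on the task.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]            # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_weights)       # one-point crossover
            child = mother[:cut] + father[cut:]
            # Gaussian mutation, applied independently to each gene.
            child = [w + rng.gauss(0.0, sigma) if rng.random() < mutation_rate else w
                     for w in child]
            children.append(child)
        population = parents + children             # parents survive (elitism)
    return max(population, key=fitness)


def toy_fitness(genotype):
    # Hypothetical task: weights close to 0.5 score best (maximum is 0).
    return -sum((w - 0.5) ** 2 for w in genotype)


best = evolve(toy_fitness, n_weights=5)
```

In an evolutionary robotics setting, `fitness` would run the agent in the simulated environment with the genotype loaded as its network weights.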

Methods — Evolutionary Robotics (ER)

• Evolutionary robotics = Genetic algorithm + Agent-based modeling

[Diagram: Darwinian evolution, Robots, Environment]

Cliff, Harvey & Husbands (1992); Floreano & Mondada (1994)

Methods - Asynchronous GA

• Generations of genotypes overlap: each agent’s fitness is evaluated at every iteration.

• When an agent accumulates enough energy, it replicates, and the offspring is added to the running simulation.
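The asynchronous scheme can be sketched as a loop with no generation boundary, where energy plays the role of fitness. This is only an illustration: the threshold, step cost, population cap, energy split at birth and the `hypothetical_gain` task are assumptions.

```python
import random

REPRODUCTION_THRESHOLD = 10.0  # assumed energy level that triggers replication
STEP_COST = 0.01               # assumed energy spent per iteration
MAX_POPULATION = 100           # assumed cap that keeps the sketch bounded


def mutate(genotype, rng, rate=0.05, sigma=0.1):
    """Return a copy of the genotype with Gaussian noise on some genes."""
    return [w + rng.gauss(0.0, sigma) if rng.random() < rate else w
            for w in genotype]


def step(population, energy_gain, rng):
    """Run one iteration: update every agent, add offspring, remove the dead."""
    offspring = []
    for agent in population:
        # Fitness is evaluated at every iteration, as an energy update.
        agent["energy"] += energy_gain(agent["genotype"]) - STEP_COST
        room = len(population) + len(offspring) < MAX_POPULATION
        if agent["energy"] > REPRODUCTION_THRESHOLD and room:
            agent["energy"] /= 2.0  # assumption: parent and child split the energy
            offspring.append({"genotype": mutate(agent["genotype"], rng),
                              "energy": agent["energy"]})
    survivors = [a for a in population if a["energy"] > 0.0]
    return survivors + offspring


def hypothetical_gain(genotype):
    # Placeholder task: genotypes with a positive weight sum collect energy.
    return 0.05 if sum(genotype) > 0 else 0.0


rng = random.Random(1)
population = ([{"genotype": [0.5] * 4, "energy": 2.0} for _ in range(5)] +
              [{"genotype": [-0.5] * 4, "energy": 2.0} for _ in range(5)])
for _ in range(1000):
    population = step(population, hypothetical_gain, rng)
```

Because parents and offspring coexist in the same running simulation, selection acts continuously rather than at the end of a fixed evaluation period.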


Generic gene/culture coordination (0D)

Spatial coordination with communication (3D)

Seasonal coordination through communication (0-2D)

Neutral selection in gene-culture coevolution

Goal: analyze the evolution of generic communication in a gene-culture model

Signal matching task

Spread of Indo-European languages through time

Bouckaert et al. (2012), Mapping the Origins and Expansion of the Indo-European Language Family, Science, vol. 337, no. 6097, pp. 957-960.

• Gene-culture models have been used to investigate language evolution, due to the lack of empirical data (Boyd & Richerson 1992, Christiansen & Kirby 2003).

• We use a genetic algorithm, artificial neural networks, and different social networks for learning.

Signal matching task

• Agents produce signals

• Agents need to match their signals with their neighbours

• The best-performing agents are selected and replicated through a genetic algorithm


• Culture: each agent learns by imitating its neighbors’ signals

[Diagram: social network. Learning phase: learner and teacher. Evaluation phase: learner and neighbor.]

• Gene: each agent is then evaluated for reproduction

• If the learned culture becomes uniform over the population, the selection pressure on the genes is relaxed, leading to a neutral selection space.

Social networks: learning in lattice; fitness in lattice; reproduction in row

Genes = weights before learning

Cultures = weights after learning

[Figure: evolution over time. Reproduction network = rows; communication network = lattice]

• In this model the agents’ task was directly to coordinate their communication.

• The results show neutral selection and offer new insights through the analogy with the Potts model, oscillator theory and swarming models.

Conclusion

• Next, we will go further by studying tasks that indirectly require coordination via communication.

Task

Synchronization in dynamic environments

Goal: study agent strategies for coping with a variable resource: energy saving vs. synchronization via communication

[Diagram labels: resource variation, signal]

Animal behavior in winter. Source: National Geographic & BBC documentaries, 2014

Food hoarding

Bird migration

Hibernation

• Population of agents in an environment with seasonal food availability

• Each agent controlled by a simple neural network evolved by genetic algorithm


Simple neural network (Elman 1990)

3 experimental setups

Dimensions   1D                               2D                               0D
Model        Ring world                       Grid world                       Action-based
Results      Synced wake-up using signaling   Synced wake-up using signaling   Speciated resource-saving behaviors

[Figure: ring world with food patches FP-0 … FP-P and agents A-0 … A-N. FP-x: food patch x, x ∈ {0, ..., P}. A-y: agent y, y ∈ {0, ..., N}. A-y(sv): signal value of agent y, sv ∈ {0, ..., Patch Spacing}.]


• Signaling agents showed better collective performance than non-signaling agents.

• The agents wake up from hibernation based on other agents’ signals.

[Figure: ring map (1D) and lattice map (2D), showing food and agents in summer and in winter]

Population vs size vs time: shows evolutionarily stable strategy

• Without direct communication, agents develop specific strategies to survive winters.

• Strategies: fast reproduction, resource saving and hibernation.


[Figure: action-cost model, cycles detected. Number of individuals plotted against agent’s size and time step, for small, mid-sized and large agents]

• In dynamic environments, agents synchronize foraging with seasons using communication.

• Without direct communication, agents use specific strategies to save resources.

• Next, we will consider static resources in a minimalist system.

Conclusion

Olaf Witkowski, Geoff Nitschke and Takashi Ikegami. July 2012. When is happy hour: An agent’s concept of time. Proceedings of the Thirteenth International Conference on The Synthesis and Simulation of Living Systems, 13, 544–545.

Olaf Witkowski and Geoff Nitschke. September 2013. The Transmission of Migratory Behaviors. Proceedings of the Twelfth European Conference on Artificial Life, 12, 1218–1220.

Olaf Witkowski and Nathanaël Aubert. July 2013. Size Does Matter: The Impact of Size on Hoarding Behaviour. Proceedings of the Thirteenth International Conference on The Synthesis and Simulation of Living Systems, 13, 542–543.

Signal-based swarming

Goal: use a minimalist 3D simulation to explore the emergence of swarming based on signaling

Starling murmuration. Source: A Bird Ballet by Neels Castillon


• Reynolds’ basic flocking model (1986) consisted of three simple steering behaviors that determined how individual boids should maneuver based on their velocity and position within the flock.

Separation, alignment, cohesion
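The three steering behaviors can be sketched for a single boid as follows. This is a simplified illustration rather than Reynolds' original code: the function name `steer`, the `sep_dist` parameter and the unweighted sums are assumptions.

```python
import math


def steer(boid, neighbors, sep_dist=1.0):
    """Return (separation, alignment, cohesion) steering vectors for one boid.

    A boid is a (position, velocity) pair of equal-length tuples.
    """
    pos, vel = boid
    n = len(neighbors)
    if n == 0:
        zero = (0.0,) * len(pos)
        return zero, zero, zero
    dims = range(len(pos))
    # Cohesion: steer toward the average position of the neighbors.
    center = [sum(p[i] for p, _ in neighbors) / n for i in dims]
    cohesion = tuple(center[i] - pos[i] for i in dims)
    # Alignment: steer toward the average velocity of the neighbors.
    mean_vel = [sum(v[i] for _, v in neighbors) / n for i in dims]
    alignment = tuple(mean_vel[i] - vel[i] for i in dims)
    # Separation: steer away from neighbors that are too close.
    sep = [0.0 for _ in dims]
    for p, _ in neighbors:
        if math.dist(p, pos) < sep_dist:
            for i in dims:
                sep[i] += pos[i] - p[i]
    return tuple(sep), alignment, cohesion


boid = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
neighbors = [((0.5, 0.0, 0.0), (0.0, 1.0, 0.0)),
             ((2.0, 0.0, 0.0), (0.0, 1.0, 0.0))]
separation, alignment, cohesion = steer(boid, neighbors)
```

A full flocking step would add a weighted sum of the three vectors to the boid's velocity; the weights are a modeling choice.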


• Gradual improvements of the model, adding rules or fixed leaders (Mataric 1992, Hartman & Benes 2006, Cucker & Huepe 2008, Su et al. 2009, Yu et al. 2010, Chiew et al. 2013)

• Swarming can be developed using an evolutionary robotics approach, often with complex sensors and pressures such as predators (Tu and Terzopoulos 1994, Ward et al. 2001, Olson et al. 2013)

[Figure: Hartman & Benes (2006)]

• In our 3D simulation, blind sound-emitting agents look for a hidden food resource. An asynchronous reproduction scheme is used to evolve the agents’ controllers.

• The model shows (a) the emergence of collective motion from the combination of a signaling system and a foraging task, and (b) that clustering improves the search.


• Each agent is equipped with 1 signaling device and 6 sensors.

• The sensors detect signals produced by other agents from 6 directions.

[Diagram: agent with one signal emitter and six receivers, numbered 1 to 6]

Simulation — Agent survival & reproduction

Energy cost = 0.01 + [0.0 ; 0.001]

Energy > 10 -> replication with mutation

Energy = 0 (no energy) -> death

Energy gain: [fraction involving carrying capacity and distance to goal; layout lost in extraction]

Model — Neural controller

M1 = pitch; M2 = yaw; S = produced signal; S1..6 = sensed signals

Elman simple recurrent network architecture (Elman 1990)
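A forward pass through an Elman-style controller with these inputs and outputs might look like the sketch below. The hidden-layer size, the tanh activation, the absence of biases and the layout of the weight vector are assumptions, since the slide does not give them.

```python
import math
import random


class ElmanController:
    """Simple recurrent network: 6 sensed signals in; M1, M2 and S out."""

    def __init__(self, weights, n_in=6, n_hidden=10, n_out=3):
        self.n_in, self.n_hidden, self.n_out = n_in, n_hidden, n_out
        # Assumed genotype layout: input->hidden, context->hidden, hidden->output.
        a = n_in * n_hidden
        b = a + n_hidden * n_hidden
        assert len(weights) == b + n_hidden * n_out
        self.w_in, self.w_ctx, self.w_out = weights[:a], weights[a:b], weights[b:]
        self.context = [0.0] * n_hidden

    def step(self, sensed):
        hidden = []
        for j in range(self.n_hidden):
            total = sum(sensed[i] * self.w_in[i * self.n_hidden + j]
                        for i in range(self.n_in))
            total += sum(self.context[k] * self.w_ctx[k * self.n_hidden + j]
                         for k in range(self.n_hidden))
            hidden.append(math.tanh(total))
        # Context units copy the hidden layer for use at the next time step.
        self.context = hidden
        return [math.tanh(sum(hidden[j] * self.w_out[j * self.n_out + o]
                              for j in range(self.n_hidden)))
                for o in range(self.n_out)]


rng = random.Random(0)
weights = [rng.uniform(-1.0, 1.0) for _ in range(6 * 10 + 10 * 10 + 10 * 3)]
controller = ElmanController(weights)
m1, m2, s = controller.step([0.0, 1.0, 0.0, 0.0, 0.0, 0.0])
```

The flat `weights` list is what a genetic algorithm would treat as the genotype.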

Results — Emergence of swarming

• Agents self-organize into swarms without any external control other than the fitness they gain from being closer to the goal.

• The agents go through three phases: (1) random motion, (2) dynamically changing clusters and (3) a compact ball around the resource.

[Figure: average number of neighbors (10 runs) against time steps, with signalling ON vs OFF]

Results — Neighborhood analysis

[Figure: average number of neighbors (10 runs) against time steps, signal on vs. signal off]

Results — distance to goal areas (signal on/off)

[Figure: average distance to goal at every iteration, against simulation steps, for a regular run (signal on) and a silent control simulation (signal off)]

• The transfer entropy (Schreiber 2000) T from a random process X to another process Y is a measure of the amount of directed transfer of information from X to Y. It is defined in terms of the Shannon entropy H (Shannon & Weaver 1949).

[Equation: definition of T; not recovered from the slide]
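One common form of transfer entropy, with a history length of one step, is T(X→Y) = H(Y_t | Y_{t-1}) − H(Y_t | Y_{t-1}, X_{t-1}). The sketch below estimates it from two discrete time series by counting frequencies. The history length, the plug-in estimator and the function name are assumptions; the estimator used in the thesis may differ.

```python
import random
from collections import Counter
from math import log2


def transfer_entropy(x, y):
    """Plug-in estimate, in bits, of T(X->Y) with history length 1."""
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))   # (y_t, y_{t-1}, x_{t-1})
    pairs_yx = Counter(zip(y[:-1], x[:-1]))         # (y_{t-1}, x_{t-1})
    pairs_yy = Counter(zip(y[1:], y[:-1]))          # (y_t, y_{t-1})
    singles = Counter(y[:-1])                       # y_{t-1}
    n = len(y) - 1
    te = 0.0
    for (y1, y0, x0), count in triples.items():
        p_full = count / pairs_yx[(y0, x0)]         # p(y_t | y_{t-1}, x_{t-1})
        p_self = pairs_yy[(y1, y0)] / singles[y0]   # p(y_t | y_{t-1})
        te += (count / n) * log2(p_full / p_self)
    return te


# Toy data: Y copies X with a one-step delay, so information flows from X to Y.
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(2000)]
y = [0] + x[:-1]
forward = transfer_entropy(x, y)
backward = transfer_entropy(y, x)
```

On this toy data `forward` should be much larger than `backward`, which is the asymmetry that lets the measure distinguish followers from leaders.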

Results — Transfer entropy

Results — Measure of following behavior

[Figure: inward neighbourhood transfer entropy against time steps, signal on vs. signal off]

Results — Measure of individual leadership

[Figure: outward neighborhood transfer entropy against time steps]

Phylogenetic tree & neutral selection

Principal Component Analysis (color = iteration, radius = swarming)

[Figure: PCA biplot with axes PC 1 and PC 2; color scale = simulation time]

Biplot of a PCA on genotypes of all agents in a typical run, over one million iterations. Each circle represents one agent’s genotype, the diameter representing the average number of neighbors around the agent over its lifetime, and the color showing its time of death.
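Projecting genotypes onto their first two principal components can be sketched with NumPy as follows. The data here are random placeholders and the function name is an assumption; only the PCA step itself is illustrated.

```python
import numpy as np


def pca_project(genotypes, n_components=2):
    """Project genotypes (one row per agent) onto their first principal components."""
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)                      # center each gene
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)       # fraction of variance per component
    return X @ vt[:n_components].T, explained[:n_components]


# Placeholder data: 200 genotypes of 30 weights, with one high-variance gene.
rng = np.random.default_rng(0)
genotypes = rng.normal(size=(200, 30))
genotypes[:, 0] *= 10.0
coords, explained = pca_project(genotypes)
```

Each row of `coords` gives the (PC 1, PC 2) position of one genotype, which is what a biplot like the one described would place on its axes.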


• In this chapter, we used a minimalist model to demonstrate the emergence of swarming behavior.

• The agents exchange signals in order to swarm together, which in turn improves their foraging.

Conclusion

Olaf Witkowski and Takashi Ikegami. Expected mid-2015 (In preparation). Signal-based swarming and neutral selection. Submitted to PLoS Computational Biology. <Paper>

Olaf Witkowski, Geoff Nitschke and Takashi Ikegami. January 2015 (In press). Signal drives genetic diversity: an agent-based approach to speciation. Proceedings of the Twentieth International Symposium on Artificial Life and Robotics, 20. <Paper, Talk>

Olaf Witkowski and Takashi Ikegami. July 2014. Asynchronous evolution: Emergence of signal-based swarming. Proceedings of the Fourteenth International Conference on the Simulation and Synthesis of Living Systems, 14, 302–309. <Paper, Talk>

• Next, we will explore the same model with a different task.

Swarming in dynamic 3D Prisoner’s Dilemma

Goal: find the impact of a cooperation/defection game on agents’ collective behavior

Food sharing in vampire bats

Attenborough, D. (2011). Friends and Rivals. BBC documentary.

Iterated Prisoner’s Dilemma (IPD)

• Prisoner’s Dilemma (Flood & Dresher 1950): each player can Cooperate (C) or Defect (D)

• Iterated version (Axelrod 1984)

• Spatial version (Nowak & May 1993)

• Our version: dynamic & spatial

[Figures: PD reward matrix; spatial Prisoner’s Dilemma]

Dynamic Spatial IPD

• Agent moves on 3D map

• Agent controls direction (constant speed)

• Communication through signals (2 channels) to detect “friendly neighbors”

• Agent chooses to cooperate/defect

[Figure: simulation visualization, with cooperation in blue and defection in red]

Differences with previous model

               Ch. 4                      Ch. 5
Task           Distance to resource       Play Prisoner’s Dilemma
Reproduction   Offspring added globally   Offspring added locally

Agent’s Controller

[Diagram: Elman (1990) network. Inputs: sensors (up to I12), with hidden units and context units. Outputs: movement, communication, cooperate or defect. The previous controller is shown for comparison.]

• We extend the reward per iteration from Chiong & Kirley (2012) to take into account spatial continuity:

Coop. vs Def. Costs & Payoff Matrix

Figure 2: Architecture of the agent’s controller, composed of 12 input neurons, 10 hidden neurons, 10 context neurons and 5 output neurons.

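The reward is defined by:

\[
\begin{cases}
C: & b \displaystyle\sum_{\text{coop} \in \text{radius}} \frac{1}{1 + \text{distance}(\text{coop}, \text{me})} \;-\; c \displaystyle\sum_{\text{any} \in \text{radius}} \frac{1}{1 + \text{distance}(\text{any}, \text{me})} \\[2ex]
D: & b \displaystyle\sum_{\text{coop} \in \text{radius}} \frac{1}{1 + \text{distance}(\text{coop}, \text{me})}
\end{cases}
\tag{1}
\]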

With b the bonus, c the cooperation cost, b > c > 0, and distance the Euclidean distance between two agents. Radius represents the sphere of radius radius around the agent. Note that the agent itself is not considered part of its neighborhood. The distance is not part of the original fitness, which made sense since Chiong and Kirley (2012) based their simulation on a lattice, where the distance is always the same. Our version integrates the fact that interactions with distant agents should be much weaker than with closer ones.

Another advantage of this fitness is that defection can also be assimilated to not playing (no cost). Note that there is also no cost and no reward for cooperating when alone.

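We can see that this fitness is equivalent to the traditional PD game, since, for two agents A and B at a distance d of each other, (1) yields the payoff matrix (rows: the agent’s own move; columns: the other agent’s move):

\[
\begin{array}{c|cc}
 & C & D \\
\hline
C & \dfrac{b - c}{1 + d} & -\dfrac{c}{1 + d} \\[2ex]
D & \dfrac{b}{1 + d} & 0
\end{array}
\]

For b > c > 0, the payoffs satisfy b/(1+d) > (b−c)/(1+d) > 0 > −c/(1+d), so this matrix corresponds to a PD.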
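The distance-weighted reward can be sketched as a short function. This is an illustration only: the values of b and c, the data layout of the agents and the choice to include agents exactly on the sphere boundary are assumptions.

```python
import math


def reward(me, others, radius, b=2.0, c=1.0):
    """Reward per iteration of agent `me`, following equation (1).

    `me` and each entry of `others` are (position, cooperates) pairs;
    `others` must not contain the agent itself.
    """
    my_pos, i_cooperate = me
    benefit = cost = 0.0
    for pos, cooperates in others:
        d = math.dist(my_pos, pos)
        if d > radius:
            continue                  # outside the neighborhood sphere
        weight = 1.0 / (1.0 + d)
        if cooperates:
            benefit += b * weight     # every agent gains from nearby cooperators
        if i_cooperate:
            cost += c * weight        # cooperators pay for every neighbor
    return benefit - cost


A = (0.0, 0.0, 0.0)
B = (1.0, 0.0, 0.0)   # one neighbor at distance d = 1
cc = reward((A, True), [(B, True)], radius=2.0)    # (b - c) / (1 + d)
cd = reward((A, True), [(B, False)], radius=2.0)   # -c / (1 + d)
dc = reward((A, False), [(B, True)], radius=2.0)   # b / (1 + d)
dd = reward((A, False), [(B, False)], radius=2.0)  # 0
```

With these values the four outcomes are ordered dc > cc > dd > cd, which is the ordering that makes the game a Prisoner's Dilemma.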

Based on the outcome of the match, agents can choose a new direction, which is similar to leaving the group in the walk away strategy (Aktipis 2004), the main difference being that, in our case, it is also possible for groups to split. It is also similar in another aspect: there is a cost to leaving a group, as a lone agent may need time to meet others.

Parameter                   Value
Initial energy              2
Maximum age                 5000
Maximum energy              20
Maximum population size     500
Population threshold        100
Reproduction threshold      10
Reproduction cost           2
Reproduction radius         2
Survival cost per turn      2
Mutation rate (per gene)    0.05

Table 1: Parameters used for the simulation.

Evolution/Parameters

Evolution is done continuously. Agents with negative or zero energy are removed, while agents with energy above a threshold are forced to reproduce, within the limits of one infant per time step. The reproduction cost is low enough, considering the threshold, to not put the life of the agent at risk. Table 1 indicates the various parameters used for evolution.

Results

Results were obtained on a set of 10 runs, with additional sets used for control. In our setting, all agents have a constant speed, but can choose in which direction they are heading. This allows for pseudo-static behaviors by looping in circles.

While some characteristics, such as agents’ movement, were strongly run-dependent, the overall dynamics of the system was not. At the beginning of the run, the environment is seeded with random agents. Since all weights in their neural network are set at random, roughly half of the agents initially choose to cooperate while the other half choose to defect. This leads to a fast extinction of cooperators (Figure 3, until approximately 50000 time steps), until a group emerges strong enough to survive. The second phase follows, in which cooperators are quickly increasing in number due to the autocatalytic nature of this strategy (Figure 3). A third step happens eventually, where defectors invade the cluster, followed either by the survival of the cluster due to cooperators running away or a reboot of the cycle. In case of survival, oscillations in the proportion of cooperators can be observed. However, this phenomenon is averaged away over multiple runs, since period and phase of the oscillations are not correlated from one experiment to the other.

As a control, we ran the simulation after removing the possibility for agents to move. In this case, cooperators







Simulation

Observed behaviors: (a) seek and destroy (b) cluster with high mobility / high reproduction rate

Simulation - Cooperators increase

[Figure: cooperation proportion. Proportion of cooperators in the population against time steps]



Simulation - Cooperators’ invasion



Simulation - Cooperators’ stronger signal

[Figure: signaling strength. Axis labels: proportion of cooperators in the population, time steps]



Simulation - Cooperators are moving faster

[Figure: average displacement of agents over a 100-step sliding window. Axis labels: proportion of cooperators in the population, time steps]

Conclusion

• In this chapter, we gained the insight that cooperation requires grouping of collaborating agents.

• This grouping emerges as a degenerate form of the swarming behavior from the previous chapter, using the communication channel to find other cooperators.

• Reminder of goals: add schema: conclusions summary

Olaf Witkowski and Nathanaël Aubert-Kato. July 2014. Pseudo-static cooperators: Moving isn’t always about going somewhere. Proceedings of the Fourteenth International Conference on the Simulation and Synthesis of Living Systems, 14, 392–397. <Paper, Talk>

Conclusion

3D signal-swarming models (ch. 4)

3D spatial Prisoner’s Dilemma (ch. 5)

Synchronization vs variability (ch. 6)

Gene-culture coevolution (ch. 7)

Summary of the specific focus of every chapter


• In this thesis, using evolutionary robotics, we demonstrated how groups of agents can evolve efficient collective behavior based on communication.

• The way groups of animals come to cooperate by exchanging information is essential to optimize their behavior in an environment.

• Future swarm computation will need to build robots that are not directly controlled by human rules, but interact with each other to solve problems.

Conclusion

I am so thankful to…

Takashi Ikegami

Nathanaël Aubert-Kato, Geoff Nitschke, Julien Hubert, Luke McCrohon

Everyone in Ikegami Lab

Jun’ichi Tsujii, Reiji Suda, Masami Hagiya, all the committee members

My loving family & truly awesome friends

Thank you

Publications and conferences

(In reverse chronological order)

Olaf Witkowski and Takashi Ikegami. Expected mid-2015 (In preparation). Signal-based swarming and neutral selection. PLoS Computational Biology. <Paper>

Olaf Witkowski, Geoff Nitschke and Takashi Ikegami. January 2015 (In press). Signal drives genetic diversity: an agent-based approach to speciation. Proceedings of the Twentieth International Symposium on Artificial Life and Robotics, 20. <Paper, Talk>

Olaf Witkowski and Nathanaël Aubert-Kato. July 2014. Pseudo-static cooperators: Moving isn’t always about going somewhere. Proceedings of the Fourteenth International Conference on the Simulation and Synthesis of Living Systems, 14, 392–397. <Paper, Talk>

Olaf Witkowski and Takashi Ikegami. July 2014. Asynchronous evolution: Emergence of signal-based swarming. Proceedings of the Fourteenth International Conference on the Simulation and Synthesis of Living Systems, 14, 302–309. <Paper, Talk>

Olaf Witkowski and Geoff Nitschke. September 2013. The Transmission of Migratory Behaviors. Proceedings of the Twelfth European Conference on Artificial Life, 12, 1218–1220. <Paper, Talk>

Olaf Witkowski and Nathanaël Aubert. July 2013. Size Does Matter: The Impact of Size on Hoarding Behaviour. Proceedings of the Thirteenth International Conference on The Synthesis and Simulation of Living Systems, 13, 542–543. <Extended Abstract, Talk>

Olaf Witkowski, Geoff Nitschke and Takashi Ikegami. July 2012. When is happy hour: An agent’s concept of time. Proceedings of the Thirteenth International Conference on The Synthesis and Simulation of Living Systems, 13, 544–545. <Extended Abstract, Poster>

Olaf Witkowski and Nathanaël Aubert. May 2012. Size Does Matter: The Impact of Size on Hoarding Behaviour. Bio UT International Life Sciences Symposium. <Abstract, Poster>

Olaf Witkowski, Geoff Nitschke and Takashi Ikegami. March 2012. Time To Migrate: The Effect of Lifespan on Imitation and Culturally Learned Migration. Seventh International Workshop on Natural Computing. <Abstract, Talk>

Luke McCrohon and Olaf Witkowski. August 2011. Devil in the details: Analysis of a coevolutionary model of language evolution via relaxation of selection. Advances in Artificial Life, ECAL 2011. Proceedings of the Eleventh European Conference on the Synthesis and Simulation of Living Systems, 522–529. <Paper, Talk>

Olaf Witkowski. September 2011. A Two-Speed Language Evolution: Exploring the Linguistic Carrying Capacity. Proceedings of Ways to Protolanguage 2 (Protolang 2011). <Paper, Talk>

Olaf Witkowski. July 2011. Can Cultural Adaptation Lead to Evolutionary Suicide? At HBES 2011 (23rd Annual Human Behavior & Evolution Society Conference). <Abstract, Poster>

Olaf Witkowski. August 2010. A Two-Speed Language Evolution. At Freelinguistics 2010 (4th Annual International Free Linguistics Conference). <Abstract, Talk>
