ALMA MATER STUDIORUM - UNIVERSITÀ DI BOLOGNA
SECONDA FACOLTÀ DI INGEGNERIA
Corso di Laurea Specialistica in
Ingegneria Informatica
TOWARDS
BOOLEAN NETWORK DESIGN
FOR ROBOTICS APPLICATIONS
Tesi di Laurea elaborata nel corso di:
Intelligenza Artificiale L-S
Tesi di Laurea di:
Mattia Manfroni
Relatore:
Prof. Andrea Roli
Correlatori:
Prof. Marco Dorigo
Dott. Ing. Mauro Birattari
Ing. Carlo Pinciroli
Anno Accademico 2009/2010
Sessione I
KEY WORDS
Boolean Networks
metaheuristics
robotics
Acknowledgements
First of all, I would like to thank Andrea Roli for offering me the
possibility to undertake this work and for always being very kind and helpful.
I would also like to thank Marco Dorigo for giving me the possibility to
work at IRIDIA, and Mauro Birattari for his advice and support.
Thanks also to Carlo Pinciroli, who has always supported me and, more
importantly, has proved to be a very good friend and a “vecchio cuore
rossonero” (an old Rossonero heart, i.e., a Milan supporter) like me.
I thank also my IRIDIA family: Alessandro, Alexander, Ali, Antal, Arne,
Eliseo, Franco, Gianpiero, Giovanni, Manu, Marco, Nadir, Nithin, Prasanna,
Rachael and Sara. Thank you for the great time we spent together.
I would also like to thank all my university mates, with a special mention
for Bat and Frison, who cooperated with me on several projects and were
always very helpful.
I also thank all my friends, especially Bakken, Bat (again), Giulia, Monta
and my team-mates, for the good times we spent together over the last years.
Thanks also to Ettore, for giving me a smile and a big hug every day.
A huge thank you, Alice, for your patience, care and love. Without
you this work would not have been possible.
Finally, I would like to thank my parents for giving me the possibility
to complete my studies, for their continuous support in all my decisions and
for giving me a warm place I can always return to.
Contents

Acknowledgements

1 Introduction
  1.1 Outline of the work

2 Background: robotic agent programs
  2.1 Rational agents
  2.2 Evolutionary Robotics

3 Boolean Networks
  3.1 Introduction
  3.2 A formal definition
  3.3 Dynamics
  3.4 Random Boolean Networks
  3.5 Boolean Network design: state of the art

4 Methodology
  4.1 Background concepts
  4.2 Introduction to metaheuristics
  4.3 Methodology description

5 Preliminary studies
  5.1 Experiments and results
  5.2 Result analysis

6 Robotics applications
  6.1 Path follower
  6.2 Phototaxis and antiphototaxis

7 Conclusions

A The e-puck robot

Bibliography
List of figures
List of tables
1. Introduction
For thousands of years, human beings have tried to understand how the
human brain can perceive and predict the environment and select the actions
to manipulate it. One of the goals of Artificial Intelligence is not only
to understand such features, but also to build intelligent entities (e.g., an
intelligent robot).
In order to achieve this objective, several design methodologies have been
proposed in recent years. The earliest approach consists in designing
intelligent entities by hand, whereas more recent studies propose automatic
design techniques. The latter approach is very interesting, for example
because an automatic design methodology may return solutions not evident a
priori to the designer. The main components of this kind of technique are
an abstract model of the entity behaviour and an optimisation algorithm.
The behaviour model is characterized by some parameters, whose optimal
values are initially unknown; the optimisation algorithm is used to find
the optimal set of parameters for such a model.
We identify Boolean Networks (BNs) as a suitable behaviour model: they
were introduced by Stuart Kauffman as a model for genetic regulatory
networks and as an abstraction of complex systems with which to study the
mechanisms of evolutionary processes in living beings. The choice of BNs
as behaviour models is due to their capability to show complex dynamics
notwithstanding the compactness and simplicity of their definition.
Moreover, the availability of tools for their analysis enables the designer
to study the result of the automatic design and makes it possible to reuse
it as a building block for future, even more complex, applications.
A suitable optimisation algorithm for BN design is provided by
metaheuristic techniques, which are general search strategies upon which a
specific algorithm can be designed. Metaheuristics are particularly
appropriate for tackling BN design because they are usually able to find
(near-)optimal solutions in huge search spaces in a limited amount of time.
In this thesis, developed in collaboration with the Institut de Recherches
Interdisciplinaires et de Développements en Intelligence Artificielle (IRIDIA)
of the Université Libre de Bruxelles, we propose an automatic methodology
to build intelligent robotic entities, based on Boolean Network design by
metaheuristic techniques.
The goal of this work is to provide a proof of concept: at first, the
methodology is validated and tested on abstract case studies; then, it is
applied to robotics applications in a simulated environment, in particular
to a task that requires the robot to develop a sort of internal memory in
order to be accomplished. Finally, the designed BNs are ported to and
tested on a real robotic platform.
1.1 Outline of the work
The remainder of the thesis is organized as follows.
In Chapter 2, we introduce some background concepts, such as that of
rational agents, which represent a kind of intelligent entity, and the main
features of the techniques used to program them. We also outline a
prominent example of one of these methodologies, namely Evolutionary Robotics.
In Chapter 3, we present Boolean Networks (BNs), describing their dy-
namics and properties. We also illustrate the most relevant analytical studies
and the state of the art of BN design.
Chapter 4 provides the description of the proposed methodology: we
model the automatic BN design as an optimisation problem and we adopt a
metaheuristic technique to solve it.
In Chapter 5 we apply our methodology to abstract case studies in order
to validate it. Then, we analyse BNs resulting from the validation process in
terms of their dynamic regime.
In Chapter 6 the methodology is applied to robotics applications. The
first case study aims to acquire familiarity with the methodology and
concerns a simple task (i.e., path following) in which a robot selects its
actions based only on current sensory inputs. The second case study
presents a more complex task, which requires the robot to develop a sort of
internal memory in order to achieve it. The automatic BN design for both
case studies is executed in a simulated environment. At the end of the
process, we port the obtained behaviour models (i.e., BNs) onto a real
robotic platform and undertake extensive experiments in order to evaluate
their effectiveness.
Finally, Chapter 7 draws some conclusions and gives an outlook on future
work.
2. Background: robotic agent programs
This chapter introduces background concepts that will be used in the
remainder of this thesis. In Section 2.1, we introduce the concept of
rational agent. We will use the term agent in a generic sense, meaning
anything that can be viewed as perceiving its environment through sensors
and acting upon that environment through actuators. Moreover, we illustrate
the main characteristics of rational agent programming techniques. Finally,
we outline a prominent example of one of these methodologies, namely
Evolutionary Robotics.
2.1 Rational agents
One of the goals of artificial intelligence consists of synthesizing rational agents¹.
An agent is anything that can be viewed as perceiving its environment
through sensors and acting upon that environment through actuators (see
Figure 2.1).
Figure 2.1: Agents interact with the environment through sensors and actuators.

¹ To introduce this concept we follow Russell and Norvig’s approach [44].

Following the notation of Russell and Norvig [44], we name percept the
perception inputs of the agent in a given instant of time, and percept
sequence the complete sequence of inputs that the agent has perceived so far.
In general, an agent establishes the action to perform in a given instant
of time according to the percept sequence collected until that instant; the
mapping between any percept sequence and the respective action is called
the agent function. The agent function is an abstract mathematical function
and it is implemented by the agent program.
Having defined the concept of agent, we now have to clarify the meaning
of rationality. As a first approximation, we can assert that a rational agent
is one that chooses the right action, where the right action is the one that
will cause the agent to be most successful. We use the term performance
measure to denote the criteria that objectively determine how successful an
agent is.
In general, the sequence of the agent actions is meant to achieve a task. In
order to completely define a task and its features, we introduce the concept
of task environment, which consists of a set of data concerning the following
items:
• performance measure: a function that quantifies the quality of the
agent actions. It depends on the requirements of the task to solve;
• environment: sometimes all the information the agent needs is
observable at each instant of time, sometimes not; moreover, the
characteristics of the environment may significantly condition the
agent's actions;
• sensors: the types of information the agent can perceive depend on its
sensors; furthermore, the kind and precision of the sensors define the
agent capabilities;
• actuators: the types of actuators determine which actions the
agent can perform.
The diverse nature of task environments calls for two main kinds of agent
program to accomplish the corresponding tasks:
• memoryless: this kind of agent program produces actions based
only on its current sensory information. It is utilised when the agent
does not need to display any sort of memory of its interactions with
the environment (e.g., when the information the agent needs to select
the best action is observable at every instant of time);

• memory based: in this case the agent program must be able to keep
memory of (part of) previous input patterns, as well as of its actions,
in an internal state. It then produces actions based on both its internal
dynamics and current sensory information. This kind of agent program
is suitable when the choice of the action to perform also depends on
the past history of the agent. Thus, memory based agent programs are
more complex but also more powerful than memoryless ones.
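To make the distinction concrete, the two kinds of agent program can be sketched as follows. This is a minimal toy illustration with invented names and a made-up obstacle-avoidance percept (a pair of left/right readings); none of it comes from the thesis itself:

```python
# Minimal sketch of the two kinds of agent program (toy percepts and
# names of our own invention). The percept is a pair (left, right) of
# obstacle readings.

def memoryless_program(percept):
    """Maps the CURRENT percept directly to an action."""
    left, right = percept
    return "turn_right" if left > right else "turn_left"

class MemoryBasedProgram:
    """Keeps an internal state summarising past percepts."""

    def __init__(self):
        self.state = 0  # -1: obstacle last seen on the left, +1: on the right

    def step(self, percept):
        left, right = percept
        # update the internal state from the current percept
        if left > right:
            self.state = -1
        elif right > left:
            self.state = 1
        if left == right:
            # ambiguous percept: fall back on the internal state (memory)
            return "turn_right" if self.state < 0 else "turn_left"
        return "turn_right" if left > right else "turn_left"
```

A memoryless program always returns the same action for identical percepts, whereas the memory-based one can answer an ambiguous percept differently depending on its history.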
The task environment description may also determine how to design the
agent program. For example, the task environment sometimes may not be
completely defined, either because the designer may not be able to clearly
define the performance measure, or because sensory input noise cannot be
easily modeled, or because the environment may present unpredictable events.
On the other hand, sometimes the task environment may be completely and
accurately defined, but the designer either may not be able to find a
solution or may not be interested in particular features of the solution,
demanding only an agent program that lets the agent achieve the task. In
general, designers can face these kinds of problems by choosing between two
different approaches: the first one consists of designing the agent program
by hand, then testing and debugging it and, if necessary, reverse
engineering it; this process can be iterated until the agent program
satisfies the requirements. The second approach consists of using a
methodology that automatically finds the solution and implements the agent
program. In general, automatic methodologies are
based on two main components:
• agent program model: a parametric model of the agent program.
The value of the parameters is initially unknown;
• optimisation algorithm: its purpose is to find a good set of param-
eters for the agent program model.
In order to illustrate the main characteristics and issues of the automatic
methodology, which is the one adopted in this thesis, in the next section we
briefly present Evolutionary Robotics, one of the most notable examples of
this kind of design methodology.
2.2 Evolutionary Robotics
Evolutionary Robotics (ER) is a methodological tool to automate the design
of agent programs [40]. It is inspired by the Darwinian theory of natural
selection, that is the principle of selective reproduction of the fittest individ-
ual in a population. This means that the individual that adapts best to its
environment has a higher probability to reproduce and to pass its genetic
material to the following generations. In particular, ER is based on the ap-
plication of evolutionary computation techniques [18] [23] to sets of agent
programs. The process can be described as follows: given a certain task
environment, a population of genotypes is generated randomly (the genotypes
being the individuals of the population), with each genotype corresponding
to an agent program. Then, the process consists of a number of iterations
called generations. For each generation, genotypes are tried one by one in the environment
and evaluated in achieving the task. The evaluation of a genotype is given by
a fitness function, which often corresponds to the performance measure of the
task environment. Genotypes undergo a process called selective reproduction,
in which the fitter the genotype, the higher its reproduction frequency.
The reproduction of a genotype into the next generation usually consists of
a modified copy of the genotype itself. Modifications usually happen as re-
sult of genetic operators such as mutation of individual genes or crossover
(i.e., mixing multiple genotypes together). The process typically ends when
a certain limit on the number of generations is reached, or the designer is
satisfied with the solution(s) obtained.
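The generational loop just described can be sketched in a few lines. This is a toy example under our own assumptions, not the actual ER setup of the thesis: genotypes are bit strings, the fitness simply counts ones (a stand-in for the task performance measure), selection keeps the fitter half, and reproduction copies parents with bit-flip mutation:

```python
import random

# Toy generational evolutionary loop (representation and fitness are our
# own invention, not from the thesis).

def fitness(genotype):
    return sum(genotype)  # stand-in for the task performance measure

def evolve(pop_size=20, length=16, generations=50, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # selective reproduction: only the fitter half reproduces
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[:pop_size // 2]
        population = []
        for _ in range(pop_size):
            child = list(rng.choice(parents))
            for i in range(length):          # mutation of individual genes
                if rng.random() < p_mut:
                    child[i] = 1 - child[i]
            population.append(child)
    return max(population, key=fitness)

best = evolve()
```

Crossover (mixing multiple genotypes) is omitted here for brevity; in a real ER run the fitness evaluation would involve trying the genotype's agent program in the (simulated) environment.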
In ER, agent program models may consist of rule sets or decision trees,
but the most commonly used are artificial neural networks (ANNs) [21] [19].
ANNs are computational models that attempt to simulate structural or
functional aspects of the biological central nervous system. For the
most part, we distinguish between two kinds of ANNs: feedforward ANN [50]
and recurrent ANN [4] [13]. The first category behaves as a memoryless agent
program, thus it is utilised in tasks in which the agent does not need to dis-
play any sort of memory of its interactions with the environment. Instead,
recurrent ANNs present the same features as a memory based agent program,
thus they are used in tasks in which the agent bases decisions both on current
sensory information and previous input patterns.
In general, ER presents several good qualities. For example, the use of
artificial evolution minimises the incorporation of design prejudices and
constraints, which are left to the dynamics of the evolution [20]. Furthermore,
with ER techniques it is possible to optimise the ANN topology as well as
some aspects of hardware design. Artificial evolution often finds solutions to
the problem that were not a priori evident to the experimenter [40]. In addi-
tion, the properties of ANNs guarantee versatility, generalisation capabilities
and tolerance to noisy sensory input.
On the other hand, ER has still to face many challenges [36] and presents
some limits both in the agent program model (i.e., the ANN) and in the
optimisation algorithm (i.e., the genetic algorithm).
The biggest limit is the impossibility to analyse the solutions found and
to reverse-engineer them. This is mainly due to the complexity of the ANN
model; moreover, given that the vast majority of robotic platforms offer
low-level processing units, ANNs often prove to be too computationally
demanding. For example, platforms that do not offer floating point
operations impose severe limitations on the implementation of recurrent
ANNs.
The impossibility of analysing the solutions found makes the obtained agent
programs a kind of black box.
In this thesis we propose another methodology for automatic design of
agent programs in order to tackle these limits. We propose the use of Boolean
Networks as agent program model: this choice is mainly due to the
availability of tools for their analysis and to the simplicity of the model.
Such features also enable us to use an optimisation algorithm simpler than
genetic algorithms.
In Chapter 3 we introduce Boolean Networks and their properties, focus-
ing on the features that will be the subject of our methodology.
3. Boolean Networks
In the first part of this chapter we will introduce Boolean Networks,
describing their dynamics and properties and some analytical studies. Then,
we will focus on the features that are the subject of this thesis: Random
Boolean Networks (Section 3.4) and the design of Boolean Networks, whose
state of the art is outlined in Section 3.5.
3.1 Introduction
Boolean Networks (BNs) were introduced by Stuart Kauffman as a model
for genetic regulatory networks [27] [28] [30] and as an abstraction of
complex systems with which to study the mechanisms of evolutionary
processes in living beings.
The BN structure can be thought of as a directed graph with N nodes.
The number of ingoing arcs of each node is referred to as K. Each node x_i
is associated with a Boolean variable and a Boolean function; the arguments
of the Boolean function of x_i are the Boolean variables of the nodes whose
outgoing arcs are connected to x_i. A simple example of BN is shown in
Figure 3.1. The topologies of a BN can be various and depend on the
specific application field.
The Boolean variables of all the nodes at a given instant of time t
represent the state of the BN at instant t. Since a Boolean variable can
assume only two different values, the state space size is 2^N.

Figure 3.1: An example of Boolean Network with N = 4 and K = 2.
The BN dynamic behaviour is characterized by a sequence of state up-
dates (i.e., each node changes the value of its Boolean variable according to
the associated Boolean function). Several kinds of update rules and dynam-
ics have been proposed [17]. The most studied one consists of a synchronous
state update for all the nodes, i.e., the Boolean variables are all updated
at the same instant. This update rule is also the one used in the remainder
of this thesis. Boolean functions are deterministic, that is, the output of a
Boolean function is unambiguously computed from its arguments.
The succession of synchronous and deterministic state updates makes the
dynamics of BNs deterministic as well.
The initial state of a BN can be arbitrary or randomly chosen, or it may
depend on the specific application. Since the state space is finite and the
dynamics is deterministic, the state succession assumes this structure:
• initially, the trajectory is characterized by a transient, that is a state
succession in which each state is different from all the previous ones.
The transient can have length 0;
Figure 3.2: The table describes the dynamics of the BN, showing the successor
of each state.

• eventually a state, or a sequence of states, will be repeated. Such
sequences are named attractors of the BN and they can be classified
in cycle attractors with period τ > 1 and point attractors with τ = 1.
Point attractors are also known as fixpoints. The set of initial states
that flow towards an attractor is named basin of attraction.
3.2 A formal definition
In more formal terms, a BN is defined as a discrete-state and discrete-time
dynamical system with binary values.
The state of the BN is an array of N Boolean variables x = (x_1, ..., x_N),
with x_i ∈ {0, 1}. As described in Section 3.1, the dynamics is determined
by a succession of state updates; we denote a state update by the following
state transition function:

F : {0, 1}^N → {0, 1}^N.
Since a state update is determined by the Boolean functions and their
arguments, in the following we define the state transition function in
terms of both.
We now formally introduce the definition of projection function, used to
denote the K arguments of the Boolean function of each node. Let p_i be
the projection function that projects element i from the N-dimensional
space to the K-dimensional space:

p_i : {0, 1}^N → {0, 1}^K, 1 ≤ i ≤ N.

Thus p_i selects the subset of Boolean variables belonging to the nodes with
arcs ingoing to x_i, and the cardinality of this subset is equal to K. For
example, considering the BN of Figure 3.1, p_1(x) = (x_2, x_3).
Then for each node x_i we define the respective Boolean function:

f_i : {0, 1}^K → {0, 1}.

Considering a node x_i and its inputs, f_i associates a value for x_i to each
input configuration. The function f_i can be described by Boolean expressions
or truth tables. Note that, once K is defined, the number of possible
functions for each node is 2^(2^K).

Then, considering a state x ∈ {0, 1}^N, we can write F in terms of the f_i:

F(x) = (f_1(p_1(x)), f_2(p_2(x)), ..., f_N(p_N(x))).
Finally, a BN can be completely defined by the tuple B = (Π, Φ, N, K)
where:

N ∈ {1, 2, ...}, K ∈ {0, 1, 2, ...}, K ≤ N;
Π = {p_1, p_2, ..., p_N};
Φ = {f_1, f_2, ..., f_N}.

The p_i functions denote the links between the nodes, that is, the topology
of the BN; the functional part of the BN is instead denoted by the f_i
functions.
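The tuple B = (Π, Φ, N, K) translates almost directly into code. The sketch below is ours (the data-structure choices are assumptions, not from the thesis): each p_i is stored as a tuple of K input indices and each f_i as a truth table mapping K-bit input tuples to 0/1:

```python
# Sketch of a BN as the tuple B = (Π, Φ, N, K); representation choices
# (tuples of indices, dict truth tables) are our own.

class BooleanNetwork:
    def __init__(self, projections, functions):
        # projections[i]: tuple of K node indices feeding node i  (p_i)
        # functions[i]:   dict from K-bit input tuples to 0/1     (f_i)
        self.projections = projections
        self.functions = functions
        self.N = len(projections)

    def step(self, state):
        """Synchronous state transition F: all nodes update at once."""
        return tuple(
            self.functions[i][tuple(state[j] for j in self.projections[i])]
            for i in range(self.N)
        )

# Example: N = 2, K = 1; node 0 copies node 1, node 1 negates node 0.
bn = BooleanNetwork(
    projections=[(1,), (0,)],
    functions=[{(0,): 0, (1,): 1},   # f_0 = identity
               {(0,): 1, (1,): 0}],  # f_1 = NOT
)
```

Iterating `step` from any initial state traces the deterministic trajectory discussed in the next section.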
3.3 Dynamics
In the following, we define with a more precise terminology the already hinted
concepts of transient, attractor and basin of attraction.
Given t ∈ {0, 1, 2, ...}, let x_t be the state of the BN at time t; then, the
state at instant t + 1 is computed as follows:

x_{t+1} = F(x_t).
Defining the space of possible states as S_N = {0, 1}^N, then, as t
increases, the vector x describes a trajectory in S_N. Let x_0 be the
initial state; since a BN is a synchronous and deterministic system,

∃ t ∈ {0, 1, 2, ...}, τ ∈ {1, 2, ...} : x_{t+τ} = x_t.

Then, for t′ ≥ t the BN goes through a cyclic trajectory or remains always
in the same state; such a trajectory defines the attractor of the BN
starting from the state x_0. A BN can have several attractors, depending on
the topology and on the Boolean functions f_i chosen.
Given a Boolean Network B, let Γ be the state set of an attractor of B
and let t be the instant of time at which the BN enters the attractor; then:

Γ = {x ∈ S_N | x = F(x_{t′}), t′ ≥ t}, 1 ≤ |Γ| ≤ 2^N.

The attractor represents the stationary part of the dynamics, whereas the
trajectory covered for t′ < t represents the transient part.
Naming T the state set of a transient, then:

T = {x ∈ S_N | x = F(x_{t′}), t′ < t}, 0 ≤ |T| ≤ 2^N.
We distinguished between the trajectory and the state set belonging to the
trajectory because the trajectory denotes not only the state set, but also
the state sequence. In fact, a trajectory is a function that matches a given
instant of time with a specific state belonging to S_N. Nevertheless, to
keep the notation simpler, we will not maintain this distinction any further.
We can define the basin of attraction for an attractor Γ of a BN B as
follows:

X_Γ = {x_0 ∈ S_N | ∃ t ∈ {0, 1, 2, ...} : F^t(x_0) ∈ Γ}.

That is, X_Γ contains all the initial states that lead the BN into the
attractor Γ.
An attractor is considered stable if, keeping the connections between nodes
fixed and perturbing the value of a node, after some updates most of the
nodes of the BN assume the same values as before the perturbation. In
biology, this phenomenon is named homeostasis.
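Since the dynamics is deterministic over a finite state space, transient and attractor can be extracted simply by iterating F until a state repeats. A sketch (function names and the toy transition function are ours, not from the thesis):

```python
# Iterate F from x0 until a state repeats; everything before the first
# repeated state is the transient T, the rest is the attractor Γ.
# `step` stands for any deterministic state transition function F.

def transient_and_attractor(step, x0):
    seen = {}          # state -> time of first visit
    trajectory = []
    x, t = x0, 0
    while x not in seen:
        seen[x] = t
        trajectory.append(x)
        x = step(x)
        t += 1
    entry = seen[x]    # time at which the repeated state first occurred
    return trajectory[:entry], trajectory[entry:]  # (transient, attractor)

# Toy point-attractor example: every state maps to all zeros in one step.
to_zero = lambda s: (0,) * len(s)
transient, attractor = transient_and_attractor(to_zero, (1, 0, 1))
```

For the toy rule above, any nonzero initial state gives a transient of length 1 and the fixpoint (0, ..., 0) as point attractor (τ = 1).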
3.4 Random Boolean Networks
Besides the classical definition of BN, many variants exist, including asyn-
chronous dynamics and probabilistic update rules [17]. Moreover, extensions
to continuous values have also been studied. Among these variants, the most
studied one is the Random Boolean Network. Random Boolean Networks
(RBNs) are also a brainchild of Kauffman [27] [30].
The peculiarity of a RBN is that functions and connections of the nodes
are generated randomly. This kind of approach is useful when the specific
structure and/or functions of a system are very complex or unknown. Most
of the research activity on this kind of network concerns the field of
evolutionary biology. By analysing the properties of RBNs, researchers have
made hypotheses about the origin of life, as well as about gene activation
mechanisms, cellular differentiation and the evolution of complex systems.
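Generating a RBN is straightforward: for each node, draw K input connections and a random truth table. A minimal sketch (parameter names are ours, and the output format matches the plain-tuple representation used in the examples of this thesis only by our own choice):

```python
import random

# Random generation of a RBN: for each of the N nodes, pick K distinct
# input nodes at random and fill its 2^K-entry truth table with random
# bits (our own minimal sketch).

def random_rbn(N, K, seed=None):
    rng = random.Random(seed)
    projections, functions = [], []
    for i in range(N):
        inputs = tuple(rng.sample(range(N), K))    # random wiring (p_i)
        table = {}
        for m in range(2 ** K):                    # random function (f_i)
            config = tuple((m >> b) & 1 for b in range(K))
            table[config] = rng.randint(0, 1)
        projections.append(inputs)
        functions.append(table)
    return projections, functions

proj, funcs = random_rbn(N=8, K=2, seed=42)
```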
Concerning the dynamics of RBNs, three kinds of regime can be
distinguished: ordered, chaotic and critical. To understand the differences
between these regimes, we can imagine plotting the nodes of a RBN on a
square lattice and letting the dynamics flow, colouring the changing nodes
in green and the frozen ones in red.
• In the ordered regime, after a transient in which many nodes change
(green), the dynamics stabilises and most of the nodes become static
(red). Therefore, there are only few green “islands” surrounded by a
red “frozen sea”.
• In the chaotic regime, most of the nodes change constantly, so the
scenario is a green sea of changing nodes with some red frozen islands,
that is, the opposite of the ordered regime.
The existence of a critical line between these two regimes has been known
since the early studies on RBNs [45] [3] [46]. In physics, the crossing of
such a critical line corresponds to a phase transition.
Resuming the metaphor of the square lattice used to illustrate the ordered
and chaotic regimes, the scenario corresponding to the critical regime, also
named the “edge of chaos”, consists of an ordered red sea that breaks into
green islands, while the red islands join and percolate through the
lattice [31].
Features concerning the chaotic regime have been also studied in a more
general context, such as the one of nonlinear equation systems [47], and with
respect to the notion of frustrated chaos in biological networks [5].
An interesting and well-studied feature of these dynamic regimes is related
to sensitivity to initial conditions and robustness to perturbations: as
briefly hinted in Section 3.3, by flipping the state of a node, we can
measure how the perturbation spreads. This can be done by comparing the
evolution of the original network with that of the perturbed one.
In the ordered regime usually the perturbation does not spread, and the
perturbed network returns to the state before the perturbation. Instead, in
the chaotic regime, perturbations tend to propagate throughout the network.
At the edge of chaos, perturbations can propagate but usually not through
all the network.
The propagation of perturbations in RBNs can be measured in several
ways. A well-known technique is the following: given two copies of the same
RBN, flip h nodes in one copy and map the Hamming distance between the
states of the two copies (i.e., the number of nodes with different values)
onto the distance obtained after one update of both copies. Repeating this
procedure several times and varying h between 0 and N (N being the number
of nodes), we plot the averages for each value of h. Finally, it is possible
to distinguish between the ordered, chaotic and critical regimes by
measuring the slope of the resulting plot. This plot is the Derrida
plot [10], and modifications of this technique have been proposed [15].
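One point of this measurement can be sketched as follows. A toy "freeze to zero" transition function stands in for a real RBN update, and all names are our own; in an actual Derrida plot the procedure is averaged over many networks, states and flips:

```python
import random

# One point of the Derrida-style measurement: take two copies of the same
# state, flip h nodes in one copy, update both once, and compare Hamming
# distances before and after (toy transition function, names ours).

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def derrida_point(step, state, h, rng):
    flipped = list(state)
    for i in rng.sample(range(len(state)), h):   # flip h distinct nodes
        flipped[i] = 1 - flipped[i]
    perturbed = tuple(flipped)
    d_before = hamming(state, perturbed)          # equals h by construction
    d_after = hamming(step(state), step(perturbed))
    return d_before, d_after

# Toy ordered-regime rule: every node freezes to 0, so perturbations die out.
freeze = lambda s: (0,) * len(s)
rng = random.Random(0)
before, after = derrida_point(freeze, (1, 0, 1, 1, 0, 1), 2, rng)
```

A slope below 1 in the resulting before/after plot indicates the ordered regime (perturbations shrink), a slope above 1 the chaotic one, and a slope of 1 the critical line.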
Another important feature of these dynamic regimes is that, by changing the
value of K (the number of inputs of a node), the dynamic regime of the RBN
changes too. This dependency derives from a particular feature of Boolean
functions: a Boolean function is named canalizing if the output value can be
determined by a single input value (e.g., in the OR function one input equal
to 1 is enough to have 1 as output). The number of canalizing functions
influences the dynamic behaviour of the RBN. For example, in a RBN with
K = 2, 16 different functions are possible and 14 of them are canalizing; in
these conditions the BN tends to show an ordered behaviour. More precisely,
RBNs with K ≤ 2 are in the ordered regime, and those with K ≥ 3 are in the
chaotic regime. In order to numerically find the
critical line corresponding to the critical regime, we have to introduce the
concept of homogeneity: considering the truth table of a node, homogeneity
is defined as the probability p of having an entry with 0 as output value.
Then, the critical line is defined by the following equation [3]:

2p(1 − p) = 1/K.
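Solving the equation for K gives the critical connectivity K_c = 1/(2p(1 − p)); for unbiased functions (p = 0.5) this yields K_c = 2, consistent with K ≤ 2 networks being ordered and K ≥ 3 networks being chaotic. A quick numeric check (function name ours):

```python
# Numeric check of the critical line 2p(1 - p) = 1/K, solved for K.

def critical_K(p):
    """Critical connectivity K_c derived from 2p(1 - p) = 1/K."""
    return 1.0 / (2.0 * p * (1.0 - p))

print(critical_K(0.5))  # unbiased functions: K_c = 2
print(critical_K(0.9))  # biasing the functions pushes the critical line higher
```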
In the study of gene regulatory networks, the notion of criticality plays a
fundamental role in resolving the dichotomy between evolvability and
robustness to noise. Researchers conjecture that living beings are in a
critical state [34] [31]: the critical state marries the inherent robustness
of the ordered regime with the flexibility of the chaotic one. Researchers
often refer to this concept as “life at the edge of chaos”.
Figure 3.3: Relationship between p and ⟨K⟩, showing the “order” and “chaos”
regions separated by the critical line.
3.5 Boolean Network design: state of the art
Most of the research activity on BNs concerns the field of evolutionary
biology; nevertheless, BNs have also been studied as computational learning
systems [41] [29] [11] and proven capable of tackling hard problems [43].
Despite the high number of analytical studies on the properties and
dynamics of BNs, their synthesis has not been studied thoroughly.
The first contribution in this direction is due to Kauffman and Smith [32]:
an application of evolutionary algorithms to RBNs. The goal is to obtain
RBNs whose dynamics reaches an attractor including a target state. This
study was followed up by Lemke et al. [35], who add a constraint on the
attractor length to the goal. The main objective of those studies is to
investigate the impact of evolution on RBNs. The results obtained raise
interesting and fundamental questions on the structure of the search
landscape and on how the evolutionary dynamics depends on RBN structural
characteristics.
More recent studies found that RBNs designed by a simple search algorithm
to maximize robustness present a set of properties which characterize RBNs
in the ordered, critical and chaotic regimes [48] [16]. Moreover, other
studies on the evolvability of robustness have also been presented [2] [8] [14].
Finally, Roli et al. [42] discuss experimental results on the evolution of
RBNs applying a simple genetic algorithm whose goal is to obtain an attrac-
tor with a given length.
In Chapter 4 we introduce a new methodology to design RBNs in order
to automatically synthesize agent programs.
4. Methodology
In Chapter 2 we introduced techniques to automatically synthesize robotic
agent programs, reaching the conclusion that it could be useful to develop
a methodology based on Boolean Network (BN) design as a further possibility
alongside the existing approaches. In fact, BNs represent a simple agent
program model, yet they can be characterized by complex dynamics (see
Chapter 3). Moreover, both the availability of analysis tools and the
ongoing research on BNs motivate their use as agent program model.
The goal of this chapter is to outline such a methodology. To do so, in Section 4.1 we describe some background concepts¹. Then, in Section 4.2 we introduce metaheuristics, the computational method we will use to design BNs. Finally, Section 4.3 reports a description of the methodology.
4.1 Background concepts
A combinatorial problem is a mathematical problem whose answer depends on the values assumed by a certain set of variables and a certain set of parameters.
This kind of problem can be viewed as the set of all its instances, where an
instance consists of a specification of all the parameters. For each instance,
combinations of variable values form the potential solutions for that instance.
Among combinatorial problems, we distinguish two correlated categories:
¹To introduce such concepts we follow Hoos and Stützle's approach [25].
• decision problems: for this problem class, the solutions of a given
instance are specified by a set of logical conditions. They present two
variants:
- decision variant: given a problem instance, the goal is to establish
whether or not a solution exists;
- search variant: given a problem instance, the objective is the
same as in the decision variant, with the additional requirement of
finding a solution (if one exists). Thus, algorithms able to solve the
search variant can always be used to solve the decision variant.
Moreover, for many combinatorial decision problems, the converse
also holds;
• optimisation problems: they present the same features as decision
problems, with the addition of an objective function used to evaluate
the quality of candidate solutions. Any optimisation problem can
be stated as a minimisation problem or as a maximisation problem,
depending on whether the given objective function is to be minimised
or maximised. Optimisation problems can also be handled as decision
problems by setting a bound on the objective function. Optimisation
problems present two variants:
- search variant: given a problem instance, it consists of finding a
solution with minimal (or maximal) objective function value;
- evaluation variant: it consists of finding the optimal value of the
objective function.
Both decision and optimisation problems can be solved by searching for solutions in the space of candidate solutions; yet, given an instance of such problems, the set of candidate solutions is typically at least exponential in the size of that instance.
In this context, we are interested in the time required to solve an instance of a combinatorial problem as a function of the size of that instance: this question is the core of computational complexity theory, i.e., the theory that classifies computational problems in terms of the computation time and memory space required to solve them.
The complexity of a computation, concerning a certain class of problems, is defined according to the size of a problem instance and the efficiency of the adopted algorithm. Precisely, we define the complexity of a problem by referring to the complexity of the best algorithm for that problem. Since time complexity is generally the more restrictive factor, problems are often categorised into complexity classes with respect to their asymptotic worst-case time complexity.
Two particularly interesting complexity classes are P and NP. P is the class of problems that can be solved by a deterministic machine in polynomial time, where a deterministic machine is a machine model that takes decisions unambiguously, based on its current internal state. NP, instead, is the class of problems that can be solved by a nondeterministic machine in polynomial time, where a nondeterministic machine is a machine model that, given a current state, takes a decision by choosing among a set of possible decisions. Precisely, this kind of machine does not take random decisions: it is a hypothetical machine which has the ability to make correct guesses for certain decisions.
Every problem in P is also contained in NP, because a nondeterministic machine can emulate deterministic computations. On the other hand, for many relevant problems in NP we do not know any polynomial-time deterministic algorithm, but only exponential-time ones: as soon as an instance of such a problem grows, it becomes intractable. Many of these hard problems in NP can be translated into one another in polynomial deterministic time (such translations are also called polynomial reductions). Given a problem, if every problem in NP can be polynomially reduced to it, then it is at least as hard as any other problem in NP; problems with this property are called NP-hard. NP-hard problems do not necessarily belong to the class NP themselves, because their complexity may actually be higher. NP-hard problems that are contained in NP are called NP-complete.
Many practically relevant combinatorial problems are NP-hard, but techniques to solve them efficiently do exist. To describe a possible way to solve these kinds of problems, we need to distinguish between two main categories of solving algorithms:
• exact algorithms: they are guaranteed to find an optimal solution,
in bounded time, for every finite size instance of a problem. On the
other hand, they might need computation times too high for practical
purposes (e.g., NP-hard problems);
• approximate algorithms: they sacrifice the guarantee of finding op-
timal solutions for the sake of getting solutions in a significantly reduced
amount of time.
In the following, we discuss the usefulness of adopting approximate methods to solve hard problems. In particular, in Section 4.2 we introduce metaheuristics, the approximate method adopted in our methodology.
4.2 Introduction to metaheuristics
Metaheuristics are general search strategies upon which a specific algorithm for solving an optimisation problem can be designed [7] [43] [25]. They belong to the category of approximate algorithms and represent the current state-of-the-art approach for a wide variety of optimisation problems, in areas such as bioinformatics, logistics, engineering, and business. They combine diverse concepts for exploring the space of candidate solutions and also apply learning strategies in order to find (near-)optimal solutions efficiently. These techniques have been successfully applied to optimisation problems for decades and are particularly effective in tackling problems in which the objective function is rather complex or even an approximation of the actual optimisation criterion.
Metaheuristics can usually be classified into trajectory methods and population-based methods. The former describe a trajectory over a search graph (e.g., local search algorithms); the latter perform a search process characterised by an iterative sampling of the search space (e.g., genetic algorithms). Such methods can also be combined and integrated with other techniques from Artificial Intelligence and Operations Research, giving origin to Hybrid Metaheuristics [6], which are nowadays the leading-edge technology in optimisation.
Metaheuristics have been applied to parameter tuning in algorithms [26] and biological models [39] [38]. Moreover, applications to algorithm design [53], neural network training [1] [40] and system design [12] [22] have also been presented. All these applications share a common approach, which lies in the possibility of modelling the design goals as an optimisation problem. The decision variables of the problem are the system parameters which can be set externally; for example, for a neural network they could be the network weights. The objective function is implicitly defined by evaluating a parameter configuration (i.e., an assignment to the decision variables) through a simulation of the system. In the case of neural networks, the error of the network over the training set can be computed, while for parameter tuning the performance of the system with respect to a target output can be estimated. Both local search and population-based methods have been used for system design and tuning. Hutter et al. [26] and Xu et al. [53] apply an iterated local search over the algorithm parameter space, while Nicosia and Sciacca [39] and Montagna and Roli [38] propose methods based on genetic algorithms and particle swarm optimisation, respectively.
Figure 4.1: Methodology approach (the metaheuristic proposes the Boolean functions of the BN; an evaluator simulates the network against the target requirements and returns the objective function value).
4.3 Methodology description
Automatic techniques to synthesize robotic agent programs are characterised by two main components: the agent program model and the optimisation algorithm (see Section 2.1). In our methodology, the agent program model is given by a BN and the optimisation algorithm consists of a metaheuristic technique.
The design of a BN that satisfies given criteria can be modelled as a constrained combinatorial optimisation problem by properly defining the set of decision variables, the constraints and the objective function. The approach that we follow is illustrated in Figure 4.1: the starting point is an RBN (see Section 3.4), whose parameters are the number K of ingoing arcs per node and the homogeneity p. The topology of the BN does not change during the design process. The process consists of several iterations: at each iteration, the metaheuristic algorithm operates on decision variables which encode the Boolean functions of the BN. A complete assignment to those variables defines an instance of a BN. This network is then simulated and evaluated against the specific target requirements by a dedicated software component. For example, in robotics applications, the simulation of the BN coincides with the simulation of a robot that tries to achieve a required task: the behaviour of the robot is governed by the agent program based on the BN, and the objective function is the performance measure of the robot. Finally, the objective function value is returned to the metaheuristic algorithm, which can thus proceed with the search. The process ends when a given number of iterations is reached or a given value of the objective function is obtained; both these parameters are set a priori by the experimenter.
Metaheuristics are particularly appropriate for tackling the automatic design of BNs because they can usually explore huge search spaces efficiently, and heuristic information can be easily integrated. Moreover, since the design problem can be tackled by successive refinements of both the search model and the algorithms, metaheuristics are certainly a very promising technique for achieving successful results, because they can be developed incrementally, moving from a simple strategy to more sophisticated ones. It is also important to remark that, in design problems in general, finding a proven optimal solution is not particularly relevant, because the criteria defining the quality of solutions are an approximation of an (unknown) actual quality function. Therefore, in these cases, metaheuristics are definitely preferable to exact algorithms, because their time complexity scales polynomially with the instance size. In addition, metaheuristics can be easily combined with constraint satisfaction and constraint programming techniques, which can be used to reduce the search space explored.
Since this thesis represents a first attempt to apply the methodology, in the following we adopt a simple search strategy, namely a stochastic iterative improvement local search algorithm. In general, for a given instance of a combinatorial problem, the search for solutions takes place in the space of candidate solutions. The local search process starts by selecting an initial candidate solution and then proceeds by iteratively moving from one candidate solution to a neighbouring one, where the decision on each search step is based on a limited amount of local information only. In our specific case, the local search operates as follows: each move consists of randomly choosing both a node of the BN and an entry in the truth table of the Boolean function characterising that node. Then, the value of that entry is flipped. If a given move does not lead to an improvement (i.e., a better value of the objective function), the move is retracted; otherwise, the move is accepted and the modified BN becomes the new candidate solution. This kind of local search is known as stochastic local search [25]: to a first approximation, it is a local search algorithm in which search initialisation and move choice are randomised.
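A single step of this stochastic iterative improvement can be sketched as follows (a minimal illustration only: the representation of the network as a list of truth tables, one per node, and the function names are our assumptions, not the thesis implementation):

```python
import random

def local_search_step(truth_tables, objective, best_value):
    """One stochastic iterative improvement step: flip one random
    truth-table entry and keep the flip only if the objective improves."""
    node = random.randrange(len(truth_tables))
    entry = random.randrange(len(truth_tables[node]))
    truth_tables[node][entry] ^= 1          # tentative move: flip the bit
    value = objective(truth_tables)
    if value < best_value:                  # improvement: accept the move
        return value
    truth_tables[node][entry] ^= 1          # no improvement: retract it
    return best_value
```

Repeated calls to this step, starting from a randomly generated network, implement the whole search described above.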
In Chapter 5 we execute experiments in an abstract context to validate
the proposed methodology. Then, in Chapter 6 we apply the methodology
to robotics case studies in order to provide a proof of concept.
5. Preliminary studies
In order to validate the methodology described in Chapter 4, we undertake experiments whose objective, in general, is to automatically design BNs that satisfy target requirements on their trajectory in the state space. In Section 5.1 we report experiments and results concerning four diverse case studies.
Another goal of this chapter is to analyse the BNs resulting from these experiments, focusing on the relation between BN dynamic regimes and the performance of the methodology. The results of this analysis are reported in Section 5.2.
5.1 Experiments and results
In this section, we apply our methodology to four diverse case studies. The difference among them lies in the diverse target requirements that the BNs must satisfy. Nevertheless, since these case studies share many common features, we introduce those aspects before discussing them separately.
Since the adopted optimisation algorithm is a stochastic local search, for each case study we execute 90 independent experiments, each of which corresponds to a different BN. Each BN has 100 nodes and is autonomous (i.e., its dynamics flows without receiving any external input). For each BN, initial connections and functions are randomly generated with K = 3 (i.e., each node has 3 ingoing arcs) and homogeneity p (i.e., considering the truth table of a node, the probability that an entry has 0 as output value is equal to p): precisely, among the 90 BNs, 30 are generated with p = 0.5, 30 with p = 0.788675 and the last 30 with p = 0.85. These homogeneity values statistically correspond to the chaotic, critical and ordered regimes, respectively (see Section 3.4). The initial state of each BN is randomly generated.
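The generation of such initial networks, together with the synchronous update they undergo during simulation, can be sketched as follows (an illustrative sketch under the parameters stated above; the data representation and function names are our assumptions):

```python
import random

N, K = 100, 3          # number of nodes and ingoing arcs per node, as in the text

def random_bn(p, rng=random):
    """Generate a random BN: each node gets K random input nodes and a
    truth table of 2**K entries, each entry being 0 with probability p."""
    inputs = [rng.sample(range(N), K) for _ in range(N)]
    tables = [[0 if rng.random() < p else 1 for _ in range(2 ** K)]
              for _ in range(N)]
    return inputs, tables

def step(state, inputs, tables):
    """One synchronous update: every node reads its K inputs in the
    current state and looks up its next value in its truth table."""
    next_state = []
    for node in range(N):
        idx = 0
        for j in inputs[node]:
            idx = (idx << 1) | state[j]     # build the truth-table index
        next_state.append(tables[node][idx])
    return next_state
```

Simulating a trajectory of T = 1000 steps then amounts to calling `step` 1000 times from a random initial state.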
For each experiment, 100000 iterations of the optimisation algorithm are
executed. Each iteration corresponds to a simulation of the respective BN
trajectory lasting T steps, with T = 1000 (i.e., 1000 BN state updates).
Concerning the target requirements, in general they consist of reaching
a target state in different temporal windows. The target state is randomly
generated.
As a first attempt, we used a stochastic local search that, at each iteration, randomly selects a BN node and randomly flips an entry of its Boolean function. Yet, we noticed that this approach does not provide good performance. Thus, we decided to use a variant of the described algorithm: at each iteration of the optimisation algorithm, it chooses a BN variable that does not match the target state and randomly flips an entry of its Boolean function. The results reported in the following refer to this variant of the algorithm; precisely, for each case study we report the respective run length distribution [24], that is, the probability of finding a solution within a certain number of local search steps.
5.1.1 First case study
The goal of the first case study is to design a BN whose trajectory must reach a given target state at least once within the temporal interval (0, T ].
At each simulation, a BN is evaluated on the basis of the simulation step in which the BN presents the highest number of Boolean variables matching the target state. Let u(t) be the function returning the number of Boolean variables matching the target state at simulation step t, with t ∈ (0, T ]; the objective function can then be written as

    min_{t ∈ (0,T]} (1 − u(t)/N).

Figure 5.1: Run length distribution of the first case study (success frequency vs. local search iterations, for p = 0.5, p = 0.788675 and p = 0.85).
The results obtained are shown in Figure 5.1: note that all the BNs with initial homogeneity p = 0.85 reach the goal within 80000 iterations of the optimisation algorithm. BNs initially in the critical regime also perform well, whereas only 10% of the chaotic BNs reach the goal.
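Given a simulated trajectory (a list of BN states, each a list of N Boolean values), this objective function can be computed as in the following sketch (illustrative only; the function names are hypothetical):

```python
def matches(state, target):
    """u(t): number of Boolean variables matching the target state."""
    return sum(1 for s, z in zip(state, target) if s == z)

def objective_first_case(trajectory, target):
    """min over t in (0, T] of 1 - u(t)/N; a value of 0 means the
    target state is reached at some step of the trajectory."""
    n = len(target)
    return min(1 - matches(state, target) / n for state in trajectory)
```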
5.1.2 Second case study
The goal of the second case study is to design a BN whose trajectory must reach a given target state exactly at instant t̄, with t̄ ∈ (0, T ].
At each simulation, a BN is evaluated by checking the number of Boolean variables matching the target state at t̄. However, we also partially reward those states in the BN trajectory that either have a high number of Boolean variables matching the target state, or occur at simulation steps close to t̄. To implement this reward rule, we define a family of functions f(t; γ) on the interval [0, T ] as follows:

    f(t; γ) =
        0                        if t < t̄ − τ
        0                        if t > t̄ + τ
        1 − |(t − t̄)/τ|^γ        otherwise.

The function f(t; γ) is plotted in Figure 5.2: note that BN states close to t̄ can be rewarded in diverse ways, depending on the values of the parameters γ and τ.

Figure 5.2: Function f(t; γ) used in the second case study to assign a certain reward to partially successful BNs (plotted for γ = 0.5, γ = 1 and γ = 2).
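The reward function can be written directly from its definition (a sketch; the target instant t̄ is passed as `t_bar`):

```python
def reward(t, t_bar, tau, gamma):
    """f(t; gamma): 0 outside the window [t_bar - tau, t_bar + tau],
    and 1 - |(t - t_bar)/tau|**gamma inside it."""
    if t < t_bar - tau or t > t_bar + tau:
        return 0.0
    return 1.0 - abs((t - t_bar) / tau) ** gamma
```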
Figure 5.3: Run length distribution of the second case study (success frequency vs. local search iterations, for p = 0.5, p = 0.788675 and p = 0.85).
Let u(t) be the function returning the number of nodes matching the target state at simulation step t, with t ∈ (0, T ]; the objective function can then be written as

    min_{t ∈ (0,T]} (1 − f(t; γ) · u(t)/N).

We tried different values of t̄, τ and γ, without noticing relevant differences in performance. In Figure 5.3 we report the results obtained with t̄ = 500, τ = 10 and γ = 2: note that the performance is slightly worse than in the first case study, but this can be explained by the fact that this case study imposes a tighter constraint than the first one, namely reaching the target state at a specific simulation step. However, Figure 5.3 confirms the discrepancy in performance among the chaotic, critical and ordered initial regimes of the BNs, as observed in the first case study.
Figure 5.4: Function f(t; γ) used in the third and fourth case studies to assign a certain reward to partially successful BNs (plotted for γ = 0.5, γ = 1 and γ = 2).
5.1.3 Third case study
The goal of the third case study is to design a BN whose trajectory must reach a given target state at least once within the temporal interval [t̄, T ], with t̄ ∈ (0, T ], but not before t̄.
As in the second case study, we also assign a certain reward to those BNs whose trajectory reaches a state that is either almost congruent to the target one, or occurs at most τ simulation steps before t̄. To implement this reward rule, we define a family of functions f(t; γ) on the interval [0, T ] as follows:

    f(t; γ) =
        0                        if t < t̄ − τ
        1 − |(t − t̄)/τ|^γ        if t̄ − τ ≤ t ≤ t̄
        1                        if t > t̄.

The function f(t; γ) is plotted in Figure 5.4: note that BN states before t̄ can be rewarded in diverse ways, depending on the values of the parameters γ and τ.

Figure 5.5: Run length distribution of the third case study (success frequency vs. local search iterations, for p = 0.5, p = 0.788675 and p = 0.85).
Let u(t) be the function returning the number of nodes matching the target state at simulation step t, with t ∈ (0, T ]; the objective function can then be written as

    1                                          if u(t)/N = 1, t ∈ (0, t̄ − τ)
    min_{t ∈ (0,T]} (1 − f(t; γ) · u(t)/N)     otherwise.          (5.1)

Note that this objective function gives no reward at all to those BNs whose trajectory reaches the target state before t̄ − τ.
In Figure 5.5, we report the results obtained adopting the same parameters as in the second case study, that is, t̄ = 500, τ = 10 and γ = 2. Note that the performance is almost the same as in the second case study. Moreover, the discrepancy in performance among the chaotic, critical and ordered initial regimes of the BNs is confirmed in this case as well.
5.1.4 Fourth case study
The goal of the fourth case study is the same as the third one, with a further constraint: once the BN trajectory reaches the target state, that state must be kept. In other words, the target state must be a fixpoint of the BN. As in the second and third case studies, we also assign a certain reward to those BNs whose trajectory reaches a state that is either almost congruent to the target one, or occurs at most τ simulation steps before t̄. At each simulation, let t′ ∈ [t̄ − τ, T ] be the simulation step corresponding to the state with the highest number of Boolean variables congruent to the target state. To verify that the BN state at t′ is a fixpoint of the BN, it is enough to check whether the state at t′ + 1 is equal to the one at t′. If this occurs, then we can assert that the BN trajectory has reached a fixpoint. This statement is valid because we are considering BNs with deterministic dynamics and synchronous state updates.
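This check can be sketched in a couple of lines (illustrative; `update` stands for one synchronous BN update function):

```python
def is_fixpoint(state, update):
    """Under deterministic, synchronous dynamics, a state is a fixpoint
    if and only if one update step leaves it unchanged."""
    return update(state) == state
```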
Analysing the requirements, two main features can be noticed. First of all, the network has to reach the target state, but not before t̄ − τ: to evaluate this aspect, we can use the same objective function as in the third case study, defined by Equation (5.1). The second aspect consists of making the target state a fixpoint of the BN. A way to merge these two aspects is to create an objective function based on a weighted mean, as follows:

    1                                 if u(t)/N = 1, t ∈ (0, t̄ − τ)
    min(α x(t′) + (1 − α) y(t′))      otherwise,
Figure 5.6: Run length distribution of the fourth case study (success frequency vs. local search iterations, for p = 0.5, p = 0.788675 and p = 0.85).
where x(t′) is defined by Equation (5.1), and y(t′) is a function that compares the BN states at t′ and t′ + 1, returning the ratio between the number of non-congruent Boolean variables and the total number of BN nodes. Thus, when y(t′) = 0, the BN state at t′ is a fixpoint of the BN.
Different values of α account for the different relative importance of reaching the target state and keeping it. We tried various values of this parameter (i.e., α ∈ {0.25, 0.5, 0.75}) to explore its effect on the optimisation process, and we noticed that small values of α (i.e., favouring the reaching of a fixpoint regardless of the target state) correspond to better performance.
performances. In Figure 5.6, we report the results obtained with γ = 0.5
and other parameters setted to the same values as the results showed for the
second and the third case study, that are t = 500, τ = 10 and γ = 2. Note
that the performances are much better with respect to all the previous case
studies: we conjecture that, introducing the constraint of reaching a fixpoint,
the exploration of the search space is different and local search can explore
38 Preliminary studies
Ordered networks
Non-optimal Optimal
initial BN 0.7354200 0.7290419
final BN 1.077538 1.158360
Table 5.1: Derrida’s parameters for BNs initially ordered and then designed
by local search.
Critical networks
Non-optimal Optimal
initial BN 0.9500681 0.9419760
final BN 1.165276 1.197605
Table 5.2: Derrida’s parameters for BNs initially critical and then designed
by local search.
Chaotic networks
Non-optimal Optimal
initial BN 1.414794 1.425388
final BN 1.3674872 0.9226405
Table 5.3: Derrida’s parameters for BNs initially chaotic and then designed
by local search.
it more effectively.
5.2 Result analysis
The results described in Section 5.1 show that the proposed methodology can obtain good performance, and they also represent a validation of the tool. In this section, we analyse the BNs obtained by the optimisation algorithm in terms of their dynamic regimes. As described in Section 3.4, Derrida's plots represent a useful tool for this kind of study. We estimate Derrida's parameter of a BN as the ratio between the Riemann integrals of the Derrida curve and of the bisector, over [0, 10].
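Assuming the Derrida curve is available as a list of sampled points, this estimate can be sketched as follows (illustrative only: the trapezoidal sums stand in for the Riemann integrals, and all names are hypothetical):

```python
def derrida_parameter(distances, curve):
    """Estimate the Derrida parameter as the ratio between the integrals
    (approximated here by trapezoidal sums) of the Derrida curve and of
    the bisector y = x over the same interval.
    `distances` are Hamming distances at time t; `curve` holds the
    average distances at time t+1 (both lists of equal length)."""
    def integral(xs, ys):
        return sum((ys[i] + ys[i + 1]) / 2 * (xs[i + 1] - xs[i])
                   for i in range(len(xs) - 1))
    return integral(distances, curve) / integral(distances, distances)
```

A parameter close to 1 indicates a curve close to the bisector, i.e., the critical regime.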
Our analysis proceeds by considering the BNs obtained from all the experiments described in Section 5.1: we separate these BNs into 3 groups, on the basis of their initial dynamic regime (i.e., ordered, critical and chaotic). Then, we split each group into successfully designed BNs and unsuccessfully designed ones. For each subgroup, we compute the average of Derrida's parameters of the initial BNs and the average of Derrida's parameters of the BNs designed by local search. The obtained averages are reported in Tables 5.1, 5.2 and 5.3.
Note that, in general, Derrida's parameter of the BNs designed by local search tends to 1 (i.e., the Derrida's parameter corresponding to the critical regime). This is even more evident if we consider only the optimal BNs. These results are very interesting because, as described in Section 3.4, the critical regime is the most studied and promising one, since it combines the inherent robustness of the ordered regime with the flexibility of the chaotic one. Future work will concern deeper studies in this direction. However, the goal of this thesis is to prove the applicability of the methodology to robotics applications. Thus, now that the methodology has been validated, in Chapter 6 we apply it to that kind of case study.
6. Robotics applications
In this chapter we apply our methodology to robotics case studies. Precisely, the goal is to verify its capability to automatically devise a memory-based agent program, that is, an agent program able to keep memory of previous input patterns in an internal state (Section 2.1).
Our starting point is the case study described in Section 6.1. It aims to test our methodology on a very simple task, one that can actually be solved by a memoryless agent program. Since our main goal is to obtain memory-based agent programs, once we find a solution we move on to more complex tasks. For this reason, we do not report a complete analysis of this first case study.
In Section 6.2, we apply our methodology to a case study that needs a memory-based agent program to be achieved. This kind of task enables us to exploit the powerful dynamics of BNs. All the agent programs have been obtained in simulation. To evaluate their effectiveness in a more realistic setup, we also port them onto a real robotic platform and conduct extensive experiments.
6.1 Path follower
In order to acquire familiarity with the use of our methodology in robotics applications, we simulate, as a first attempt, an agent facing a path following task: the agent has to follow a given path, e.g., a line or a circle.
Figure 6.1: Path follower environment (the agent and the circular path inside the arena).
6.1.1 Task definition
In order to completely define the task, we describe its task environment:
• environment: it consists of a square arena (1 m × 1 m) in which a path is traced. In particular, the path is defined by drawing two parallel lines, and the agent follows the path by advancing between them (see Figure 6.1). The simulation is time discrete, that is, at each instant of time the agent perceives the environment and selects the action to perform.
• performance measure: given a certain simulation time T, the agent must satisfy two properties to achieve the task successfully: spending as much time as possible on the path, without going out of it, and advancing as much as possible along the path (e.g., covering a full circle on a circular path). We define the performance measure by minimising an error function E ∈ [0, 1]. This error function is given by a weighted mean of two contributions. The first one measures the time the agent spends along the path: since both the task simulation and the BN dynamics are time discrete, at each step we can check whether the agent is on the path or not. If the agent is on the path, it is rewarded with 1 point, otherwise it is not rewarded (0 points). The second contribution measures how much the agent has advanced along the path. Thus, we can write the error function E as follows:

    E = α (1 − (∑_{i=1}^{T} s_i) / T) + (1 − α) (1 − (pmax − pf)),

where:

    ∀i ∈ [1, T ],  s_i = 1 if the agent is on track, 0 otherwise

and:
- pmax is the position in the arena that the agent can reach in the optimal case, i.e., if the agent advances along the path at every simulation step;
- pf is the position in the arena actually reached by the agent.
The performance measure is evaluated by minimising E: the smaller E is, the better the agent's performance.
Note that this kind of performance measure enables the experimenter to favour one contribution over the other by changing the value of α. For example, if α = 0.25, then the result will be an agent that covers a long portion of the path even though it sometimes goes off the path.
• sensors: they allow the agent to know its position with respect to the path at each instant of time. Precisely, the agent knows whether it is on the path, or the path is on its right side, or the path is on its left side, but not its distance from the path.
• actuators: the agent is equipped with two motors, each of which controls a wheel. The speed of each wheel can assume only two values, ON or OFF.
To achieve this task a memoryless agent program is enough (see Section 2.1) because, at each instant of time, the only piece of information necessary to set the wheel speeds correctly is whether or not the robot is on the path.
6.1.2 BN setup
On applying our methodology, we use as first attempt a small BN with 10
nodes, i.e., N = 10. Initial connections and functions of the nodes are ran-
domly generated with K = 3 (i.e., the number of ingoing arcs for each node
is equal to 3) and homogeneity p = 0.5 (i.e., considering the truth table of
a node, the probability to have an entry with 0 as output value is equal to
0.5).
For robotics applications, we use BNs with some output nodes and some
input nodes. Output nodes are nodes whose Boolean variable, at each instant
of time, is used to set actuator values. To do this, it is necessary to
define a suitable mapping between Boolean values and the value range of the
actuators. For example, we may associate a BN output node with an agent
wheel: if, at a given instant of time, the Boolean variable is equal to 1,
then the wheel assumes a certain constant speed; otherwise, the wheel is
stopped. Input nodes are nodes whose Boolean variable, at each instant of
time, is not determined by the BN dynamics, but is set according to the
value of the agent's sensors. Thus, in order to set these node variables,
it is necessary to define a suitable mapping between the value range of the
sensors and Boolean values. For example, we may associate a sound sensor
with an input node: if, at a given instant of time, the sound sensor
perceives a sound, then the Boolean variable of the input node assumes
value 1, otherwise 0.
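A synchronous update step with clamped input nodes might look like this (an illustrative sketch under our own naming and indexing conventions, not the thesis implementation):

```python
def bn_step(state, inputs, tables, input_nodes, sensor_values):
    """One synchronous BN update: input nodes are clamped to sensor values,
    every other node reads its K inputs from the *old* state and looks up
    the result in its truth table."""
    new_state = list(state)
    for node, value in zip(input_nodes, sensor_values):
        new_state[node] = value
    for node in range(len(state)):
        if node in input_nodes:
            continue
        index = 0
        for src in inputs[node]:
            index = (index << 1) | state[src]  # truth-table row index
        new_state[node] = tables[node][index]
    return new_state
```

Output nodes need no special treatment during the update: the actuators simply read their Boolean values after each step.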
Percept x1 x2
Agent on path 0 0
Agent on right of path 0 1
Agent on left of path 1 0
Table 6.1: Mapping between agent sensors and BN input nodes for the path
following task.
Actuators
x3 x4 Left wheel Right wheel
0 0 OFF OFF
0 1 OFF ON
1 0 ON OFF
1 1 ON ON
Table 6.2: Mapping between agent actuators and BN output nodes for the
path following task.
For this case study, in order to map agent sensors and actuators onto the
BN, we set two nodes (x1, x2) as input nodes and two other nodes (x3, x4) as
output nodes. In particular, the mapping between perceptions/actions and
node values is described in Table 6.1 and Table 6.2.
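In code, the two mappings of Tables 6.1 and 6.2 amount to a pair of small lookups (illustrative Python; the percept names are ours):

```python
# Table 6.1: percept -> values of input nodes (x1, x2).
PERCEPT_TO_INPUTS = {
    "on_path": (0, 0),
    "path_on_right": (0, 1),
    "path_on_left": (1, 0),
}

def wheels_from_outputs(x3, x4):
    """Table 6.2: values of output nodes (x3, x4) -> wheel commands."""
    return ("ON" if x3 else "OFF", "ON" if x4 else "OFF")
```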
6.1.3 Experiments
The experiments are executed using two different kinds of path: a line and
a circle. We empirically estimated that 1500 simulation steps are enough to
let the agent advance until the end of the path, thus each simulation lasts
1500 steps, i.e., T = 1500.
Executing several independent experiments, we notice that the optimisa-
tion algorithm returns a successful BN after less than 1 hour of computation
time, for both the straight and the circular path. In particular, the error
function returns values below 0.1, meaning that the agent advances on the
path in at least 90% of the simulation steps.
Since the goal of this first experimental session is just to test and set up
our methodology in robotics, we do not dwell deeply on the analysis of the
obtained results. However, we report an example of a successful simulation
in Figure 6.2. The ease of obtaining these preliminary results makes it
possible to move on to more complex and interesting tasks (Section 6.2).
Since these represent the main goal of this thesis, we will provide a deeper
analysis of the results and tests on real robots.
(a) t = 0; (b) t = 180; (c) t = 400; (d) t = 650; (e) t = 850; (f) t = 1200
Figure 6.2: An example of successful path following on a circular path.
6.2 Phototaxis and antiphototaxis
The case study analysed in the following consists of an agent that selects
actions with respect to a light source. Precisely, the agent must be able
to perform two different behaviours: going towards the light (phototaxis)
or moving away from it (antiphototaxis). In the beginning, the agent must
perform phototaxis; then, it must switch its behaviour to antiphototaxis
after perceiving a clap. Thus, the agent needs to remember whether the
clap has been perceived in order to select the best action at each simulation
step. This means that a memory-based agent program is necessary to
achieve this task.
The goal of this section is to demonstrate that our methodology is capable
of automatically building a memory-based agent program. Thus, in this
context we do not focus on a statistical analysis of the methodology's
success rate; rather, we focus on providing a proof of concept.
6.2.1 Task definition
In order to completely define the task, we describe its task environment:
• environment: it consists of a square arena (1 m x 1 m) with a light
source positioned at the origin (see Figure 6.3).
• performance measure: at the beginning of the experiment, the agent
is positioned close to the vertex of the arena opposite to the light.
Then, given a certain simulation time T, the agent must satisfy two
properties to successfully achieve the task:
- it must go towards the light at each simulation step, until a clap
is perceived (phototaxis);
[Two plots of the 1 m x 1 m arena at t = 0, showing the positions of the light source and of the agent.]
Figure 6.3: Phototaxis and antiphototaxis environment.
- let tc be the instant of time in which the agent perceives the clap,
then the agent must move away from the light (antiphototaxis)
for all the simulation steps subsequent to tc.
We define the performance measure by minimising an error function
E ∈ [0, 1]. This error function is a weighted mean of the phototaxis
term and the antiphototaxis term: in fact, since both the simulation
and the BN dynamics are time-discrete, at each step we can check
whether the agent is moving in the correct direction with respect to
the light, rewarding it with 1 point if it is (0 points otherwise). Thus,
we can write the error function E as follows:

E = α (1 − (∑_{i=1}^{tc} s_i) / tc) + (1 − α) (1 − (∑_{i=tc+1}^{T} s_i) / (T − (tc + 1))),
where:
∀i ∈ [1, tc], s_i = 1 if the agent went towards the light, 0 otherwise;
∀i ∈ [tc + 1, T], s_i = 1 if the agent moved away from the light, 0 otherwise.
The performance measure is evaluated by minimising E, i.e., the smaller
E is, the better the performance.
Note that this kind of performance measure allows the experimenter
to favour one term over the other by changing the value of α. For
example, if α = 0.25, the result will be an agent that performs
phototaxis well, but either cannot perform antiphototaxis at all or
starts performing it several steps after the clap.
• sensors: light sensors allow the agent to know its position with respect
to the light at each instant of time. The agent has a circular body and
is equipped with 4 light sensors whose values are binary (ON/OFF).
Different combinations of switched-on sensors allow the agent to perceive
the light at 8 different angles around its body. We denote the 8 possible
perceptions by assigning each of them a numerical identifier, called a
"percept ID". The light sensor disposition and the 8 percept IDs are
outlined in Figure 6.4. Moreover, the agent is equipped with a sound
sensor whose value can be ON or OFF: at each simulation step the
value is ON if the agent perceives a clap, OFF otherwise.
• actuators: the agent is equipped with two motors, each controlling
one wheel. Each wheel's speed can assume only two values: ON or OFF.
To achieve this task, it is necessary to produce a memory-based agent pro-
gram. In fact, at each simulation step, in order to choose between phototaxis
and antiphototaxis, the agent must remember whether or not a clap has been
perceived.
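The error function defined above translates directly into code. This sketch indexes simulation steps from 1 and keeps the denominator T − (tc + 1) exactly as written in the formula:

```python
def error_function(s, tc, T, alpha=0.5):
    """s[i] is 1 if the agent moved in the correct direction at step i
    (towards the light before the clap step tc, away from it after);
    s is 1-indexed, so s[0] is unused."""
    photo = sum(s[i] for i in range(1, tc + 1)) / tc
    anti = sum(s[i] for i in range(tc + 1, T + 1)) / (T - (tc + 1))
    return alpha * (1 - photo) + (1 - alpha) * (1 - anti)
```

For example, with α = 0.5, an agent that performs phototaxis perfectly but never antiphototaxis scores E = 0.5, matching the observation made later about BNs whose error median equals 0.5.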
Figure 6.4: Agent light sensors exploited in the phototaxis and antiphototaxis
case study: the 4 circles represent the light sensors and the number labels
outline the 8 percept IDs.
6.2.2 BN setup
In this case study we use a BN with more nodes than the one used for the
path following task (Section 6.1.2). The reason for this choice is the
complexity required: here, the BN dynamics needs to keep an internal state
and, in order to create this kind of dynamics, we consider it appropriate
to use a higher number of nodes than in a BN associated with a memoryless
agent program. Moreover, in this case study the number of sensors is higher
than that of the agent used in the path following task, which means that a
higher number of BN input nodes is also required. Therefore, we use as a
first attempt a BN with 20 nodes (N = 20). Initial connections and functions
of the nodes are randomly generated with K = 3 (i.e., each node has 3
ingoing arcs) and homogeneity p = 0.5 (i.e., considering the truth table of
a node, the probability that an entry has 0 as output value is 0.5).
In order to map sensors and actuators onto the BN, we use one input
Percept ID x2 x3 x4 x5
1 1 0 0 0
2 1 1 0 0
3 0 1 0 0
4 0 1 1 0
5 0 0 1 0
6 0 0 1 1
7 0 0 0 1
8 1 0 0 1
Table 6.3: Mapping between light percept IDs and BN input nodes in the
phototaxis and antiphototaxis case study.
Percept x1
Clap perceived 1
Clap not perceived 0
Table 6.4: Mapping between clap perceptions and BN input node in the
phototaxis and antiphototaxis case study.
node (x1) for the clap sensor value, four input nodes (x2, x3, x4, x5)
for the four light sensor values and two other nodes (x6, x7) as output
nodes to drive the two wheels. The mapping between perceptions/actions
and node values is described in Tables 6.3, 6.4 and 6.5.
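The mappings of Tables 6.3 and 6.4 can be written as a lookup that clamps the five input nodes (illustrative Python; we use 0-based indices 0..4 for x1..x5):

```python
# Table 6.3: percept ID -> values of input nodes (x2, x3, x4, x5).
LIGHT_INPUTS = {
    1: (1, 0, 0, 0), 2: (1, 1, 0, 0), 3: (0, 1, 0, 0), 4: (0, 1, 1, 0),
    5: (0, 0, 1, 0), 6: (0, 0, 1, 1), 7: (0, 0, 0, 1), 8: (1, 0, 0, 1),
}

def set_input_nodes(state, percept_id, clap_heard):
    """Clamp x1 (clap, Table 6.4) and x2..x5 (light, Table 6.3)."""
    state = list(state)
    state[0] = 1 if clap_heard else 0
    state[1:5] = LIGHT_INPUTS[percept_id]
    return state
```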
6.2.3 Experiment outline
Considering the size of the arena and the speed of the agent wheels, we em-
pirically estimated that 1000 simulation steps are enough to let the agent
achieve the task.
Actuators
Left wheel Right wheel x6 x7
OFF OFF 0 0
OFF ON 0 1
ON OFF 1 0
ON ON 1 1
Table 6.5: Mapping between actuators and BN output nodes in the photo-
taxis and antiphototaxis case study.
Since the optimisation algorithm consists of a stochastic local search, we
execute 30 independent experiments, each corresponding to a different
initial BN (randomly generated as described in Section 6.2.2). In each
experiment we train the agent to achieve the task in a simulated environment,
regardless of initial conditions such as its initial position in the arena
and its orientation. The set of initial conditions forms the training set.
At the end of the training process, we test the obtained BNs in a simulated
environment on further initial conditions. Then, we port the best BN
resulting from this testing process onto a real robotic platform and
undertake extensive experiments in order to evaluate its effectiveness. In
order to obtain a suitable agent program, we iterate the whole process
(i.e., training in a simulated environment, testing in a simulated
environment and testing on the real robotic platform).
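The training step can be pictured as a stochastic local search over the Boolean functions. This chapter does not spell out the exact move and acceptance scheme, so the flip-one-bit, accept-if-not-worse policy below is an assumption made for illustration:

```python
import random

def local_search(evaluate, tables, iterations, seed=0):
    """Minimise evaluate(tables) by flipping single truth-table bits and
    keeping each flip only if the error does not worsen."""
    rng = random.Random(seed)
    best = evaluate(tables)
    for _ in range(iterations):
        node = rng.randrange(len(tables))
        bit = rng.randrange(len(tables[node]))
        tables[node][bit] ^= 1            # tentative move
        err = evaluate(tables)
        if err <= best:
            best = err                    # accept
        else:
            tables[node][bit] ^= 1        # revert
    return tables, best

# Toy objective: count the 0-bits (optimum: all-ones tables).
def count_zeros(ts):
    return sum(b == 0 for t in ts for b in t)

tables, best = local_search(count_zeros, [[0, 0, 0, 0], [0, 0, 0, 0]], 500)
```

In the actual methodology, evaluate() would run simulations over the training set and return an aggregate of the error function values.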
In Sections 6.2.5, 6.2.6 and 6.2.7 we outline the features and results of
three different training sessions and the respective tests on real robots.
Before discussing these training sessions and the tests on real robots, we
describe how we embed a BN in a real robot.
Figure 6.5: Disposition and IDs of e-puck proximity sensors.
6.2.4 Robot setup
In order to test on a real robot the agent programs obtained by training,
we use the e-puck robot. Here we focus only on the e-puck features that are
necessary to achieve the task; a more detailed description of the e-puck is
reported in Appendix A.
To perceive the light, the robot is equipped with 8 IR proximity sensors
whose disposition is shown in Figure 6.5. Since the disposition of the
proximity sensors is different from that of the light sensors of the
simulated agent, we combine the values of the e-puck proximity sensors to
obtain the same configuration as in simulation (see Figure 6.4). The
correspondence between percept IDs and BN input node values is the same as
for the simulated agent; a complete summary of these mappings is reported
in Table 6.6.
Concerning sound perception, the e-puck is equipped with 3 microphones;
the clap can be perceived by setting thresholds on the readings and keeping
the highest value. The BN input node concerning the clap perception is then
updated according to the same schema as in simulation (see Table 6.4).
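A minimal version of this detection scheme (the threshold value is purely illustrative):

```python
def clap_detected(mic_readings, threshold=0.6):
    """Keep the loudest of the three microphone readings and compare it
    against a fixed threshold."""
    return max(mic_readings) > threshold
```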
min(e-puck sensors) Percept ID x2 x3 x4 x5
mean(IR0,IR7) 1 1 0 0 0
IR1 2 1 1 0 0
IR2 3 0 1 0 0
mean(IR2,IR3) 4 0 1 1 0
mean(IR3,IR4) 5 0 0 1 0
mean(IR4,IR5) 6 0 0 1 1
IR5 7 0 0 0 1
IR6 8 1 0 0 1
Table 6.6: Mapping between e-puck IR proximity sensors and BN input
nodes in the phototaxis and antiphototaxis case study. The minimum among
the values in the first column determines the agent's percept ID and the
BN input node values.
Finally, the BN output node values determine the velocity of e-puck
wheels according to the same mapping as in simulation (see Table 6.5).
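Table 6.6 can be implemented as an argmin over the combined readings. We assume here that a smaller reading corresponds to a stronger light in the given direction, which is the convention implied by the table's "minimum" rule:

```python
def percept_from_ir(ir):
    """ir: the 8 e-puck proximity readings IR0..IR7; returns the percept
    ID (1..8) whose combined reading, per Table 6.6, is the minimum."""
    candidates = [
        ((ir[0] + ir[7]) / 2, 1),
        (ir[1], 2),
        (ir[2], 3),
        ((ir[2] + ir[3]) / 2, 4),
        ((ir[3] + ir[4]) / 2, 5),
        ((ir[4] + ir[5]) / 2, 6),
        (ir[5], 7),
        (ir[6], 8),
    ]
    return min(candidates)[1]
```

The resulting percept ID is then mapped onto input nodes x2..x5 exactly as in simulation (Table 6.3).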
6.2.5 First training: Zoolander
The goal of the training is to obtain an agent able to achieve the task
regardless of its specific initial position, its orientation and the instant
of time at which the clap is perceived.
In order to reach this goal, we use a training set composed of 30 dif-
ferent initial conditions for each experiment. Such conditions are generated
randomly according to the following parameters:
• Position: x ∈ [0.5, 1] ∧ y ∈ [0.5, 1];
• Orientation: θ ∈ [0, 2π];
• Clap time: tc ∈ [400, 600].
For each independent experiment we execute 100000 iterations of the op-
timisation algorithm, corresponding to about 4 days of computation time.
However, in some experiments 5000 iterations (i.e., 5 hours of computation
time) are enough to obtain a BN with a good performance measure, that is,
a median of the error function below 0.1.
At the end of the training, we test the obtained agent programs on further
initial conditions to verify their actual ability to achieve the task
regardless of the initial conditions.
The results of this first training and the relative testing are shown in Fig-
ure 6.6: in the training graph (Figure 6.6(a)) we can observe that the 6 BNs
on the left obtain a good performance measure, i.e., the medians of their
error functions are below 0.1. Concerning the testing results (Figure 6.6(b)),
the performance measures are in general somewhat worse, but for the first 6
BNs the medians of the error functions are still below 0.1, as in the
training graph.
Porting the best BN obtained onto the e-puck, we immediately observe
that the robot successfully performs the task only in some specific cases.
Precisely, while it is performing phototaxis, the light may initially be in
front of the robot and, after some time, be perceived off-centre: if the
robot perceives the light on its right, then it is able to correct its orien-
tation and straighten up towards the light1; otherwise, if the light is
perceived on its left, then the robot starts to perform antiphototaxis
without having perceived any clap2. Figure 6.6(b) reflects this problem:
since the BN used on the e-puck is the first one on the left, note that
there are several outliers (i.e., high error function values).
We label the result obtained from this training Zoolander, because the
obtained behaviour recalls the main character of the movie of the same
name [52]: a male model who is unable to turn left.
1 See Video 1 available at http://iridia.ulb.ac.be/supp/IridiaSupp2010-010/
2 See Video 2 available at http://iridia.ulb.ac.be/supp/IridiaSupp2010-010/
[Boxplots of error function values for the 30 trained BNs ordered by median: (a) training, (b) testing on the 30 trained BNs.]
Figure 6.6: Results of the first training (a) and the relative testing (b).
In order to fix this problem, we apply a new training on the same BNs,
adding a noise component to increase the robustness of the agent program.
6.2.6 Second training: Garrincha
The goal of this training is to fix the Zoolander problem, i.e., the robot
must be able to straighten up towards the light whether it perceives the
light on its left or on its right. In order to reach this goal, we apply
the same training set as in the first training (described in Section 6.2.5)
with the addition of actuation noise. This noise consists of a change of
the agent's orientation at a certain instant of time tn during the
simulation. Let θn be the orientation variation; the value of θn must be
such that the light is perceived off-centre at tn + 1. Thus, each element
of the training set is characterised by the following parameters:
• Position: x ∈ [0.5, 1] ∧ y ∈ [0.5, 1];
• Orientation: θ ∈ [0, 2π];
• Clap time: tc ∈ [400, 600];
• Orientation variation: θn ∈ [−π/8, π/8];
• Noise time: tn ∈ [100, 400].
Moreover, considering that in some experiments of the Zoolander training
we obtained good performance after 5 hours of computation time, in this
training we decide to reduce the number of iterations of the optimisation
algorithm from 100000 to 25000, corresponding to 1 day of computation
time.
At the end of the experiments, we notice that the obtained agent pro-
grams are able to correct their attitude with respect to the light after the
application of the actuation noise, but they are not able to perform
antiphototaxis after perceiving the clap. To fix this behaviour, we repeat
the same experiments, but split the training into two phases:
• in the first 5000 iterations of the optimisation algorithm, the simula-
tions last only 500 time steps and no clap is presented. The goal is to
obtain agent programs able to perform phototaxis and robust to
actuation noise;
• in the subsequent 20000 iterations, the simulations last 1000 time steps
and they involve the clap. Thus, the idea is to train the agent gradually.
In the testing, we also add a sensor noise component in order to verify
more strictly the robustness of the obtained agent programs.
The results of training and testing are shown in Figure 6.7: since it is
harder to achieve the task in a noisy environment than in an ideal one,
the performance measure values are in general worse than in the first
training (Figure 6.6). However, observing the second training graph
(Figure 6.7(a)), we note 2 BNs with a median of the error functions close
to 0.1 and a compact span of values. Concerning the testing graph
(Figure 6.7(b)), since we add a sensor noise component not present in the
training, worse performance is to be expected. Nevertheless, also in
testing we obtain 1 BN with a good performance measure, both in terms of
its error function median and in terms of the compactness of its span of
values. Moreover, note that most of the BNs show a median of the error
functions equal to 0.5: this is because most of the obtained agent programs
are able to perform phototaxis, but are not able to perform antiphototaxis
at all. Nevertheless, since our goal is to provide a proof of concept, that
is, an agent program based on a BN that achieves the task, in this thesis
we do not try to obtain a higher percentage of successful BNs. However, it
is clear that some features of our methodology can be improved or changed
in order to obtain a higher number of successes.
Porting the best BN obtained onto the e-puck robot, we notice that the
[Boxplots of error function values for the 30 trained BNs ordered by median: (a) training, (b) testing on the 30 trained BNs.]
Figure 6.7: Results of the second training attempt (a) and the relative testing
(b).
agent program presents an imperfection concerning the bearing of the robot:
its movement is not smooth but meandering, because the dynamics of this
specific BN activates the robot's wheels alternately3. We label the result
obtained from this training Garrincha, because the meandering movement
of the robot recalls the famous Brazilian footballer Garrincha [51].
To solve this imperfection we apply a post-processing technique: each
wheel is activated or deactivated according to the average of the
corresponding output node values over a temporal window. In this way, the
robot moves smoothly and achieves the task, showing two important
properties4:
• every time the robot perceives the light off-centre, either on its left or
on its right, it is able to correct its attitude with respect to the light
and to achieve the task;
• the robot clearly shows memory of its previous perceptions: if we turn
the robot by π, either during phototaxis or antiphototaxis, it is able to
recover the correct position and to achieve the task.
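The post-processing can be sketched as a moving average over each output node; the window length and the 0.5 activation threshold below are our assumptions, as this chapter does not report the exact values:

```python
from collections import deque

class SmoothedWheel:
    """Drive a wheel from the average of the last `window` output-node
    values instead of the instantaneous Boolean value."""
    def __init__(self, window=5):
        self.history = deque(maxlen=window)

    def update(self, output_bit):
        self.history.append(output_bit)
        # Wheel is ON when at least half the recent outputs were 1.
        return sum(self.history) / len(self.history) >= 0.5
```

One such filter per wheel suffices: the BN itself is left untouched, only the actuator mapping changes.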
6.2.7 Third training: back and forth
Although we have already reached the goal of obtaining a memory-based agent
program with our methodology, we try to make the task more complex by
increasing the number of claps within the same simulation. In particular,
we execute a new training, with the same parameters as the previous ones,
but presenting 3 claps in the same simulation. This requires a longer
simulation time than the previous training, because with three claps the
agent has to perform two phototaxis and two antiphototaxis phases. Consid-
ering the speed of the agent and the size of the arena, we estimated that a
3 See Video 3 available at http://iridia.ulb.ac.be/supp/IridiaSupp2010-010/
4 See Video 4 available at http://iridia.ulb.ac.be/supp/IridiaSupp2010-010/
[Boxplots of error function values for the 30 trained BNs ordered by median.]
Figure 6.8: Results of the third training attempt.
simulation lasting 2500 steps is enough (T = 2500). Then, let t′c and t′′c be
the instants of time in which the second and the third claps take place; such
parameters are randomly generated in the following intervals:
• t′c ∈ [1150, 1300];
• t′′c ∈ [1700, 1850].
The results of this training are shown in Figure 6.8: since the task is more
complex, it is expected that the performance measures are in general worse
than in the previous trainings. However, we obtain 2 BNs with a median of
the error functions less than or equal to 0.1.
Porting the best BN obtained onto the e-puck, we notice that the robot
achieves the task5.
Finally, the results obtained in the phototaxis and antiphototaxis case
study represent a proof of concept of the proposed methodology. In Chap-
ter 7, we describe the conclusions of this thesis and outline future work
aimed at improving the methodology, concerning both the agent program
model (i.e., the BN) and the optimisation algorithm.
5 See Video 5 available at http://iridia.ulb.ac.be/supp/IridiaSupp2010-010/
7. Conclusions
In this thesis we have proposed an automatic methodology to synthesise
robotic agent programs. Such techniques are characterised by two main
components: the agent program model and the optimisation algorithm. In
our methodology, the agent program model is given by a Boolean network
(BN) and the optimisation algorithm consists of a metaheuristic technique.
Among metaheuristics, in this work we have adopted a stochastic local
search algorithm.
We have modelled BN design as a constrained combinatorial optimi-
sation problem by properly defining the set of decision variables, the
constraints and the objective function. Precisely, the Boolean functions of
the nodes represent the decision variables; they are initially randomly
generated and the optimisation algorithm must determine their optimal
values according to the objective function.
The methodology has been first validated by experiments on abstract
case studies. Analysing the BNs obtained by automatic design, we noticed
their tendency towards the critical dynamical regime, which is the most
interesting and most studied one, because researchers conjecture that
living beings are also in a critical state, referring to this concept as
"life at the edge of chaos". Moreover, the critical regime efficiently
combines robustness and flexibility.
After the validation, we applied the methodology to robotics case studies.
In order to acquire familiarity with this context, we experimented with a
simple task (i.e., path following), in which the robot selects its actions
based only on current sensory inputs. We obtained good performance in a
short computation time, so we decided to move on to more complex scenarios.
In particular, we focused on a task (i.e., phototaxis and antiphototaxis)
in which the robot is required to keep a sort of internal memory to achieve
a given goal. At the end of the automatic design process, the obtained
agent programs were ported onto a real robotic platform and we undertook
extensive experiments in order to evaluate their effectiveness. The
obtained results show that BN dynamics is suitable for realising complex
behaviours notwithstanding the simplicity of the model. Moreover, the
results have been obtained in limited computation time by a simple
stochastic local search algorithm. However, the goal of this thesis has
been to provide a proof of concept, without focusing on statistical
properties of the methodology, such as its success rate on robotics case
studies.
Future work will consist of analysing the BNs obtained by automatic de-
sign in order to understand their dynamical properties and reuse them as
building blocks for future, even more complex, robotics applications.
In order to improve the methodology, we plan to experiment with variants
of the adopted BN model (e.g., asynchronous dynamics and probabilistic up-
date rules). Moreover, both the number of BN nodes and the BN topology
could be subject to automatic design, as well as the Boolean functions.
Improvements will also concern the optimisation algorithm: for example, we
will develop hybrid techniques with the aim of addressing advanced design
goals. We plan to pursue this research by decomposing the problem into
interdependent sub-problems and solving each of them with the most
suitable technique. For example, BN design can be decomposed into two
phases: in the first phase, the BN architecture is defined (for instance by
an Evolutionary Algorithm or Ant Colony Optimisation) and in the second
phase the functions regulating the BN are defined (for instance, by means
of an Iterated Local Search).
A. The e-puck robot
The e-puck (Figure A.1) is a robot designed by Dr. Francesco Mondada and
Michael Bonani in 2006 at EPFL with the purpose of being both a research
and an educational tool in universities [37]. Both the hardware design and
the software libraries have been entirely developed as open-source projects;
free access to the related documents has greatly increased its success and
has stimulated researchers to develop their own extensions to enrich the
robot's capabilities.
The core of the robot is a dsPIC processor, a low-energy microcontroller
produced by Microchip Technology. In addition, Microchip provides a com-
plete toolchain based on the popular GCC compiler [49]. As shown in Table
A.1, a standard e-puck is equipped with several sensors and actuators.
Skilled readers will have noted the main bottlenecks of this system:
a slow CPU and a small amount of RAM constrain the computational ca-
pabilities. For example, in the context of this thesis, we can port onto
the e-puck Boolean networks (i.e., the agent program) with a maximum
number of nodes (i.e., 216) and a maximum number of ingoing arcs per node
(16). These limits become more evident when we compare the e-puck
with other robots commonly used in education such as, for example, the
Mindstorms NXT. In fact, the Lego Mindstorms NXT provides a powerful
32-bit ARM7 microcontroller with 64 KB of RAM, eight times the amount
provided by the e-puck [33].
Figure A.1: The e-puck robot.
Dimensions 70 mm diameter, 55 mm height, 150 g
Battery autonomy 5Wh LiION providing about 3 hours autonomy
Processor dsPIC 30F6014A @ 60 Mhz (∼15 MIPS)
16 bit microcontroller with DSP core
Memory 8 KB RAM, 144 KB FLASH
Motors 2 stepper motors
Speed Max: 15 cm/s
IR sensors 8 infra-red sensors for ambient light
and proximity measuring
Camera VGA color camera with resolution of 480x640
Microphones 3 omni-directional microphones
Accelerometer 3D accelerometer along the X, Y and Z axis
LEDs 8 red LEDs on the ring, green LEDs in the body,
1 strong red LED in front
Speaker On-board speaker capable of WAV
and tone sound playback
Connections Serial port, Bluetooth
API language GNU C, C99 (partially)
Table A.1: e-puck features [9]
Bibliography
[1] E. Alba and R. Marti. Metaheuristic procedures for training neural net-
works. Springer-Verlag New York Inc, 2006.
[2] M. Aldana, E. Balleza, S.A. Kauffman, and O. Resendiz. Robustness
and evolvability in genetic regulatory networks. Journal of theoretical
biology, 245(3):433–448, 2007.
[3] U. Bastolla and G. Parisi. A numerical study of the critical line of
Kauffman networks. Journal of theoretical biology, 187(1):117–133, 1997.
[4] R.D. Beer and J.C. Gallagher. Evolving dynamical neural networks for
adaptive behavior. Adaptive behavior, 1(1):91, 1992.
[5] H. Bersini and V. Calenbuhr. Frustrated chaos in biological networks.
Journal of theoretical biology, 188:187–200, 1997.
[6] C. Blum, M.J.B. Aguilera, A. Roli, and M. Sampels. Hybrid Metaheuris-
tics: An Emerging Approach to Optimization. Studies In Computational
Intelligence, page 290, 2008.
[7] C. Blum and A. Roli. Metaheuristics in combinatorial optimiza-
tion: Overview and conceptual comparison. ACM Computing Surveys
(CSUR), 35(3):268–308, 2003.
[8] S. Braunewell and S. Bornholdt. Reliability of genetic networks is evolv-
able. Physical Review E, 77(6):60902, 2008.
[9] CYBEROBOTICS. e-Puck specification, 2007.
[10] B. Derrida and G. Weisbuch. Evolution of overlaps between configu-
rations in random Boolean networks. Journal de physique, 47(8):1297–
1303, 1986.
[11] M. Dorigo. Learning by probabilistic boolean networks. In Proceedings
of World Congress on Computational Intelligence – IEEE International
Conference on Neural Networks, pages 887–891, Orlando, Florida, USA,
1994.
[12] J. Dreo. Metaheuristics for hard optimization: methods and case studies.
Springer Verlag, 2006.
[13] J.L. Elman. Finding structure in time. Connectionist Psychology: A
Text with Readings, 1999.
[14] A. Esmaeili and C. Jacob. Evolution of discrete gene regulatory models.
In Proceedings of the 10th annual conference on Genetic and evolution-
ary computation, pages 307–314. ACM, 2008.
[15] C. Fretter, A. Szejka, and B. Drossel. Perturbation propagation in ran-
dom and evolved boolean networks. May 2009.
[16] C. Fretter, A. Szejka, and B. Drossel. Perturbation propagation in ran-
dom and evolved Boolean networks. New Journal of Physics, 11:033005,
2009.
[17] C. Gershenson. Introduction to random boolean networks. CoRR,
nlin.AO/0408006, 2004.
[18] D.E. Goldberg. Genetic algorithms in search, optimization, and machine
learning. Addison-Wesley Professional, Upper Saddle River,NJ, USA,
1989.
[19] D. Graupe. Principles of artificial neural networks. World Scientific Pub
Co Inc, 2007.
[20] I. Harvey, E.D. Paolo, R. Wood, M. Quinn, and E. Tuci. Evolutionary
robotics: A new scientific tool for studying cognition. Artificial Life,
11(1-2):79–98, 2005.
[21] S. Haykin. Neural Networks: A Comprehensive Foundation. Prentice
Hall, 1999.
[22] P.F. Hingston, L.C. Barone, and M. Zbigniew. Design by Evolution:
Advances in Evolutionary Design. Natural Computing Series, page 352,
2008.
[23] J.H. Holland. Adaptation in Natural and Artificial Systems: An Intro-
ductory Analysis with Applications to Biology, Control, and Artificial
Intelligence. The MIT Press, April 1992.
[24] H.H. Hoos and T. Stutzle. Towards a characterisation of the behaviour
of stochastic local search algorithms for SAT. Artificial Intelligence,
112(1-2):213–232, 1999.
[25] H.H. Hoos and T. Stutzle. Stochastic local search: Foundations and
applications. Morgan Kaufmann, 2005.
[26] F. Hutter, H.H. Hoos, K. Leyton-Brown, and T. Stutzle. ParamILS:
an automatic algorithm configuration framework. Journal of Artificial
Intelligence Research, 36(1):267–306, 2009.
[27] S.A. Kauffman. Metabolic stability and epigenesis in randomly con-
nected genetic nets. Journal of Theoretical Biology, 22:437–467, 1968.
[28] S.A. Kauffman. Requirements for evolvability in complex systems: or-
derly dynamics and frozen components. Physica D: Nonlinear Phenom-
ena, 42(1-3):135–152, 1990.
[29] S.A. Kauffman. Antichaos and adaptation. Scientific American,
265(2):78, 1991.
[30] S.A. Kauffman. The Origins of Order: Self-Organization and Selection in Evolution. Oxford University Press, USA, 1st edition, June 1993.
[31] S.A. Kauffman. Investigations. Oxford University Press, USA, 2002.
[32] S.A. Kauffman and R.G. Smith. Adaptive automata based on Darwinian
selection. Physica D: Nonlinear Phenomena, 22(1-3):68–82, 1986.
[33] F. Klassner. A case study of LEGO Mindstorms™ suitability for artificial intelligence and robotics courses at the college level. In SIGCSE ’02: Proceedings of the 33rd SIGCSE technical symposium on Computer science education, pages 8–12, New York, NY, USA, 2002. ACM.
[34] C.G. Langton. Computation at the edge of chaos: phase transitions and emergent computation. Physica D: Nonlinear Phenomena, 42(1-3):12–37, 1990.
[35] N. Lemke, J. Mombach, and B.E.J. Bodmann. A numerical investigation of adaptation in populations of random Boolean networks. Physica A: Statistical Mechanics and its Applications, 301(1-4):589–600, 2001.
[36] M. Mataric and D. Cliff. Challenges in evolving controllers for physical
robots. Robotics and autonomous systems, 19(1):67–84, 1997.
[37] F. Mondada, M. Bonani, X. Raemy, J. Pugh, C. Cianci, A. Klaptocz, S. Magnenat, J.C. Zufferey, D. Floreano, and A. Martinoli. The e-puck, a robot designed for education in engineering. In Proceedings of the 9th Conference on Autonomous Robot Systems and Competitions, volume 1, pages 59–65, Portugal, 2009. IPCB: Instituto Politécnico de Castelo Branco.
[38] S. Montagna and A. Roli. Parameter tuning of a stochastic biological simulator by metaheuristics. In AI*IA 2009: Emergent Perspectives in Artificial Intelligence, pages 466–475, 2009.
[39] G. Nicosia and E. Sciacca. Robust parameter identification for biological circuit calibration. In 8th IEEE International Conference on BioInformatics and BioEngineering (BIBE 2008), pages 1–6, 2008.
[40] S. Nolfi and D. Floreano. Evolutionary Robotics: The Biology, Intelligence, and Technology of Self-Organizing Machines. MIT Press/Bradford Books, Cambridge, MA, 2000.
[41] S. Patarnello and P. Carnevali. Learning networks of neurons with Boolean logic. Europhysics Letters, 4(4):503–508, 1987.
[42] A. Roli, C. Arcaroli, M. Lazzarini, and S. Benedettini. Boolean Networks
Design by Genetic Algorithms.
[43] A. Roli and M. Milano. MAGMA: A multiagent architecture for metaheuristics. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 34(2), 2004.
[44] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice-Hall, Englewood Cliffs, NJ, 2nd edition, 2003.
[45] R.V. Solé and B. Luque. Phase transitions and antichaos in generalized Kauffman networks. Physics Letters A, 196(1-2):331–334, 1994.
[46] R.V. Solé, B. Luque, and S.A. Kauffman. Phase transitions in random networks with multiple states. Working papers, Santa Fe Institute, 2000.
[47] S.H. Strogatz. Nonlinear dynamics and chaos: With applications to physics, biology, chemistry, and engineering. Westview Press, 2000.
[48] A. Szejka and B. Drossel. Evolution of canalizing Boolean networks. The European Physical Journal B - Condensed Matter and Complex Systems, 56(4):373–380, 2007.
[49] Microchip Technology. dsPIC Language Tools Libraries, 2004.
[50] T.L. Fine. Feedforward neural network methodology. Springer-Verlag, 1999.
[51] Wikipedia. Garrincha — Wikipedia, the free encyclopedia, 2010. [Online; accessed 05-July-2010].
[52] Wikipedia. Zoolander — Wikipedia, the free encyclopedia, 2010. [Online; accessed 05-July-2010].
[53] L. Xu, F. Hutter, H.H. Hoos, and K. Leyton-Brown. SATzilla: portfolio-based algorithm selection for SAT. Journal of Artificial Intelligence Research, 32(1):565–606, 2008.
List of Figures
2.1 Agent representation
3.1 Example of Boolean Network
3.2 Boolean Network dynamics
3.3 Relationship between homogeneity p and number of ingoing arcs K
4.1 Methodology approach
5.1 Run length distribution of the first preliminary case study
5.2 Function f(t; γ) used in the second preliminary case study
5.3 Run length distribution of the second case study
5.4 Function f(t; γ) used in the third and fourth preliminary case studies
5.5 Run length distribution of the third preliminary case study
5.6 Run length distribution of the fourth preliminary case study
6.1 Path follower environment
6.2 An example of successful path following on a circular path
6.3 Phototaxis and antiphototaxis environment
6.4 Agent light sensors exploited in the phototaxis and antiphototaxis case study
6.5 e-puck proximity sensors
6.6 Results of the first training and the relative testing
6.7 Results of the second training attempt and the relative testing
6.8 Results of the third training attempt
A.1 The e-puck robot
List of Tables
5.1 Derrida’s parameters for BNs initially ordered
5.2 Derrida’s parameters for BNs initially critical
5.3 Derrida’s parameters for BNs initially chaotic
6.1 Sensor mapping for the path following task
6.2 Actuator mapping for the path following task
6.3 Light sensor mapping for the phototaxis and antiphototaxis case study
6.4 Clap sensor mapping for the phototaxis and antiphototaxis case study
6.5 Actuator mapping for the phototaxis and antiphototaxis case study
6.6 e-puck IR proximity sensor mapping for the phototaxis and antiphototaxis case study
A.1 e-puck features