Post on 20-Dec-2015
Learning and Evolution in Hierarchical Behavior-based Systems
Amir massoud Farahmand
Advisor:
Majid Nili Ahmadabadi
Co-advisors:
Caro Lucas – Babak N. Araabi
University of Tehran - Dept. of ECE 2
Motivation
Machines (e.g., robots) are moving from labs to homes, factories, and beyond.
Machines face:
• Unknown environment/body: an [exact] model of the environment/body is not known.
• Non-stationary environment/body:
  • Changing environment (offices, houses, streets, and almost everywhere)
  • Aging
• The designer may not know how to benefit from every aspect of her agent/environment.
Motivation
Difficulty of the design process:
• Machines see different things.
• Machines interact differently.
• The designer is not a machine!
I know what I want!
Our goal: Automatic design of intelligent machines
Research Specification
Goal: Automatic design of intelligent robots
Architecture: Hierarchical behavior-based architectures.
An objective performance measure is available (reinforcement signal):
• [Agent] Did I perform it correctly?!
• [Tutor] Yes/No! (or 0.3)
Behavior-based Approach to AI
• The behavior-based approach is a successful alternative to the classical AI approach: no {Abstraction, Planning, Deduction, …}.
• Behavioral (activity) decomposition, as opposed to functional decomposition.
• Behavior: Sensor -> Action (a direct link between perception and action).
Behavioral Decomposition
build maps
explore
avoid obstacles
locomote
manipulate the world
sensors actuators
Behavior-based Design
• Robust:
  • not sensitive to the failure of a particular part of the system
  • no need for precise perception, as there is no modelling
• Reactive: fast response, as there is no long route from perception to action
• No explicit representation
How should we DESIGN a behavior-based system?!
Behavior-based System Design Methodologies
Hand Design
• Common almost everywhere.
• Complicated: may even be infeasible in complex problems.
• Even if it is possible to find a working system, it is probably not optimal.
Evolution
• Good solutions can be found.
• Biologically feasible.
• Time consuming.
• Not fast in making new solutions.
Learning
• Biologically feasible.
• Learning is essential for the life-time survival of the agent.
Taxonomy of Design Methods
Behavior-based System Design
• Learning
  • Structure (hierarchy) learning
  • Behavior learning
• Evolution
  • Co-evolution of behaviors
• Hybridization of Evolution and Learning
  • Memetic Algorithm
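Problem Formulation: Behaviors

$$B_i : S'_i \to A'_i, \qquad i = 1, \dots, n$$

$$A'_i = A_i \cup \{\text{No Action}\}$$

$$S'_i = \{\, s_i \mid s_i = M_i(s);\ s \in S_i \,\}$$

$$A_i \subseteq A, \qquad S_i \subseteq S$$

$$M_i : S \to S'_i$$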
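Problem Formulation: Purely Parallel Subsumption Architecture (PPSSA)

$$T = [\, B_{index(1)} \;\; B_{index(2)} \;\; \dots \;\; B_{index(m)} \,]^{T}$$

with $n$ behaviors and $m$ layers, where $index(i) = j$ indicates that $B_j$ is in the $i^{th}$ layer.

Example:

$$T = [\, \text{ObstacleAvoidance} \;\; \text{BallCollection} \;\; \text{Wandering} \,]^{T}$$

• Different behaviors become excited.
• Higher behaviors can suppress lower ones.
• Controlling behavior: the excited behavior that is not suppressed by any higher one and therefore takes control of the agent.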
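The arbitration rule in the bullets above can be sketched in Python. This is only an illustration under stated assumptions: the `Behavior` class, the `arbitrate` function, the dict-based state, and the action names are hypothetical, and the structure is assumed to be listed from the highest layer to the lowest.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

State = dict  # assumption: a sensor reading is a plain dict


@dataclass
class Behavior:
    name: str
    excited: Callable[[State], bool]  # is the behavior excited in this state?
    act: Callable[[State], str]       # action proposed when excited


def arbitrate(structure: Sequence[Behavior], state: State) -> Optional[Tuple[str, str]]:
    """Return (controlling behavior name, action), or None if nothing is excited.

    `structure` is assumed to be ordered from the highest layer to the lowest,
    so the first excited behavior suppresses every excited behavior below it.
    """
    for behavior in structure:
        if behavior.excited(state):
            return behavior.name, behavior.act(state)
    return None


# Hypothetical three-layer structure using the behavior names from the slide.
T = [
    Behavior("ObstacleAvoidance", lambda s: s["obstacle_near"], lambda s: "turn_away"),
    Behavior("BallCollection", lambda s: s["ball_visible"], lambda s: "approach_ball"),
    Behavior("Wandering", lambda s: True, lambda s: "move_forward"),
]
```

Calling `arbitrate(T, {"obstacle_near": False, "ball_visible": True})` walks down the layers and stops at the first excited behavior, which becomes the controlling behavior.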
Problem Formulation: Reinforcement Signal and the Agent’s Value Function

$$R = \frac{1}{N} \sum_{t=1}^{N} r_t$$

$$V_T = E\left[ R \,\middle|\, \text{agent with the structure } T \text{ and set of behaviors } B_i\ (i = 1, \dots, n) \right] = E\left[ \frac{1}{N} \sum_{t=1}^{N} r_t \,\middle|\, \text{agent with the structure } T \text{ and set of behaviors } B_i\ (i = 1, \dots, n) \right]$$

• This function states the value of using a set of behaviors in a specific structure.
• We want to maximize the agent’s value function.
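Reading the value function on this slide as the expected average reward of an agent with a fixed structure and behavior set, it can be estimated by sampling episodes. The sketch below is an assumption-laden illustration: `run_episode` is a hypothetical callable that runs the agent once and returns its list of rewards.

```python
from statistics import mean
from typing import Callable, List


def episode_return(rewards: List[float]) -> float:
    """R = (1/N) * sum of the N rewards received in one episode."""
    return sum(rewards) / len(rewards)


def estimate_value(run_episode: Callable[[], List[float]], episodes: int) -> float:
    """Monte Carlo estimate of V_T = E[R] for a fixed structure and behavior set."""
    return mean(episode_return(run_episode()) for _ in range(episodes))
```

Comparing `estimate_value` across candidate structures is one simple, if sample-hungry, way to rank them.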
Problem Formulation: Design as an Optimization

• Structure Learning: finding the best structure given a set of behaviors, using learning.

$$T^* = \arg\max_{T} V_T$$

• Behavior Learning: finding the best behaviors given the structure, using learning.

$$B_i^* = \arg\max_{B_i} V_T$$

• Concurrent Behavior and Structure Learning.

$$(T^*, B_i^*) = \arg\max_{T,\, B_i} V_T$$

• Behavior Evolution: finding the best behaviors given the structure, using evolution.

$$B_i^* = \arg\max_{B_i} V_T$$

• Behavior Evolution and Structure Learning.

$$(T^*, B_i^*) = \arg\max_{T,\, B_i} V_T$$
Learning in Behavior-based Systems
• There is some research on behavior-based learning: Mataric, Mahadevan, Maes, and ...
• … but there is no deep investigation of it (especially of its mathematical formulation)!
• Most of this work uses flat architectures.
Learning in Behavior-based Systems
We design:
• Structure (hierarchy)
• Behavior
We learn:
• Structure Learning: organizing behaviors in the architecture using a behavior toolbox
• Behavior Learning: the correct mapping of each behavior
Structure Learning
Behavior Toolbox:
• manipulate the world
• build maps
• explore
• locomote
• avoid obstacles
The agent wants to learn how to arrange these behaviors in order to get maximum reward from its environment (or tutor).
Example:
1. “explore” becomes the controlling behavior and suppresses “avoid obstacles”.
2. The agent hits a wall!
3. The tutor (environment) punishes “explore” for being in that place of the structure.
4. “explore” is not a very good behavior for the highest position of the structure, so it is replaced by “avoid obstacles”.
Structure Learning: Challenging Issues
• Representation: How should the agent represent knowledge gathered during learning?
  • Sufficient (the concept space should be covered by the hypothesis space)
  • Generalization capability
  • Tractable (small hypothesis space)
  • Well-defined credit assignment
• Hierarchical Credit Assignment: How should the agent assign credit to different behaviors and layers in its architecture?
  • If the agent receives a reward/punishment, how should we reward/punish the structure of the agent?
• Learning: How should the agent update its knowledge when it receives a reinforcement signal?
Structure Learning: Overcoming Challenging Issues
Our approach is to define a representation that allows the agent’s value function to be decomposed into simpler components.
Decomposing the behavior of a multi-agent system into simpler components may improve our insight into the problem under investigation.
Structure can provide a lot of clues to us.
Structure Learning
• Zero Order Representation: the value of each behavior in each layer
• First Order Representation: the value of the order (higher/lower) of behaviors in the structure
Structure Learning: Zero Order Representation
ZO Value Table in the agent’s mind:
Higher layer: avoid obstacles (0.8), explore (0.7), locomote (0.4)
Lower layer: avoid obstacles (0.6), explore (0.9), locomote (0.4)
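Structure Learning: Zero Order Representation - Value Function Decomposition

Conditioning on which layer is controlling (the events “$L_1$ is controlling”, “$L_2$ is controlling”, ..., “$L_m$ is controlling”):

$$
\begin{aligned}
V_T = E[R] &= E\left[ \frac{1}{N} \sum_{t=1}^{N} r_t \right] \\
&= E\left[ \frac{1}{N} \sum_{t=1}^{N} r_t \,\middle|\, L_1 \text{ is controlling} \right] P(L_1 \text{ is controlling}) \\
&\quad + E\left[ \frac{1}{N} \sum_{t=1}^{N} r_t \,\middle|\, L_2 \text{ is controlling} \right] P(L_2 \text{ is controlling}) \\
&\quad + \dots \\
&\quad + E\left[ \frac{1}{N} \sum_{t=1}^{N} r_t \,\middle|\, L_m \text{ is controlling} \right] P(L_m \text{ is controlling})
\end{aligned}
$$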
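Structure Learning: Zero Order Representation - Value Function Decomposition

$$
E\left[ \frac{1}{N} \sum_{t=1}^{N} r_t \,\middle|\, L_i \text{ is controlling} \right]
= \sum_{j=1}^{n} E\left[ \frac{1}{N} \sum_{t=1}^{N} r_t \,\middle|\, B_j \text{ is the controlling behavior in } L_i \right] P(B_j \mid L_i)
= \sum_{j=1}^{n} V_{ij}\, P(B_j \mid L_i), \qquad i = 1, \dots, m
$$

$$
V_T = \sum_{i=1}^{m} \sum_{j=1}^{n} V_{ij}\, P(B_j \mid L_i)\, P(L_i \text{ is controlling})
$$

• $V_T$: the agent’s value function
• $V_{ij}$: ZO components
• $\sum_{j=1}^{n} V_{ij}\, P(B_j \mid L_i)$: the layer’s value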
Structure Learning: Zero Order Representation - Value Function Decomposition

$$
T^* = \arg\max_{T} V_T = \arg\max_{T} \sum_{i=1}^{m} \sum_{j=1}^{n} V_{ij}\, P(B_j \mid L_i)\, P(L_i \text{ is controlling})
$$
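Structure Learning: Zero Order Representation - Credit Assignment and Value Updating

The controlling behavior is the only behavior responsible for the current reinforcement signal.

$$\tilde{V}_{ij} = V_{ij}\, P(B_j \mid L_i)\, P(L_i \text{ is controlling})$$

$$\tilde{V}_{ij,\,n+1} = (1 - \alpha_{ij,\,n})\, \tilde{V}_{ij,\,n} + \alpha_{ij,\,n}\, r_n \quad \text{if “} B_j \text{ is active at time step } n \text{” and “} L_i \text{ is controlling at time step } n \text{”}$$

where $\alpha_{ij,\,n}$ is the learning rate.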
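A minimal Python sketch of zero-order credit assignment, assuming the update is an exponential moving average toward the received reward and that only the controlling behavior in its current layer is credited. The constant learning rate `alpha`, the table layout, and the greedy `greedy_structure` heuristic (a stand-in for the arg max over structures, not an exact maximizer) are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Table = Dict[Tuple[int, str], float]  # (layer index, behavior name) -> value


def zo_update(table: Table, layer: int, behavior: str, reward: float,
              alpha: float = 0.1) -> None:
    """Credit only the controlling behavior in the layer it currently occupies."""
    old = table.get((layer, behavior), 0.0)
    table[(layer, behavior)] = (1.0 - alpha) * old + alpha * reward


def greedy_structure(table: Table, behaviors: List[str], layers: int) -> List[str]:
    """Fill layers 0..layers-1 in turn with the best-valued unused behavior.

    Assumes layers <= len(behaviors); unseen entries default to 0.0.
    """
    remaining = list(behaviors)
    structure = []
    for layer in range(layers):
        best = max(remaining, key=lambda b: table.get((layer, b), 0.0))
        structure.append(best)
        remaining.remove(best)
    return structure
```

For instance, punishing `explore` in layer 0 and rewarding `avoid` there makes `greedy_structure` place `avoid` in layer 0.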
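Structure Learning: First Order Representation

$$
V_T = E\left[ \frac{1}{N} \sum_{t=1}^{N} r_t \right]
= \sum_{i=1}^{m} E\left[ \frac{1}{N} \sum_{t=1}^{N} r_t \,\middle|\, B_{index(i)} \text{ is controlling} \right] P(B_{index(i)} \text{ is controlling})
$$

$$
\begin{aligned}
E\left[ \frac{1}{N} \sum_{t=1}^{N} r_t \,\middle|\, B_k \text{ is controlling} \right]
&= E\left[ \frac{1}{N} \sum_{t=1}^{N} r_t \,\middle|\, B_k \text{ is controlling and nobody else is active} \right] \\
&\quad + \sum_{j \in \{ j \mid B_k >_T B_j \}} E\left[ \frac{1}{N} \sum_{t=1}^{N} r_t \,\middle|\, B_k \text{ is controlling and } B_j \text{ is the next active behavior} \right] \\
&= V_0(k) + \sum_{j \in \{ j \mid B_k >_T B_j \}} V(k > j)
\end{aligned}
$$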
Structure Learning: First Order Representation

$$
V_T = \sum_{i=1}^{m} \left[ V_0(index(i)) + \sum_{j=1}^{i-1} V(index(i) > index(j)) \right] P(B_{index(i)} \text{ is controlling})
$$
Structure Learning: First Order Representation - Credit Assignment
If only one behavior becomes active, we should update V0(i). If two or more behaviors become active, we must update V(i>j), where ‘i’ is the index of the controlling behavior and ‘j’ is the index of the next active behavior.
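The rule above says which entry to update but not how. The following Python sketch assumes an exponential moving average with a constant learning rate `alpha`; the function name `fo_update` and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Tuple


def fo_update(v0: Dict[str, float], v_pair: Dict[Tuple[str, str], float],
              active: List[str], reward: float, alpha: float = 0.1) -> None:
    """First-order credit assignment.

    `active` lists the active behaviors from the controlling one downward.
    One active behavior  -> update V0(i).
    Two or more active   -> update V(i > j), j being the next active behavior.
    """
    if not active:
        return
    controlling = active[0]
    if len(active) == 1:
        old = v0.get(controlling, 0.0)
        v0[controlling] = (1.0 - alpha) * old + alpha * reward
    else:
        key = (controlling, active[1])
        old = v_pair.get(key, 0.0)
        v_pair[key] = (1.0 - alpha) * old + alpha * reward
```

Keeping `v0` and `v_pair` separate mirrors the two kinds of first-order components, so a reward earned while another behavior was suppressed never leaks into the "alone" value.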
A Break!
Introduction to Experiments
• Abstract problem
• Multi-robot object lifting problem (I will only discuss this problem now.)
  • A group of robots lifts a bulky object.
Experiments: Structure Learning
[Figure: reward versus episode (episodes 0 to 50) for ZO, FO, the hand-designed structure, and a random structure.]
Comparison of the average gained reward of two different structure learning methods (Zero Order (ZO) and First Order (FO)), hand-designed structure, and random structure for the object lifting problem.
Behavior Learning
• No more behavior repertoire assumption
• All we know:
  • Sensor/actuator dimensions
  • Reinforcement signal
Behavior Learning: Challenging Issues
How should behaviors cooperate with each other to maximize the performance of the agent?
How should we assign credit to behaviors of the architecture?
How should each behavior update its knowledge?
Behavior Learning
1. B2, B3, and B4 become excited
2. B4 takes control
3. Punishment!!!
?!
Behavior Learning
Augmenting the action space with a pseudo-action named NoAction (NA)
NA does nothing and lets lower behaviors take control
1. B2, B3, and B4 become excited
2. B4 proposes NA
3. B3 proposes an action and takes control
4. Reward!
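The NoAction mechanism in the steps above can be sketched in Python as an arbitration loop in which an excited behavior may pass control downward. The tuple layout, the `NO_ACTION` constant, and the function name are hypothetical, and the structure is assumed to be listed from the highest layer to the lowest.

```python
from typing import Callable, List, Optional, Tuple

NO_ACTION = "NA"  # pseudo-action: do nothing and let lower behaviors decide

# (name, excited(state) -> bool, policy(state) -> action); hypothetical layout
Layer = Tuple[str, Callable[[dict], bool], Callable[[dict], str]]


def arbitrate_with_na(structure: List[Layer],
                      state: dict) -> Tuple[Optional[str], str, List[str]]:
    """Return (controlling behavior, its action, excited upper behaviors that chose NA)."""
    deferred: List[str] = []
    for name, excited, policy in structure:
        if not excited(state):
            continue
        action = policy(state)
        if action == NO_ACTION:
            deferred.append(name)  # passes control to the layers below
            continue
        return name, action, deferred
    return None, NO_ACTION, deferred
```

The returned list of deferring behaviors is what a credit assignment step needs later: those behaviors contributed to the outcome by choosing NA.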
Behavior Learning
• NA lets behaviors cooperate.
• How should we force them to cooperate correctly?!
  • Hierarchical Credit Assignment Problem
  • Boolean-like algebra for logically expressible multi-agent systems

$$A_1 \cdot (A_2 + A_3) = A_1 \cdot A_2 + A_1 \cdot A_3$$
Behavior Learning
[Diagram and equation: the structure $T$ split into upper behaviors $B_u$ (with NA), the controlling behavior $B^*$, behaviors that are not excited, and lower behaviors $B_l$, together with the reward $R$; three of the entries are marked “unknown”.]
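Behavior Learning: Optimality

$$
r_i = E_{s \in S^*}\left[ R(s) \,\middle|\, \text{Reward is achieved by the contribution of “} B_i \text{”} \right]
= \int_{s \in S^*} R(s)\, p(B_i \text{ is excited in } s)\, p(s \mid s \in S^*)\, ds
$$

The internal states of different behaviors become excited in different regions.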
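Behavior Learning: Optimality

$$
Q_i(s_i, a_i) = \int_{s \in S} R(s)\, p(B_i \text{ is excited in } s_i,\ B_i \text{ selects } a_i)\, p(s \mid s \in S)\, ds
$$

$$
Q_i(s_i, NA) = \int_{s \in S} R(s)\, p(B_i \text{ is excited in } s_i,\ B_i \text{ selects } NA)\, p(s \mid s \in S)\, ds
$$

$Q_i(s_i, NA)$ versus $Q_i(s_i, a_i), \quad \forall a_i \in A_i$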
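Behavior Learning: Value Updating

For the case of immediate reward:

$$
Q_{i,\,k+1}(s_i, a_i) = \left(1 - \alpha_{k}(s_i, a_i)\right) Q_{i,\,k}(s_i, a_i) + \alpha_{k}(s_i, a_i)\, r(s)
$$

($B_i$ is the controlling behavior in $s_i$ and selects $a_i$)

$$
Q_{j,\,k+1}(s_j, NA) = \left(1 - \alpha_{k}(s_j, NA)\right) Q_{j,\,k}(s_j, NA) + \alpha_{k}(s_j, NA)\, r(s)
$$

($B_i$ is the controlling behavior and $B_j >_T B_i$; the $B_j$s are excited in $s_j$ and select $NA$)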
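A minimal Python sketch of the immediate-reward update, under the reading that the controlling behavior updates the value of the action it selected, while every excited behavior above it that selected NA updates its NA value with the same reward. The table layout, the constant learning rate `alpha`, and all names are illustrative assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Hashable, List, Tuple

NO_ACTION = "NA"
QTable = DefaultDict[str, Dict[Tuple[Hashable, str], float]]  # behavior -> (state, action) -> Q


def blend(old: float, reward: float, alpha: float) -> float:
    """Exponential moving average toward the immediate reward."""
    return (1.0 - alpha) * old + alpha * reward


def update_values(q: QTable,
                  controlling: Tuple[str, Hashable, str],
                  deferred: List[Tuple[str, Hashable]],
                  reward: float, alpha: float = 0.1) -> None:
    """Credit the controlling behavior and the upper behaviors that chose NA."""
    name, state, action = controlling
    q[name][(state, action)] = blend(q[name].get((state, action), 0.0), reward, alpha)
    for upper_name, upper_state in deferred:
        key = (upper_state, NO_ACTION)
        q[upper_name][key] = blend(q[upper_name].get(key, 0.0), reward, alpha)


q: QTable = defaultdict(dict)
update_values(q, ("B3", "s3", "push"), [("B4", "s4")], reward=1.0, alpha=0.5)
```

Lower behaviors that were suppressed receive no update here, since their proposals never reached the actuators.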
Behavior Learning: Value Updating
For the general return case, we should use Monte Carlo estimation.
Bootstrapping methods are not applicable.
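As a hedged illustration of Monte Carlo estimation for the general return case, the Python sketch below waits until an episode ends, computes the return that followed each step, and moves the corresponding value toward it (an every-visit update). The discount factor `gamma`, the constant `alpha`, and the `(key, reward)` episode format are assumptions made for the example.

```python
from typing import Dict, Hashable, List, Tuple


def returns_from(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of a finished episode."""
    g = 0.0
    out: List[float] = []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    out.reverse()
    return out


def monte_carlo_update(q: Dict[Hashable, float],
                       episode: List[Tuple[Hashable, float]],
                       alpha: float = 0.1, gamma: float = 1.0) -> None:
    """Every-visit update: `episode` holds (credited key, reward) per step.

    A key can be any hashable id, e.g. (behavior, state, action).
    """
    returns = returns_from([r for _, r in episode], gamma)
    for (key, _), g in zip(episode, returns):
        q[key] = (1.0 - alpha) * q.get(key, 0.0) + alpha * g
```

No value estimate of a successor state appears in the target, which is what distinguishes this from a bootstrapping update.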
Concurrent Behavior and Structure Learning
Applying:
• Behavior Learning: State-Action Mappings
• Structure Learning: Hierarchy
Experiments: Behavior Learning
[Figure: average gained reward versus episodes (0 to 50) for Str. Learning, Beh. Learning, and Beh./Str. Learning.]
Reward comparison between structure learning, behavior learning, and concurrent behavior/structure learning methods for the object lifting task.
Experiments: Behavior Learning
[Figure: two panels (learning phase and testing phase) plotting probability against average gained reward for Random, Hand-designed, Str. Learning, Beh. Learning, and Beh./Str. Learning.]
Experiments: Behavior Learning
[Figure: two panels (learning phase and testing phase) plotting average gained reward against the percentile of the superior results for Hand-designed, Str. Learning, Beh. Learning, and Beh./Str. Learning.]
Experiments: Behavior Learning
A sample trajectory showing the position of the robot-object contact points, the tilt angle of the object during lifting, and the controlling behavior of each robot at each time step after sufficient structure/behavior learning. The numbers in the lowest diagram correspond to behaviors as follows: 0 (No Behavior), 1 (Push More), 2 (Don’t Go Fast), 3 (Stop), 4 (Hurry up), 5 (Slow down).
[Figure: three panels plotted against time (0 to 1.5 sec): height, tilt angle, and controlling behaviors, for robot 1, robot 2, and robot 3.]
Behavior Co-evolution: Motivations
+
• Learning can get trapped in local maxima of the objective function.
• Learning is sensitive (POMDP, non-Markov, …).
• Evolutionary methods have a better chance of finding the global maximum of the objective function.
• The objective function may not be well-defined in robotics.
-
• Evolutionary robotics methods are usually slow.
  • Fast changes of the environment
• Non-modular controllers
  • Monolithic
  • No reusability
Behavior Co-evolution: Motivations
• Use evolution to search the difficult and large part of the parameter space.
  • The behaviors’ parameter space is usually the larger one.
• Use learning to respond quickly.
  • The structure’s parameter space is usually the smaller one.
  • A change in the structure results in different agent behavior.
• Evolve behaviors separately (modularity and reusability).
Behavior Co-evolution
Agent
Behavior Pool 1
Behavior Pool 2
Behavior Pool n
Evolve each kind of behavior in its own genetic pool
![Page 59: Learning and Evolution in Hierarchical Behavior-based Systems Amir massoud Farahmand Advisor: Majid Nili Ahmadabadi Co-advisors: Caro Lucas – Babak N](https://reader036.vdocument.in/reader036/viewer/2022081514/56649d4d5503460f94a2c70b/html5/thumbnails/59.jpg)
Behavior Co-evolution
Fitness Sharing
Given the fitness of the agent, what is the fitness of each behavior?!
Fitness sharing: Uniform, Value-based
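The slide names the two sharing schemes without defining them, so the following is only one plausible reading. It assumes that uniform sharing splits the agent's fitness equally among its behaviors and that value-based sharing splits it in proportion to a non-negative per-behavior weight. The function names and the `values` argument are hypothetical.

```python
from typing import Dict, List


def uniform_sharing(agent_fitness: float, behaviors: List[str]) -> Dict[str, float]:
    """Split the agent's fitness equally among the behaviors it used."""
    share = agent_fitness / len(behaviors)
    return {name: share for name in behaviors}


def value_based_sharing(agent_fitness: float,
                        values: Dict[str, float]) -> Dict[str, float]:
    """Split the agent's fitness in proportion to an assumed per-behavior weight."""
    total = sum(values.values())
    if total <= 0:
        # No usable weights: fall back to the uniform scheme.
        return uniform_sharing(agent_fitness, list(values))
    return {name: agent_fitness * v / total for name, v in values.items()}


print(uniform_sharing(90.0, ["approach", "grasp", "lift"]))
print(value_based_sharing(100.0, {"approach": 1.0, "lift": 3.0}))
```

In both variants the shares add up to the agent's fitness, which keeps the two schemes on the same scale when they are compared.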
![Page 60: Learning and Evolution in Hierarchical Behavior-based Systems Amir massoud Farahmand Advisor: Majid Nili Ahmadabadi Co-advisors: Caro Lucas – Babak N](https://reader036.vdocument.in/reader036/viewer/2022081514/56649d4d5503460f94a2c70b/html5/thumbnails/60.jpg)
Behavior Co-evolution
Each behavior’s genetic pool
Selection
Genetic Operators
Crossover
Mutation: Hard Replacement, Soft Perturbation
[Equation garbled in extraction; it involves the symbols B_ji and B_ki (with superscripts "old" and "new") and X.]
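The two mutation variants named on this slide, hard replacement and soft perturbation, might look like the sketch below. It assumes a real-valued genome and Gaussian noise; the slide's own formula did not survive extraction, so none of the parameters here (`rate`, `low`, `high`, `sigma`) come from the source.

```python
import random
from typing import List


def hard_replacement(genome: List[float], rate: float, low: float, high: float,
                     rng: random.Random) -> List[float]:
    """With probability `rate`, replace a gene by a fresh uniform sample."""
    return [rng.uniform(low, high) if rng.random() < rate else g for g in genome]


def soft_perturbation(genome: List[float], sigma: float,
                      rng: random.Random) -> List[float]:
    """Add zero-mean Gaussian noise to every gene (a small local change)."""
    return [g + rng.gauss(0.0, sigma) for g in genome]


rng = random.Random(0)
parent = [0.5, 0.5, 0.5, 0.5]
child_hard = hard_replacement(parent, rate=0.25, low=-1.0, high=1.0, rng=rng)
child_soft = soft_perturbation(parent, sigma=0.05, rng=rng)
```

Hard replacement lets the search jump to distant regions, while soft perturbation keeps a child close to its parent.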
![Page 61: Learning and Evolution in Hierarchical Behavior-based Systems Amir massoud Farahmand Advisor: Majid Nili Ahmadabadi Co-advisors: Caro Lucas – Babak N](https://reader036.vdocument.in/reader036/viewer/2022081514/56649d4d5503460f94a2c70b/html5/thumbnails/61.jpg)
Where?!
Behavior-based System Design
Learning: Structure (hierarchy) learning, Behavior learning
Evolution: Co-evolution of behaviors
Hybridization of Evolution and Learning: Memetic Algorithm
![Page 62: Learning and Evolution in Hierarchical Behavior-based Systems Amir massoud Farahmand Advisor: Majid Nili Ahmadabadi Co-advisors: Caro Lucas – Babak N](https://reader036.vdocument.in/reader036/viewer/2022081514/56649d4d5503460f94a2c70b/html5/thumbnails/62.jpg)
Memetic Algorithm
We waste learned knowledge after each agent’s lifetime
A meme is a unit of information that reproduces itself as people exchange ideas
Traditional memetic algorithms:
Evolutionary method: meme exchange
Local search: meme refinement
Also called hybrid evolutionary algorithms
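To make the traditional scheme concrete, here is a minimal memetic search on a toy one-dimensional objective. Everything in it is an illustrative assumption rather than the method used in this work: the objective, the hill-climbing local search, the averaging crossover, and the population settings.

```python
import random


def fitness(x: float) -> float:
    """Toy objective (assumption): a single maximum at x = 3."""
    return -(x - 3.0) ** 2


def local_search(x: float, step: float, iters: int, rng: random.Random) -> float:
    """Meme refinement: accept a random nearby candidate only if it is better."""
    best = x
    for _ in range(iters):
        cand = best + rng.uniform(-step, step)
        if fitness(cand) > fitness(best):
            best = cand
    return best


def memetic_search(pop_size: int = 10, generations: int = 20, seed: int = 0) -> float:
    rng = random.Random(seed)
    pop = [rng.uniform(-10.0, 10.0) for _ in range(pop_size)]
    for _ in range(generations):
        # Local search step: every individual refines itself.
        pop = [local_search(x, 0.5, 5, rng) for x in pop]
        # Evolutionary step: keep the better half, refill by averaging two parents.
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = [(rng.choice(parents) + rng.choice(parents)) / 2.0
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)


best = memetic_search()
```

Because the better half survives each generation and local search never accepts a worse candidate, the best fitness in the population cannot decrease.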
![Page 63: Learning and Evolution in Hierarchical Behavior-based Systems Amir massoud Farahmand Advisor: Majid Nili Ahmadabadi Co-advisors: Caro Lucas – Babak N](https://reader036.vdocument.in/reader036/viewer/2022081514/56649d4d5503460f94a2c70b/html5/thumbnails/63.jpg)
Memetic Algorithm
Two different interpretations of meme:
Current hybridization of behavior co-evolution and structure learning
Similar to traditional MA
Difference from traditional MA: different parameter spaces are being searched
Meme as a cultural bias
![Page 64: Learning and Evolution in Hierarchical Behavior-based Systems Amir massoud Farahmand Advisor: Majid Nili Ahmadabadi Co-advisors: Caro Lucas – Babak N](https://reader036.vdocument.in/reader036/viewer/2022081514/56649d4d5503460f94a2c70b/html5/thumbnails/64.jpg)
Memetic Algorithm
Experienced individuals store their experiences in the culture in the form of memes.
Newborn individuals get a new meme from the culture.
Structure as a meme
![Page 65: Learning and Evolution in Hierarchical Behavior-based Systems Amir massoud Farahmand Advisor: Majid Nili Ahmadabadi Co-advisors: Caro Lucas – Babak N](https://reader036.vdocument.in/reader036/viewer/2022081514/56649d4d5503460f94a2c70b/html5/thumbnails/65.jpg)
Memetic Algorithm
Agent
Behavior Pool 1
Behavior Pool 2
Behavior Pool n
Meme Pool (Culture)
![Page 66: Learning and Evolution in Hierarchical Behavior-based Systems Amir massoud Farahmand Advisor: Majid Nili Ahmadabadi Co-advisors: Caro Lucas – Babak N](https://reader036.vdocument.in/reader036/viewer/2022081514/56649d4d5503460f94a2c70b/html5/thumbnails/66.jpg)
Memetic Algorithm
Each meme has its own value
Value of the meme is updated using the fitness of the agent
Valuable memes have more chance to be selected for newborn individuals
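[Equations garbled in extraction. Recoverable symbols: the meme pool M, a meme T_i with its value f_{T_i}, and an update of f_{T_i} from step n to step n+1 using the fitness of the agent.]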
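The three statements on this slide can be sketched as a small meme pool. The update rule below, an exponential moving average with rate `alpha`, and the shifted value-proportional selection are assumptions chosen for illustration, since the slide's own equations are not recoverable. The class and method names are hypothetical.

```python
import random
from typing import Dict, Hashable, Optional


class MemePool:
    """Culture: stores memes (for example, structures) together with a value."""

    def __init__(self, alpha: float = 0.2, rng: Optional[random.Random] = None):
        self.alpha = alpha  # assumed learning rate of the value update
        self.values: Dict[Hashable, float] = {}
        self.rng = rng or random.Random()

    def update(self, meme: Hashable, agent_fitness: float) -> None:
        """Move the meme's value toward the fitness of an agent that used it."""
        if meme not in self.values:
            self.values[meme] = agent_fitness
        else:
            old = self.values[meme]
            self.values[meme] = (1 - self.alpha) * old + self.alpha * agent_fitness

    def sample(self) -> Hashable:
        """Give a newborn a meme; more valuable memes are more likely."""
        memes = list(self.values)
        lowest = min(self.values.values())
        # Shift so that weights stay positive even when fitness is negative.
        weights = [self.values[m] - lowest + 1e-6 for m in memes]
        return self.rng.choices(memes, weights=weights, k=1)[0]


pool = MemePool(alpha=0.5, rng=random.Random(0))
pool.update("A", 100.0)
pool.update("A", 200.0)   # value of "A" becomes 0.5 * 100 + 0.5 * 200
pool.update("B", -50.0)
newborn_meme = pool.sample()
```

The shift by the lowest value is there because fitness in the object-lifting experiments can be negative, and selection weights must stay positive.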
![Page 67: Learning and Evolution in Hierarchical Behavior-based Systems Amir massoud Farahmand Advisor: Majid Nili Ahmadabadi Co-advisors: Caro Lucas – Babak N](https://reader036.vdocument.in/reader036/viewer/2022081514/56649d4d5503460f94a2c70b/html5/thumbnails/67.jpg)
Experiments
Behavior Co-evolution – Structure Learning – Memetic Algorithm
(Object Lifting) Comparison of the fitness averaged over the last five episodes for different design methods: 1) evolution of behaviors (uniform fitness sharing) with structure learning (blue), 2) evolution of behaviors (value-based fitness sharing) with structure learning (black), 3) hand-designed behaviors with structure learning (green), and 4) hand-designed behaviors and structure (red). The dotted lines across the hand-designed cases (3 and 4) show the one-standard-deviation region around the mean performance.
[Figure: Fitness (about -150 to 350) versus Generations (0 to 50). Legend: Structure Learning - Value-based Fitness Sharing; Structure Learning - Uniform Fitness Sharing; Hand-designed Behaviors and Structure; Hand-designed Behavior/Learning Structure.]
![Page 68: Learning and Evolution in Hierarchical Behavior-based Systems Amir massoud Farahmand Advisor: Majid Nili Ahmadabadi Co-advisors: Caro Lucas – Babak N](https://reader036.vdocument.in/reader036/viewer/2022081514/56649d4d5503460f94a2c70b/html5/thumbnails/68.jpg)
Experiments
Behavior Co-evolution – Structure Learning – Memetic Algorithm
(Object Lifting) Comparison of the fitness averaged over the last five episodes and the lifetime fitness for the uniform fitness sharing co-evolutionary mechanism: 1) evolution of behaviors with structure learning (blue), 2) evolution of behaviors with structure learning that benefits from the meme pool bias (black), 3) evolution of behaviors with hand-designed structure (magenta), 4) hand-designed behaviors with structure learning (green), and 5) hand-designed behaviors and structure (red). Solid lines show the fitness over the last five episodes of the agent’s lifetime, and dotted lines show the agent’s lifetime fitness. Although the final performance of all cases is roughly the same, the lifetime fitness of the memetic-based design is much higher.
[Figure: Fitness and Lifetime Fitness (about -200 to 300) versus Generations (0 to 50). Legend: Structure Learning - No Meme Pool; Structure Learning - with Meme Pool; Hand-designed Structure/Behavior Evolution; Hand-designed Behaviors/Structure Learning.]
![Page 69: Learning and Evolution in Hierarchical Behavior-based Systems Amir massoud Farahmand Advisor: Majid Nili Ahmadabadi Co-advisors: Caro Lucas – Babak N](https://reader036.vdocument.in/reader036/viewer/2022081514/56649d4d5503460f94a2c70b/html5/thumbnails/69.jpg)
Experiments
Behavior Co-evolution – Structure Learning – Memetic Algorithm
(Object Lifting) Comparison of fitness probability distributions for uniform fitness sharing. The comparison is between agents that use the meme pool as the initial bias for their structure learning (black), agents that learn the structure from a random initial setting (blue), and agents with a hand-designed structure (magenta). Dotted lines show the distribution of lifetime fitness. A distribution shifted further to the right indicates a higher chance of generating very good agents.
[Figure: four panels (Generation 1, Generation 5, Generation 20, Generation 50), each plotting Probability (0 to 1) against Fitness. Curve labels: Meme (M), No Meme (N), Fixed Str. (F).]
![Page 70: Learning and Evolution in Hierarchical Behavior-based Systems Amir massoud Farahmand Advisor: Majid Nili Ahmadabadi Co-advisors: Caro Lucas – Babak N](https://reader036.vdocument.in/reader036/viewer/2022081514/56649d4d5503460f94a2c70b/html5/thumbnails/70.jpg)
Experiments
Behavior Co-evolution – Structure Learning – Memetic Algorithm
(Object Lifting) Comparison of the fitness averaged over the last five episodes and the lifetime fitness for the value-based fitness sharing co-evolutionary mechanism: 1) evolution of behaviors with structure learning (blue), 2) evolution of behaviors with structure learning that benefits from the meme pool bias (black), 3) evolution of behaviors with hand-designed structure (magenta), 4) hand-designed behaviors with structure learning (green), and 5) hand-designed behaviors and structure (red). Solid lines show the fitness over the last five episodes of the agent’s lifetime, and dotted lines show the agent’s lifetime fitness. Although the final performance of all cases is roughly the same, the lifetime fitness of the memetic-based design is higher.
[Figure: Fitness and Lifetime Fitness (about -200 to 300) versus Generations (0 to 50). Legend: Structure Learning - with Meme Pool; Structure Learning - No Meme Pool; Hand-designed Behaviors/Structure Learning; Hand-designed Structure/Behavior Evolution.]
![Page 71: Learning and Evolution in Hierarchical Behavior-based Systems Amir massoud Farahmand Advisor: Majid Nili Ahmadabadi Co-advisors: Caro Lucas – Babak N](https://reader036.vdocument.in/reader036/viewer/2022081514/56649d4d5503460f94a2c70b/html5/thumbnails/71.jpg)
Experiments
Behavior Co-evolution – Structure Learning – Memetic Algorithm
Figure 13. (Object Lifting) Comparison of fitness probability distributions for value-based fitness sharing. The comparison is between agents that use the meme pool as the initial bias for their structure learning (black), agents that learn the structure from a random initial setting (blue), and agents with a hand-designed structure (magenta). Dotted lines show the distribution of lifetime fitness. A distribution shifted further to the right indicates a higher chance of generating very good agents.
[Figure: four panels (Generation 1, Generation 5, Generation 20, Generation 50), each plotting Probability (0 to 1) against Fitness. Curve labels: Meme (M), No Meme (N), Fixed Str. (F).]
![Page 72: Learning and Evolution in Hierarchical Behavior-based Systems Amir massoud Farahmand Advisor: Majid Nili Ahmadabadi Co-advisors: Caro Lucas – Babak N](https://reader036.vdocument.in/reader036/viewer/2022081514/56649d4d5503460f94a2c70b/html5/thumbnails/72.jpg)
Other Topics
Probabilistic Analysis of PPSSA
Change in the excitation probability
Change in the controlling probability of each layer
Some estimate of learning time
The effect of reinforcement signal uncertainty on:
Value function
Policy of the agent
![Page 73: Learning and Evolution in Hierarchical Behavior-based Systems Amir massoud Farahmand Advisor: Majid Nili Ahmadabadi Co-advisors: Caro Lucas – Babak N](https://reader036.vdocument.in/reader036/viewer/2022081514/56649d4d5503460f94a2c70b/html5/thumbnails/73.jpg)
Conclusions
Behavior-based System Design
Learning: Structure (hierarchy) learning, Behavior learning
Evolution: Co-evolution of behaviors
Hybridization of Evolution and Learning: Memetic Algorithm
![Page 74: Learning and Evolution in Hierarchical Behavior-based Systems Amir massoud Farahmand Advisor: Majid Nili Ahmadabadi Co-advisors: Caro Lucas – Babak N](https://reader036.vdocument.in/reader036/viewer/2022081514/56649d4d5503460f94a2c70b/html5/thumbnails/74.jpg)
Contributions
Deep and mathematical investigation of behavior-based systems
Tackling the design process from different approaches: learning, evolution, and culture-based methods
Structure learning is quite new in hierarchical reinforcement learning
![Page 75: Learning and Evolution in Hierarchical Behavior-based Systems Amir massoud Farahmand Advisor: Majid Nili Ahmadabadi Co-advisors: Caro Lucas – Babak N](https://reader036.vdocument.in/reader036/viewer/2022081514/56649d4d5503460f94a2c70b/html5/thumbnails/75.jpg)
Suggestions for Future Work
Extending the proposed methods to more complex architectures
Automatic extraction of the behaviors’ state spaces (traditional clustering methods are not suitable)
Convergence proof in learning
Automatic abstraction of knowledge
Simultaneous low-level and high-level decision making
Investigations on the reinforcement signal design
![Page 76: Learning and Evolution in Hierarchical Behavior-based Systems Amir massoud Farahmand Advisor: Majid Nili Ahmadabadi Co-advisors: Caro Lucas – Babak N](https://reader036.vdocument.in/reader036/viewer/2022081514/56649d4d5503460f94a2c70b/html5/thumbnails/76.jpg)
Thanks!