
The University of York Department of Computer Science

Submitted in part fulfilment for the degree of MEng.

A Hybrid Reactive and Plan-based Agent Architecture for Robotic Soccer

Iain A. Wallace

May 2006

Supervisor: Daniel Kudenko

Number of words = 19,259, as counted by wc -w. This includes the body of the report, but not Appendix A.

Abstract

This project describes the design, implementation and evaluation of a novel hybrid control architecture for teams of co-operative agents. The architecture combines a team-level automated planner with a mapping between plan actions and sets of reactive control laws for individual agents. It is applied to the domain of robotic soccer, and the implementation is simulator-based. A series of experiments evaluates the new architecture against a baseline of purely reactive techniques, and shows that it offers increased performance through inter-agent co-ordination.

Contents

1 Introduction

2 Literature Review
2.1 Review of Reactive Control
2.1.1 Overview of Reactive Control
2.1.2 Techniques for Implementing Reactive Agents
2.1.3 Reactive Control in RoboCup
2.1.4 Advantages and Disadvantages of Reactive Control
2.2 Review of Finite State Machines
2.2.1 Overview of Finite State Machines
2.2.2 FSMs in RoboCup
2.2.3 Advantages and Disadvantages
2.2.4 FSMs Within a Wider System
2.3 Review of Planning for Autonomous Agents
2.3.1 Overview of Planning
2.3.2 Planning in RoboCup
2.3.3 Advantages and Disadvantages of Planning
2.4 Review of Layered Architectures for Autonomous Agents
2.4.1 Overview of Layered Architectures
2.4.2 Combination of Deliberation and Reaction in Layered Architectures
2.4.3 Applying A Layered Architecture

3 Hybrid Architecture Design
3.1 Overview
3.2 Behavioural Layer Design
3.2.1 Description of Behaviours
3.3 High-Level Co-ordination
3.3.1 Finite State Machine based Co-ordination
3.3.2 The Planner

4 Implementation
4.1 General Implementation Considerations
4.2 Implementation of Behaviours
4.2.1 Problems Encountered
4.3 Implementation of FSMs
4.3.1 Problems Encountered
4.4 Planner Implementation
4.4.1 Selection of a Planner
4.4.2 Implementation of Plan Operators
4.4.3 Implementation of Plan to Behaviour Mapping


5 Experimental Evaluation
5.1 Experiment Design
5.2 Evaluating the Reactive Player
5.2.1 Varying Numbers of Reactive Players
5.3 Evaluating the FSMs
5.3.1 Testing the Pass-and-Shoot FSM
5.3.2 Testing the Corner-Kick FSM
5.4 Evaluating the Low Abstraction Planner
5.5 Evaluating the High Abstraction Planner
5.6 Summary and Comparisons

6 Project Evaluation
6.1 Time Management and Planning
6.2 Software Development
6.3 Experimental Evaluation
6.4 Novel Aspects of the Project

7 Project Conclusions and Further Work
7.1 Conclusions
7.2 Further Work
7.2.1 Continued Investigation of the Existing Solution
7.2.2 Extensions to the Architecture

A Plan Operators
A.1 Low Abstraction Plan Operators
A.2 High Abstraction Plan Operators


List of Figures

2.1 Setup of a line following robot
2.2 The three layered steering architecture. (diagram from [30])
2.3 A simple FSM for a football-playing agent
2.4 FSM for a dribble behaviour. (diagram from [36])
2.5 FSMs for UWHuskies team. (diagram from [13])
2.6 A Central Multi-agent planner (diagram from [24])
2.7 Layers in the 3T Architecture (diagram from [10])
2.8 Layers in the CLARAty Architecture (diagram from [38])
2.9 Layers in the TouringMachines Architecture (diagram from [17])

3.1 Overview of the Control Architecture
3.2 Wander Don’t Collide
3.3 Wait For Ball
3.4 Defend
3.5 Go to X
3.6 Get Behind Ball
3.7 Get Ball
3.8 Kick to X
3.9 Kick to free Robot
3.10 Make Space
3.11 Reactive RoboCup Player
3.12 Pass and shoot FSM
3.13 Corner Kick FSM
3.14 An example generated FSM for plan execution

4.1 A screenshot of the simulator in action.
4.2 UML Illustration of behaviour class hierarchy (Only a few behaviours are shown, for clarity)

5.1 Reactive players against an empty pitch
5.2 Reactive players against wandering opponents
5.3 Reactive players against reactive opponents
5.4 Pass-and-Shoot FSM against static opponents
5.5 Pass-and-Shoot FSM against wandering and reactive opponents
5.6 Goals conceded by Pass-and-Shoot FSM against reactive opponents
5.7 Mean time-to-score for Corner-Kick FSM
5.8 Mean time-to-score for Low Abstraction Planner
5.9 Mean time-to-score for High Abstraction Planner, Static Opponents
5.10 Mean time-to-score for High Abstraction Planner, Wandering Opponents
5.11 Mean time-to-score and Goals Conceded for High Abstraction Planner, Reactive Opponents


5.12 Comparing Goals Conceded by Different Planners


1 Introduction

Autonomous agents, and agent-based computing in general, are increasing in popularity due to the many real-world situations in which the behaviour of many individual "agents" (autonomous AI or otherwise) must be reasoned about and controlled. Examples of multi-agent systems (MAS) include online trading (e.g. eBay), co-operating flocks of unmanned aerial vehicles (UAVs), teams of football-playing robots, and grid computing resources negotiating to provide a service.

The co-operative control of these systems poses many challenges, such as co-ordination between the agents, reactiveness to a changing environment, the computational complexity of the solution, the difficulty of representing the goals of agents, and the difficulty of ensuring useful, goal-directed behaviour for the agent system as a whole.

One domain used to conduct research into multi-agent control is the RoboCup project, which has the lofty goal of:

"By 2050, develop a team of fully autonomous humanoid robots that can win against the human world champion team in soccer." [33]

This presents several research problems, one of which is to develop a control architecture for a team of agents. Football provides a relatively simple domain for testing multi-agent control architectures: there is a fixed, low number of agents in play, the environment is a flat open pitch, and the objective is simple (to score).

Behavioural (reactive) control is a promising control method for agents which are required to be responsive in an unpredictable environment. In this approach, an agent's actions are defined by a set of rules which consist of percept-to-action mappings. Simple behavioural rules can be combined into a hierarchical structure, which can lead to more sophisticated systems capable of higher performance.

At the opposite end of the spectrum to reactive control is classical deliberative planning. This involves evaluating the current world state and using search techniques to apply operators to that state, transforming the world into a desired goal state. The agent then carries out the resulting sequence of actions: the plan. This differs from the reactive approach in that the agent carries out actions with some regard to the future state of the world, rather than just the current state.

Both these approaches have their advantages and disadvantages for the control of agents in a changing environment. This project investigates a novel multi-agent control architecture which aims to combine the two approaches to gain the benefits of both. This is achieved with a team-level planner and a mapping from plan actions to simple sets of behaviours for each robot on the team, leading to a co-ordinated team that is capable of actions such as passes, yet low in execution complexity and resistant to an unpredictable opponent.

Firstly, a review of current approaches in reactive control, multi-agent planning and layered architectures will set the scene for the description of the architecture used, and guide the techniques used for its implementation. The varying approaches to the architecture implementation will be described, and evaluated by testing experimental hypotheses. Finally, these results will be used to evaluate the architecture, identify its strengths and weaknesses, and propose further work.


2 Literature Review

2.1 Review of Reactive Control

A key part of this project is the reactive control mechanisms used for governing the motion of the RoboCup agents. This section defines and introduces reactive control, and covers some of the relevant work in the field with a view to how it applies to this project.

2.1.1 Overview of Reactive Control

By reactive control, I mean to describe the types of agent whose actions are based solely on current observations: the simple reflex agent as described by Russell and Norvig [31]. These are typically the simplest type of agent, as at their most basic they can be represented as a set of rules, each consisting of a mapping between a percept and an action.

An example of this type of agent would be a simple line-following robot, with two sensors positioned either side of a white line and a motor controlling the wheels on each side, as shown in figure 2.1:

(Diagram: a left sensor and a right sensor straddling the line, with a left motor and a right motor driving the wheels.)

Figure 2.1: Setup of a line following robot

A simple set of reactive rules, mapping the percepts (in this case the sensors) to the actions (the motors), could be as follows:

$$\mathit{LeftSensor} \rightarrow \mathit{RightMotor} \tag{2.1}$$

$$\mathit{RightSensor} \rightarrow \mathit{LeftMotor} \tag{2.2}$$

These rules mean that as the robot wanders off the line, it steers to correct. This is only an illustration of the simplest type of reactive agent, which is unsuitable for many domains, as the action choice is made only on the basis of what is currently observable. If the agent is operating in a partially observable world, then its behaviour will not be very intelligent. In the case of multi-agent systems, the world state may be only partially observable both in terms of the environment and in terms of the goals and intentions of other agents.
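As an illustration only, rules (2.1) and (2.2) can be written as a stateless function from percepts to motor speeds. Two assumptions are mine rather than part of the robot described above: each sensor reads True when it is over the white line, and both motors share a constant base speed (the hypothetical `BASE_SPEED` and `BOOST` values) so that the robot keeps moving forward when neither rule fires.

```python
# Sketch of rules (2.1) and (2.2). Assumptions (not from the original robot):
# each sensor reads True when it is over the white line, and both motors share
# a constant base speed so the robot keeps moving when neither rule fires.
BASE_SPEED = 0.5  # hypothetical forward speed
BOOST = 0.5       # hypothetical extra speed added when a rule fires


def line_follower(left_sensor: bool, right_sensor: bool) -> tuple[float, float]:
    """Map the current percepts straight to motor speeds; no internal state.

    Returns (left_motor_speed, right_motor_speed).
    """
    # Rule (2.1): LeftSensor -> RightMotor. Speeding up the right wheel turns
    # the robot left, back towards a line that has drifted under the left sensor.
    right_motor = BASE_SPEED + (BOOST if left_sensor else 0.0)
    # Rule (2.2): RightSensor -> LeftMotor, the mirror image.
    left_motor = BASE_SPEED + (BOOST if right_sensor else 0.0)
    return left_motor, right_motor


if __name__ == "__main__":
    print(line_follower(False, False))  # centred on the line: drive straight
    print(line_follower(True, False))   # line under the left sensor: veer left
```

Because the function has no memory, it will make the same choice every time it sees the same readings, which is exactly the limitation discussed above for partially observable worlds.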

2.1.2 Techniques for Implementing Reactive Agents

Subsumption Architectures

A key premise of reactive control systems is that they use many simple behaviours, layered to produce more complex behaviour. An early proponent of reactive controllers for robotic systems, Rodney Brooks, notes that "Complex (and useful) behaviour need not necessarily be a product of an extremely complex control system." [12] He proposes a layered subsumption architecture in which reactive behaviour such as obstacle avoidance is implemented at a low level, and more advanced actions such as goal-seeking are implemented on top of it, using the lower levels by modifying their inputs and outputs. This allows for incremental development of a complex system from simple building blocks, which gives many advantages (Brooks [12]):

• Multiple Goals A robot will often have many concurrent goals, such as collision avoidance andnavigating to a certain position. Layers of behaviours can be combined to account for more thanone concurrent goal. For example, Amor et al. [5] describe a system whereby a utility functionis used as an arbiter between the outputs of several behaviours directed towards different goals.

• Multiple Sensors The system must support multiple sensors, and make decisions in the face of uncertain sensor readings. In the architecture described by Brooks, there is no need for any form of sensor fusion; each behaviour is simply connected to the sensors that are relevant to it.

• Robustness A mobile robot system must be robust to sensors failing and to the environment changing. The subsumption architecture achieves this in two ways. The many behaviours allow for a degree of redundancy, and in the event of higher levels of reasoning failing, control can fall back to more basic low level behaviours. Thus the robot may lose some competence, but will not completely fail.

• Additivity As more sensors are added to a robot, the architecture must cope with this increase.This is dealt with by additional layers and behaviours.
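The fall-back behaviour described in the list above can be sketched as follows. This is a deliberately simplified, priority-based reading of subsumption: Brooks wires suppression and inhibition into individual signals between modules, whereas here a whole layer either returns a command or returns `None` to defer downwards. The percept keys and command strings are hypothetical.

```python
from typing import Callable, Optional

# Percepts are a plain dict and commands are strings; both are hypothetical.
Percept = dict
Layer = Callable[[Percept], Optional[str]]


def avoid_obstacles(p: Percept) -> Optional[str]:
    """Level 0: always returns a command, so the robot is never left idle."""
    return "turn_away" if p.get("obstacle_ahead") else "forward"


def seek_goal(p: Percept) -> Optional[str]:
    """Level 1: active only when a goal is visible and the way is clear."""
    if p.get("goal_visible") and not p.get("obstacle_ahead"):
        return "head_to_goal"
    return None  # inactive, so control falls back to level 0


def subsume(layers: list[Layer], p: Percept) -> str:
    """Return the command of the highest active layer.

    `layers` runs from lowest (index 0) to highest competence. An active higher
    layer suppresses everything below it; an inactive one defers downwards.
    """
    for layer in reversed(layers):
        command = layer(p)
        if command is not None:
            return command
    raise RuntimeError("the lowest layer should always produce a command")


LAYERS = [avoid_obstacles, seek_goal]
```

Adding a new competence means appending a layer to `LAYERS` without touching the ones below, which mirrors the incremental development and graceful degradation listed above.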

Steering Behaviours

Steering behaviours, as proposed by Reynolds [30], are a development of a reactive control system with several similarities to Brooks' subsumption architecture [12]. Amor et al. [5] summarise steering as "the reactive, non-deliberative movement of physical agents". Steering behaviours are functions that return a desired steering vector based only on immediate local observations. Complex behaviour is produced through the combination of several simple ones.
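To make the idea of "returning a desired steering vector" concrete, here is a minimal sketch of Reynolds-style seek and flee, plus a weighted-sum combiner. The weighted sum is only one common way of blending behaviours and is my assumption here; the helper names are hypothetical and the vectors are plain 2D tuples.

```python
import math

Vec = tuple[float, float]


def _add(a: Vec, b: Vec) -> Vec:
    return (a[0] + b[0], a[1] + b[1])


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1])


def _scale(v: Vec, k: float) -> Vec:
    return (v[0] * k, v[1] * k)


def _unit(v: Vec) -> Vec:
    n = math.hypot(v[0], v[1])
    return (0.0, 0.0) if n == 0 else (v[0] / n, v[1] / n)


def seek(pos: Vec, vel: Vec, target: Vec, max_speed: float) -> Vec:
    """Steering vector that turns the current velocity towards the target."""
    desired = _scale(_unit(_sub(target, pos)), max_speed)
    return _sub(desired, vel)


def flee(pos: Vec, vel: Vec, threat: Vec, max_speed: float) -> Vec:
    """Seek with the desired direction reversed: steer away from the threat."""
    desired = _scale(_unit(_sub(pos, threat)), max_speed)
    return _sub(desired, vel)


def combine(weighted: list[tuple[float, Vec]]) -> Vec:
    """Blend behaviours by a weighted sum of their steering vectors."""
    total: Vec = (0.0, 0.0)
    for weight, vector in weighted:
        total = _add(total, _scale(vector, weight))
    return total
```

Note that the output is a vector, not a motor command; a separate locomotion layer would still have to turn it into actuator settings. With equal weights, an agent seeking a target that lies directly behind something it is fleeing gets a zero vector, a small instance of the conflict between independent behaviours that weighted schemes have to be tuned around.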

Unlike the low level behaviours proposed by Brooks [12], steering behaviours do not actually produce actions or actuator commands as output, merely a desired vector. The model for control used by Reynolds [30] is shown below in figure 2.2:



Figure 2.2: The three layered steering architecture. (diagram from [30])

The steering layer is only concerned with the direction of movement of the agent; the actual locomotion is performed (through actuator controls) by a lower level, and action selection occurs at a higher level, either using planning or other means. This is not directly comparable to the subsumption architecture, which is a general technique, as steering is specific to robot motion. However, for the purposes of this project, where agents are playing football, the problem of which direction to steer the robot is the most important one, as the robots have few other actions available to them.

An example of how simple steering behaviours can be combined to form more complex emergent behaviours is the football-playing demo provided by Michael Holm in the OpenSteer library [2]. A combination of the simple rules below has the emergent property of agents "playing" football.

1. Kick Ball if beside it.

2. Avoid Collisions.

3. If behind the ball, move towards it.

4. Move to "home" position.

Of course the above rules exhibit no co-operation, but they do illustrate how a behavioural systemmight play football using simple rules - in this case with a priority ordering.
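The priority ordering can be sketched as an ordered rule list in which the first rule whose condition holds decides the action. This is my own illustration rather than the OpenSteer demo's code: the percept keys are hypothetical, and treating "move to home position" as an unconditional default is an assumption.

```python
from typing import Callable

World = dict  # hypothetical percept dictionary
Rule = tuple[str, Callable[[World], bool]]

# Rules in the priority order listed above; missing percepts count as False.
RULES: list[Rule] = [
    ("kick_ball", lambda w: w.get("beside_ball", False)),
    ("avoid_collision", lambda w: w.get("collision_imminent", False)),
    ("move_to_ball", lambda w: w.get("behind_ball", False)),
    ("move_home", lambda w: True),  # assumed default when nothing else fires
]


def choose_action(world: World, rules: list[Rule] = RULES) -> str:
    """Return the action of the highest-priority rule whose condition holds."""
    for action, condition in rules:
        if condition(world):
            return action
    raise ValueError("the rule list should end with an unconditional default")
```

Since each agent evaluates this list using only its own percepts, two agents standing behind the ball will both run at it, which is the lack of co-operation noted above.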

While the advantages of the steering approach are plain to see (they are mostly the same as those of a subsumption architecture), the method is not without its limitations. As steering behaviours operate independently they can conflict, and several combined behaviours may end up cancelling each other out, or produce a sub-optimal solution. There is also the problem of oscillation between two dominant behaviours: for example, a go-to behaviour may repeatedly try to fit through a narrow gap and then back off as obstacle avoidance takes over, leaving the agent stuck in this loop indefinitely. These problems are all symptoms of the behaviours lacking context: they are unaware of the situation in which they exist, so as well as interfering with each other, they may also affect higher level actions. This disadvantage is not present in a subsumption architecture, as higher levels can arbitrate and exhibit more rational behaviour. The outcome is that while steering is a useful approach (it is a simpler system to design than a full subsumption architecture), it is likely to be ineffectual without higher level deliberative reasoning to control or arbitrate between behaviours.


2.1.3 Reactive Control in RoboCup

Reactive control is widely used in RoboCup, as the environment is effectively static except for the positions of the robots and the ball. These change very rapidly and unpredictably, so attempting to reason about future states is of limited utility. Also, the pitch is limited in size, and most classes of robot in RoboCup operate in a fully observable environment.

A common notion in reactive RoboCup teams is that of roles. Roles are sets of behaviours assigned to particular agents, such as "attacker", "defender" or "goalkeeper". These designations are not limited to reactive agents, but in many cases they refer to sets of reactive rules.

For example, Behnke and Rojas [7] describe two roles, "field player" and "goalie", and a simple "taxi" behaviour to move an agent across the pitch. Both roles use the same base function, "taxi", but with different parameters and different conditions for activation based upon the agent's percepts.
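One way to picture roles that share a base function is to treat each role as a parameter set passed to the same behaviour. The sketch below is hypothetical throughout: the `Role` fields (a speed and an x-range the role may occupy), the straight-line `taxi` function and the example values are my own stand-ins, not Behnke and Rojas' parameters or activation conditions.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass(frozen=True)
class Role:
    """A role as a parameter set for one shared behaviour (hypothetical fields)."""
    name: str
    speed: float                   # maximum step per control cycle
    x_range: tuple[float, float]   # part of the pitch the role may occupy


def taxi(pos: Point, target: Point, speed: float) -> Point:
    """Move at most `speed` towards `target` in a straight line."""
    dx, dy = target[0] - pos[0], target[1] - pos[1]
    dist = math.hypot(dx, dy)
    if dist <= speed:
        return target
    return (pos[0] + dx / dist * speed, pos[1] + dy / dist * speed)


def act(role: Role, pos: Point, ball: Point) -> Point:
    """Same taxi function for every role; only the parameters differ."""
    lo, hi = role.x_range
    target = (min(max(ball[0], lo), hi), ball[1])  # follow the ball within range
    return taxi(pos, target, role.speed)


GOALIE = Role("goalie", speed=0.5, x_range=(0.0, 1.0))
FIELD_PLAYER = Role("field player", speed=1.0, x_range=(0.0, 10.0))
```

With the ball far up the pitch, the goalie only moves to the edge of its range while the field player heads for the ball, even though both call the same `taxi`.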

Achieving co-ordination between reactive behaviours so that agents do not interfere with each other is a difficult task, let alone co-ordinating them to produce passes. The problem is that the world state is partially observable: the other agents' behaviour is unknown. To counter this problem, a reactive agent must be what Russell and Norvig [31] describe as a model-based reflex agent, which chooses behaviours based not only on current percepts but also on some internal state, which is itself updated by current percepts.
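A minimal sketch of the model-based reflex structure follows: the rules read an internal model as well as the current percept, and the model is updated from each percept before the rules fire. The model chosen here (the last place the ball was seen) and the action names are hypothetical; in a team setting the internal state would more usefully hold beliefs about teammates' roles.

```python
from typing import Optional

Point = tuple[float, float]


class ModelBasedReflexAgent:
    """Reflex rules that read an internal model as well as the current percept.

    The model is just the last place the ball was seen (a hypothetical choice),
    updated from each percept so the agent can still act when the ball is hidden.
    """

    def __init__(self) -> None:
        self.last_ball: Optional[Point] = None

    def update_state(self, percept: dict) -> None:
        if percept.get("ball") is not None:
            self.last_ball = percept["ball"]

    def act(self, percept: dict) -> str:
        self.update_state(percept)
        if percept.get("ball") is not None:
            return "chase_ball"
        if self.last_ball is not None:
            return "go_to_last_seen_ball"
        return "search_for_ball"
```

The decision is still a direct condition-action lookup; the only difference from a simple reflex agent is that one of the conditions tests remembered state rather than the current observation.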

Another common theme in reactive RoboCup agents is that of layering and hierarchies of behaviours, as some behaviours (such as steering) are of a lower level than others (such as obstacle avoidance).

Examples of Reactive Rules and Roles in RoboCup

As mentioned above, reactive control and roles are common concepts in RoboCup. A common role is that of goalkeeper, and Lausen et al. [25] describe a set of reactive behaviours to carry out this role.

They use five basic behaviours to construct this role:

Go2Area Moves the robot to a specified location.

KickBall Kicks the ball.

InterceptBall The robot moves in a straight line in front of the goal, trying to stay between it and the ball. This mode is used when the ball is close to the goal and moving toward it.

SelfLocalize This behaviour is used for the robot to obtain its absolute position on the pitch throughobservation of landmarks.

FollowBall When the ball is a certain distance away from the goal, the robot keeps itself between the ball and the goal, at a certain distance from the goal, effectively constrained to a semi-circular area.
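The geometry behind FollowBall can be sketched as finding the point at a fixed radius from the goal centre on the line towards the ball. The coordinate frame and the radius are assumptions for illustration; Lausen et al.'s actual distances and implementation are not given here.

```python
import math

Point = tuple[float, float]


def follow_ball_position(goal_centre: Point, ball: Point, radius: float) -> Point:
    """Point at `radius` from the goal centre on the straight line to the ball.

    With the goal on the edge of the pitch and the ball always on the pitch,
    these points trace the semi-circle described for FollowBall.
    """
    dx, dy = ball[0] - goal_centre[0], ball[1] - goal_centre[1]
    dist = math.hypot(dx, dy)
    if dist == 0:  # degenerate case: ball on the goal centre
        return goal_centre
    return (goal_centre[0] + dx / dist * radius,
            goal_centre[1] + dy / dist * radius)
```

For example, with the goal centre at the origin, the ball at (3, 4) and a radius of 1, the keeper should stand at roughly (0.6, 0.8), which lies on the segment between goal and ball.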

Uther et al. [34] also describe a "defense" role, which is very similar to the goalkeeper role described above. However, it uses a simpler rule: stay on the point where the line through the ball and the centre of the goal area intersects the goal. This is very similar to the "InterceptBall" behaviour above, and highlights how, in the goalkeeper example, simpler behaviours are combined for greater performance in the same role.

A second "Attack" behaviour is also covered by Uther et al. [34], and it serves to give an exampleof the lower-level control rules that may make up a particular role.

"If you are between the ball and the goal then walk to one side of the ball. If you are at 90° to the goal then use the head kick to kick the ball sideways. If you would kick the ball between the goalposts then kick. Otherwise, dribble the ball towards the goal." [34]

This can be seen to be constructed of lower level behaviours, such as kick and walk, like those used in the defense role, but with different parameters.

In general the roles and low level behaviours are the same across RoboCup teams (after all, they are all achieving the same thing with similar agents); it is the implementation that differs. A particular area of study is the co-ordination between these roles: the above examples do not explicitly consider other team mates.

Co-ordination Between Reactive Agents in RoboCup

A key part of the internal state of many reactive RoboCup agents is the agent's own role and the roles played by other agents on the team, where a role is typically a set of reactive rules to follow. One example of this kind of team is given by Veloso and Stone [35] in their description of the CMUnited RoboCup team. The scheme used has both internal and external behaviours. The external behaviours control the motion of the robot based on reactions to the world state. The internal behaviours alter the external behaviours' parameters, based on observations of the world. A "locker room agreement" [35] contains the "plan" of roles played by each agent, and the triggers to change roles. Thus each agent can make reactive decisions which are implicitly based on the behaviour of other robots, as the internal behaviours adjust the external ones used for control according to the team-wide role assignments.

A disadvantage of this approach is that, since there is no communication between agents, each agent must maintain its own mapping between agents and roles, and update this mapping when agents change role based only on its own observations and the "locker room agreement". This limits the level of co-ordination that can be achieved, and complex behaviours with many state changes are unsuitable, as agents will get out of sync.

Another role-based behavioural control system for robotic soccer is proposed by Behnke and Rojas [7]. Roles are used for the same reason, to avoid conflict between agents, but the approach taken is different. Instead of roles and the triggers to switch them being pre-defined, communication is used to decide amongst the robots who should go for the ball (the closest robot) and who should perform which other roles. If a robot fails to achieve its goal, this is communicated to the other robots so that they may update their state. Although the behavioural control architecture used by Behnke and Rojas [7] is different from that of Veloso and Stone [35] described above, the effect is the same: a role affects the selection of behavioural rules used.

This approach avoids the problem of agents getting out of sync, but only on the assumption that communications are timely and reliable, which introduces a problem of its own.
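The "closest robot goes for the ball" decision can be sketched as a deterministic rule that every robot evaluates on the same communicated data, so all of them reach the same answer. The tie-break on robot id and the "support" role given to everyone else are assumptions for illustration; Behnke and Rojas' actual role set and message format are not reproduced here.

```python
import math

Point = tuple[float, float]


def ball_chaser(positions: dict[int, Point], ball: Point) -> int:
    """Id of the robot closest to the ball; ties go to the lowest id.

    Every robot is assumed to have received all teammates' positions by
    communication, so each one evaluates this identically and agrees.
    """
    return min(positions, key=lambda rid: (math.dist(positions[rid], ball), rid))


def assign_roles(positions: dict[int, Point], ball: Point) -> dict[int, str]:
    """Give the chaser role to one robot and a hypothetical support role to the rest."""
    chaser = ball_chaser(positions, ball)
    return {rid: ("go_for_ball" if rid == chaser else "support")
            for rid in positions}
```

The deterministic tie-break matters: without it, two equidistant robots could each decide the other is the chaser, or both could chase. If messages are delayed, robots may compute this on different data, which is the reliability caveat above.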


A third approach to managing agents' roles is illustrated by another CMU RoboCup team [11]. Here the "roles" are the same as in the two examples above (a collection of low-level reactive rules), but they are assigned to robots as set out in "plays". A play defines a role for each agent, the preconditions under which it can be applied, and a test to gauge when it is complete. Upon completion of one play, another is chosen.

This type of approach requires some form of central co-ordination mechanism to choose the play and assign roles to all agents, which means that the entire world state must be fully observable to this central controller, or else it must use techniques other than reactive decision making. This is acceptable for the RoboCup leagues where a computer system can be in communication with the agents and connected to an overhead camera above the pitch (rather than all computation being carried out on-robot). As such, the problems associated with communication between agents are avoided, as are problems with agents getting out of sync.
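The play mechanism described above (roles per robot, a precondition, a completion test, and a central controller that picks the next play) can be sketched as follows. The `Play` and `PlaySelector` names and the dictionary world state are hypothetical, and choosing the first applicable play in list order is a simplification; the CMU team's own selection policy is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional

State = dict  # hypothetical global world state seen by the central controller


@dataclass
class Play:
    name: str
    roles: list[str]                     # one role per robot, in robot order
    applicable: Callable[[State], bool]  # precondition for choosing the play
    done: Callable[[State], bool]        # test for when the play is complete


class PlaySelector:
    """Central controller: keeps a play until it completes, then picks another."""

    def __init__(self, plays: list[Play]):
        self.plays = plays
        self.current: Optional[Play] = None

    def step(self, state: State, robots: list[str]) -> dict[str, str]:
        if self.current is None or self.current.done(state):
            # Assumes at least one play is always applicable.
            self.current = next(p for p in self.plays if p.applicable(state))
        return dict(zip(robots, self.current.roles))
```

Because only the selector sees the whole state and hands out every role in one step, the robots cannot disagree about who is doing what.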

Reactive Hierarchies in RoboCup

As mentioned above, most reactive control used in RoboCup teams is part of a hierarchy, to gain the advantages that different types of hierarchy can provide (as described in section 2.1.2).

The architecture of the FU-Fighters team ([1], [7], [8]) is based upon an extension of the dual-dynamics architecture proposed by Jaeger and Christaller [23]. In dual dynamics, agents have "modes", which are ranges of parameters for sets of reactive behaviours. Dual dynamics is a formal method for specifying these modes in terms of what they do when active ("target dynamics") and when they are activated ("activation dynamics"). The approach taken by the FU-Fighters team resembles a blend of this dual-dynamics approach and a layered subsumption architecture as described above (section 2.1.2).

In this case the layers are temporal, with a few simple, fast-acting behaviours at the low level, and more behaviours, acting on more sensors and actuators but at a slower rate, in the higher levels. A comparison can be drawn with the subsumption architecture, in that the higher level behaviours can adjust the activation dynamics of the lower level behaviours in much the same way that a subsumption architecture's higher behaviours inhibit the lower levels. The goal of this approach is to allow a large number of behaviours, sensors and actuators at a high level, but with reduced computational cost due to the slower rate at which they operate. It also aims to avoid the main problem of a complex reactive system: the exponential increase in the complexity of the solution as more and more behaviours are added, due to their many possible interactions. The use of the dual-dynamics formalism allows reasoning about the behaviour of the system, and the low rate of interaction at high levels reduces complexity.
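The temporal layering can be illustrated with a two-rate loop: a fast layer runs every tick and picks the most strongly activated behaviour, while a slow layer runs only every `slow_period` ticks and rewrites those activation levels. This is a structural sketch only, not the dual-dynamics formalism itself; the period, the behaviour names and the activation rule are hypothetical.

```python
class TwoRateController:
    """Fast lower layer plus a slower upper layer that retunes its activations.

    The period, behaviour names and activation rule are hypothetical; the sketch
    only shows the structure: the upper layer runs every `slow_period` ticks and
    writes activation levels that the lower layer reads on every tick.
    """

    def __init__(self, slow_period: int = 10) -> None:
        self.slow_period = slow_period
        self.tick = 0
        self.slow_updates = 0
        self.activation = {"approach_ball": 1.0, "block_goal": 0.0}

    def slow_layer(self, percept: dict) -> None:
        # Runs rarely: decide which lower behaviours should dominate.
        attacking = percept["ball_x"] > 0
        self.activation = {"approach_ball": 1.0 if attacking else 0.2,
                           "block_goal": 0.0 if attacking else 1.0}
        self.slow_updates += 1

    def fast_layer(self, percept: dict) -> str:
        # Runs every tick: pick the most strongly activated behaviour.
        return max(self.activation, key=self.activation.get)

    def step(self, percept: dict) -> str:
        if self.tick % self.slow_period == 0:
            self.slow_layer(percept)
        self.tick += 1
        return self.fast_layer(percept)
```

Between slow updates, the fast layer keeps acting on activations that may be slightly stale. This is the trade that buys the lower computational cost described above.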

The FU-Fighters' approach is relatively unusual amongst recent RoboCup teams in that it uses purely reactive rules for control, without a deliberative layer guiding a low-level reactive base:

"In our architecture deliberation is not implemented explicitly, but to an external viewer it seems to be present." [7]

A slightly different implementation of the same architecture is described by Lenser et al. [26], whose changes have the primary goals of allowing easier addition of new behaviours, easier removal of existing ones, and execution of non-conflicting behaviours in parallel. This is achieved through three main modifications:



• Behaviours have access to all levels of sensors, which allows behaviours to be inserted at any level.

• Behaviour activation is not dependent upon higher levels. This reduces interdependence between behaviours, allowing them to be swapped about easily.

• Non-conflicting behaviours (those that do not use the same resource) are allowed to execute in parallel.
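The third modification, parallel execution of behaviours that do not share a resource, can be sketched as a greedy selection in priority order. The priorities, the resource names (such as "head" and "legs") and the greedy policy itself are assumptions for illustration, not Lenser et al.'s implementation.

```python
def select_parallel(candidates: list[tuple[str, int, set[str]]]) -> list[str]:
    """Greedily pick behaviours in priority order, skipping any that need a
    resource already claimed by a higher-priority behaviour.

    candidates: (name, priority, resources); higher priority wins. Resource
    names are hypothetical.
    """
    claimed: set[str] = set()
    chosen: list[str] = []
    for name, _priority, resources in sorted(candidates, key=lambda c: -c[1]):
        if claimed.isdisjoint(resources):
            chosen.append(name)
            claimed |= resources
    return chosen
```

For instance, a high-priority walking behaviour that needs the legs can run alongside a ball-tracking behaviour that needs only the head, while a lower-priority kick that also needs the legs is held back.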

The fact that these changes are all directed towards making the system easier to implement, test and change highlights the main weakness of a purely reactive system: the complexity of a useful solution. More recent moves in RoboCup are toward hybrid systems, with the reactive layer forming only a small part of the overall system, as with the steering approach described above (section 2.1.2).

One such system is the RoboLog Koblenz RoboCup Simulator team [5], [6]. Their system is based on the work of Reynolds [30], using the OpenSteer C++ library [2] to create a reactive control level, with a deliberative Prolog layer above it to set the goals. The novel aspect of their team is that they attempt to overcome the main problem of steering behaviours (conflict between the outputs of discrete behaviours) through a weighted utility function acting on all the behaviours. However, this still requires a large amount of heuristic testing to determine appropriate weights for each behaviour.

Another similar approach, although it does not use steering behaviours as defined by Reynolds, is that of Pires et al. [29]. Here a two-level reactive structure is used, with the two layers defined as "Behaviour Selection" and "Control". The Control layer is analogous to the steering-type behaviours: it consists of simple primitive actions such as go-to, and interfaces directly with actuators. On top of this, the behaviour selection layer consists of "Basic Logic Decision Units" (BLDUs), which select the appropriate behaviours. The name is, I feel, a little misleading, as it implies some form of logical, deliberative process; this is not the case (though it is intended in their further work). Instead, a BLDU is a list of rules about the current world state which, if true, cause certain control behaviours to be activated. These rules also take as input some internal state about co-operative decisions (as other robots' internal state is unobservable), but this is still an example of a model-based reflex agent [31].

2.1.4 Advantages and Disadvantages of Reactive Control

The pros and cons of various reactive control techniques have been described above, but this section will serve as a brief summary.

The main advantage of reactive control is that it is fast, and allows agents to react quickly to fast-changing environments. This is particularly important in the fast-paced RoboCup environment, and as such reactive control, at some level, is common to all RoboCup teams. Another advantage - most obvious in the subsumption architecture - is that a system can be built up in modules, with each behaviour being extensively tested as it is developed. This allows for code re-use and aids incremental development of a system.

The main disadvantage is increasing complexity. As the system is required to exhibit more "intelligence", the number of behaviour rules and their interactions grow increasingly complex. Also, there are certain classes of goal-directed problems a purely reactive agent cannot solve optimally. To give an example from Russell and Norvig [31], a taxi may approach a junction, and no reactive rule could determine the correct way to turn; only goal-based reasoning about the final destination of the taxi would produce the correct action.

2.2 Review of Finite State Machines

2.2.1 Overview of Finite State Machines

Finite State Machines (FSMs) are loosely defined as a set of states, and a transition function which maps inputs to state changes. More formal definitions may be found in [28] and [27], but this will suffice for the purposes of discussion. Consider an agent that wishes to exhibit the behaviour of defending the goal when the opposition has the ball, attacking when it has the ball, or fetching the ball if it is free. These states, and the way in which percepts could trigger changes between them, can be represented in an FSM, as shown below in figure 2.3:

Figure 2.3: A simple FSM for a football-playing agent (states: Attacking, Defending, Chase Ball; transitions labelled Have ball, Free ball and Opponent has ball)

For clarity, not all of the transitions are shown on the above diagram; only one loop-back is included, on the "defending" state.

Effectively the above represents a state-based rule system - different control rules are used depending on the state the agent is in. This can simplify an agent's design and control, as only a subset of all possible control rules need be in effect at any one time, and rules need only consider the case for the state in which they are active.
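As an illustration of such a state-based rule system, the FSM of figure 2.3 can be sketched in Python as a transition table plus a per-state list of active rules. This is a sketch under assumptions: the figure deliberately omits some transitions, so the table below is one plausible completion, and the behaviour names are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    ATTACKING = auto()
    DEFENDING = auto()
    CHASE_BALL = auto()


class Percept(Enum):
    HAVE_BALL = auto()
    FREE_BALL = auto()
    OPPONENT_HAS_BALL = auto()


# Transition function: (current state, percept) -> next state.
# Pairs that are not listed leave the state unchanged.
TRANSITIONS = {
    (State.ATTACKING, Percept.FREE_BALL): State.CHASE_BALL,
    (State.ATTACKING, Percept.OPPONENT_HAS_BALL): State.DEFENDING,
    (State.CHASE_BALL, Percept.HAVE_BALL): State.ATTACKING,
    (State.CHASE_BALL, Percept.OPPONENT_HAS_BALL): State.DEFENDING,
    (State.DEFENDING, Percept.HAVE_BALL): State.ATTACKING,
    (State.DEFENDING, Percept.FREE_BALL): State.CHASE_BALL,
    (State.DEFENDING, Percept.OPPONENT_HAS_BALL): State.DEFENDING,  # the loop-back
}

# Only the rules attached to the current state are in effect (hypothetical names).
ACTIVE_RULES = {
    State.ATTACKING: ["dribble_to_goal", "shoot_if_in_range"],
    State.DEFENDING: ["mark_opponent", "block_shot"],
    State.CHASE_BALL: ["go_to_ball"],
}


def step(state: State, percept: Percept) -> State:
    """Apply the transition function, defaulting to staying in the current state."""
    return TRANSITIONS.get((state, percept), state)


state = State.CHASE_BALL
for percept in [Percept.HAVE_BALL, Percept.OPPONENT_HAS_BALL, Percept.FREE_BALL]:
    state = step(state, percept)
    print(state.name, ACTIVE_RULES[state])
```

Keeping the transition table separate from the per-state rule lists means each rule list only has to handle the situation its state represents.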

The states in an FSM can represent several things. A common approach (used in [36] and by Crisman et al. [13]) - and the most relevant in the context of this project - is for a state to consist of a small set of reactive behaviours (as described in section 2.1). Alternatively, a state may itself represent another lower-level FSM with its own states containing systems of rules. This type of FSM, where the system's output depends only on the current state, belongs to the general class known as Moore machines.

The transitions in the FSM normally represent percepts. This means that changes in the observed world change the state in which the system operates. So, for example, observing the opponent team kick the ball up the pitch may trigger a change into a state representing a defensive rule set.

There are certain classes of input that FSMs are incapable of representing, including those that require unbounded counting. These typically require a more powerful representation involving a stack (a pushdown automaton), but this is not important in this setting. As the FSM will only form part of the whole architecture, these limitations can be compensated for on other levels.

2.2.2 FSMs in RoboCup

FSMs are well suited to application in rule-based reactive control systems (see section 2.1), as they reduce the number of percepts that need be tested at any one time, and simplify agent control. As a result, many RoboCup teams using reactive control also utilise FSMs in some way to control motion.

One example is in the CMDash four-legged league team described by Veloso et al. [36]. They note the advantages of an FSM-based system as providing an easy means to debug behaviours. The time spent in each state and the state transitions can be recorded and, in combination with methods to detect infinite loops and oscillations between states, unwanted behaviour can be eradicated. Figure 2.4 below shows the state machine used for the dribble behaviour. This shows a use of FSMs at a lower level than the example given above, and one can see how agent control mechanisms could be constructed through layers of similar techniques.

Figure 2.4: FSM for a dribble behaviour. (diagram from [36])
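The debugging support noted by Veloso et al. can be sketched as a thin wrapper around any transition function that counts the ticks spent in each state and flags oscillation. The class name, the tick-based timing and the oscillation test (the last `window` states entered alternate between only two states) are hypothetical choices for illustration, not the CMDash code.

```python
from collections import defaultdict
from typing import Any, Callable


class LoggedFSM:
    """Wraps a transition function, recording time per state and the states entered."""

    def __init__(self, initial: Any, transition: Callable[[Any, Any], Any], window: int = 6):
        self.state = initial
        self.transition = transition
        self.window = window                  # how many recently entered states to inspect
        self.ticks_in_state = defaultdict(int)
        self.history = [initial]              # only appended when the state changes

    def tick(self, percept: Any) -> Any:
        self.ticks_in_state[self.state] += 1
        new_state = self.transition(self.state, percept)
        if new_state != self.state:
            self.history.append(new_state)
            self.state = new_state
        return self.state

    def oscillating(self) -> bool:
        """True if the last `window` states entered alternate between only two states."""
        recent = self.history[-self.window:]
        # Consecutive history entries always differ, so two distinct values means A, B, A, B...
        return len(recent) == self.window and len(set(recent)) == 2


# Hypothetical transition: the next state is simply named by the percept.
fsm = LoggedFSM("align", lambda state, percept: percept, window=4)
for p in ["align", "push", "align", "push"]:
    fsm.tick(p)
print(dict(fsm.ticks_in_state), fsm.oscillating())
# {'align': 3, 'push': 1} True
```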

The layering of FSMs is common in many reactive systems, as it is generally desirable to reduce the complexity of rules by restricting them to certain robot or world states. For example, Crisman et al. [13] describe the layered FSM approach used by the UWHuskies team. The diagrams below (figure 2.5) illustrate the higher-level FSM (a), which is at an equivalent level of abstraction to the one above in figure 2.4, and the lower-level FSM (b).
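A layered FSM of this kind can be sketched by letting each state optionally own a lower-level machine, which is reset on entry and stepped while its parent state is active. Neither [13] nor [36] describes an implementation, so the class below, the percept names and the dribble sub-machine are assumptions made for illustration.

```python
class LayeredFSM:
    """An FSM whose states may each own a lower-level FSM."""

    def __init__(self, initial, transitions, children=None):
        self.initial = initial
        self.state = initial
        self.transitions = transitions      # {(state, percept): next_state}
        self.children = children or {}      # {state: LayeredFSM}

    def reset(self):
        self.state = self.initial
        for child in self.children.values():
            child.reset()

    def step(self, percept):
        """Update this layer, then pass the same percept down to the active child."""
        next_state = self.transitions.get((self.state, percept), self.state)
        if next_state != self.state:
            self.state = next_state
            if next_state in self.children:
                self.children[next_state].reset()   # re-enter the sub-machine at its start
        if self.state in self.children:
            self.children[self.state].step(percept)
        return self.active_path()

    def active_path(self):
        """The active state at every layer, from top to bottom."""
        path = [self.state]
        if self.state in self.children:
            path += self.children[self.state].active_path()
        return path


# Hypothetical two-layer example: a dribble sub-machine inside an "attack" state.
dribble = LayeredFSM("align", {("align", "aligned"): "push", ("push", "near_goal"): "shoot"})
top = LayeredFSM("chase", {("chase", "have_ball"): "attack", ("attack", "lost_ball"): "chase"},
                 children={"attack": dribble})

for percept in ["have_ball", "aligned", "near_goal", "lost_ball"]:
    print(percept, top.step(percept))
# have_ball ['attack', 'align']
# aligned ['attack', 'push']
# near_goal ['attack', 'shoot']
# lost_ball ['chase']
```

Rules inside the dribble machine only ever run while the robot is attacking, which is the reduction in rule scope the layering is meant to provide.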

Neither of these examples details the implementation of the FSMs, merely their design, and in doing so they hide some of the issues. Code for one FSM is much like code for another, and FSMs are simple to write; however, they are not simple to maintain - they still involve many rules and tests for state transitions, which may require repetitive, hard-to-read code.


Figure 2.5: FSMs for UWHuskies team. (diagram from [13])

2.2.3 Advantages and Disadvantages

The disadvantage hinted at above, that FSMs can become too complex to maintain easily, is a major problem with them. Although they do simplify the rule-writing process, the only real result is that they allow a more competent agent behaviour to be implemented before things get out of control. Hugel et al. [22] acknowledge these problems, and present a "solution" by way of a tool to graphically draw FSMs and generate code, skipping the laborious and error-prone hand-coding. However, this does not avoid the problem that for a complex agent behaviour the FSM may have many states and many transitions, and be hard to design and comprehend.

The advantage of FSMs in specifying the behaviour of simpler systems, with few states and transitions, is a valuable one. As such, whilst they may be inappropriate for modelling complex systems, they are well suited to sub-systems. For example, in the context of this project, they would be well suited to management of the small sets of reactive behaviours governing an agent at any one time, or perhaps to very high-level management of plan execution.

Another disadvantage of FSMs that this project aims to counter is their inherent inflexibility. Once one is designed it cannot change: the number of states and transitions is fixed. By generating FSMs from plans, the proposed architecture avoids this disadvantage whilst harnessing the suitability of FSMs for high-level control.

2.2.4 FSMs Within a Wider System

For this project the interest in FSMs lies in their applicability to high-level co-ordination. In particular, they can serve a role in conjunction with a planner (as described in the next section) to monitor plan execution. In this role the states in the FSM would represent plan actions, and the transitions would be tests for the post-conditions of actions.
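Used for plan monitoring in this way, a linear plan becomes an FSM with one state per action, where leaving a state is triggered by a test of that action's post-condition. The sketch below assumes the world state is a plain dictionary and that the plan is a simple sequence; `PlanAction`, `PlanMonitor` and the example predicates are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PlanAction:
    name: str
    postcondition: Callable[[dict], bool]   # test on the observed world state


class PlanMonitor:
    """Executes a linear plan as an FSM: one state per action, left when its post-condition holds."""

    def __init__(self, actions: list[PlanAction]):
        self.actions = actions
        self.index = 0

    @property
    def current(self) -> Optional[PlanAction]:
        return self.actions[self.index] if self.index < len(self.actions) else None

    def update(self, world: dict) -> str:
        # Advance past every action whose post-condition is already satisfied.
        while self.current is not None and self.current.postcondition(world):
            self.index += 1
        return self.current.name if self.current is not None else "plan complete"


# Hypothetical three-step plan.
plan = [
    PlanAction("fetch_ball", lambda w: w.get("has_ball") == "robot1"),
    PlanAction("pass_to_robot2", lambda w: w.get("has_ball") == "robot2"),
    PlanAction("shoot", lambda w: w.get("scored", False)),
]
monitor = PlanMonitor(plan)
print(monitor.update({"has_ball": None}))       # fetch_ball
print(monitor.update({"has_ball": "robot1"}))   # pass_to_robot2
```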

As well as their use in plan execution, this project also describes a more basic system without a planner, where the sole method of co-ordination is an FSM. This plays to their advantages, as it effectively uses them to manage the set of reactive rules in operation on each agent on the team.

2.3 Review of Planning for Autonomous Agents

2.3.1 Overview of Planning

A planner in the context of AI and autonomous agents is a system designed to create strategies, or sequences of actions, for an agent to carry out to achieve some set of goals. A planner typically takes three input components:

• World State: A simplified representation of the environment in which the agent operates.

• Actions: The set of all possible actions the agent may carry out, including their effects and preconditions.

• Goals: The list of goals to achieve - the desired state of the world.

Each of these is represented in some formal language, such as STRIPS (the Stanford Research Institute Problem Solver [31]). The planner can then use a set of inference rules to reason towards goal states from initial conditions (forward-chaining) or backwards from the desired goal states (backward-chaining).
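To illustrate forward-chaining over a STRIPS-style representation, the sketch below performs a breadth-first search from the initial state, applying any action whose preconditions hold (removing its delete list and adding its add list) until the goals are satisfied. The two-robot passing domain and its fact names are invented for illustration, and practical planners use heuristic search rather than exhaustive breadth-first search.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset       # facts that must hold before the action
    add: frozenset       # facts the action makes true
    delete: frozenset    # facts the action makes false


def plan(initial, goals, actions):
    """Breadth-first forward-chaining search; returns a shortest list of action names, or None."""
    start = frozenset(initial)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if goals <= state:                       # all goal facts hold
            return path
        for action in actions:
            if action.pre <= state:              # preconditions satisfied
                successor = (state - action.delete) | action.add
                if successor not in seen:
                    seen.add(successor)
                    frontier.append((successor, path + [action.name]))
    return None                                  # goals unreachable


# Hypothetical two-robot passing domain.
F = frozenset
ACTIONS = [
    Action("fetch_ball", F(), F({"r1_has_ball"}), F()),
    Action("r2_move_upfield", F(), F({"r2_open"}), F()),
    Action("pass_r1_to_r2", F({"r1_has_ball", "r2_open"}), F({"r2_has_ball"}), F({"r1_has_ball"})),
    Action("r2_shoot", F({"r2_has_ball"}), F({"scored"}), F({"r2_has_ball"})),
]

print(plan(set(), {"scored"}, ACTIONS))
# ['fetch_ball', 'r2_move_upfield', 'pass_r1_to_r2', 'r2_shoot']
```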

For the purposes of this project the techniques by which planners are implemented are of little interest; rather, it is how they can be applied, and the classes of problems that they can solve, which are of concern.

Planning on its own is rarely sufficient for an agent to perform; instead it is usually combined with lower-level reactive approaches (as described in section 2.1). In their excellent book on planning, Ghallab et al. [20] note that "Planning is the reasoning side of acting". It is this approach that is taken in this project - the planning component is used to provide reasoning about the actions to carry out.

There are different forms of planning, including motion planning, communication planning, perception planning and navigation planning. In the context of this project the aim for planning is to allow for some form of co-operation between agents, or control of a team.

There are several variants on classical planning, as described above, that may have some relevance to the multi-agent robotic soccer domain. Several of these are described in more detail below.

Temporal Planning

Conventionally, planners make the assumption that time is implicitly modelled as the sequence of states that are represented in the plan generated. In the case where plans represent actions for agents in a real (or simulated) world, it may be useful to consider time, and concurrent actions, explicitly in the planner. For example, it may be useful to create a plan where one robot moves up the pitch to receive a pass whilst another fetches the ball.

This is quite relevant to the project: as the high-level plans will have pre- and post-conditions, it may be useful to model the time taken for an action to be carried out. However, this may be impractical or unpredictable due to the non-deterministic nature of the environment, i.e. the other team's actions cannot be known in advance.

Planning under Uncertainty

Typically planning assumes the results of an action are deterministic, when this is often not the case, especially where an uncooperative opponent is present, as in RoboCup. There are three basic types of uncertainty which may be accounted for, as described by de Weerdt et al. [14]:

• Actions can have probabilistic effects - e.g. a movement may have a 90% chance of success and a 10% chance of being blocked. This requires a plan with branches covering sensing actions and conditional tests, so-called contingent planning.

• These sensing actions may fail, or be unable to observe the world fully. This may mean the planner cannot distinguish between world states.

• The probability distribution over possible outcomes of a plan operator may not be known - this non-deterministic form of planning is known as conformant planning.

It is easy to see how this is useful for RoboCup - it is an uncertain environment - however it is not appropriate for this project. The aim here is to cope with uncertainty through the reactive behavioural layer, and thus achieve the plan actions in spite of the uncertainty.

Planning with Utilities

Related to temporal planning are those approaches which attempt to maximise some utility function or metric. For example, in RoboCup this could be a planner that aims to move the robots the minimal distance to achieve the goal, or to make the fewest passes.

As the level of abstraction for the planner in this project's proposed architecture is quite high, it is hard to see what utility could be applied. For example, the distance moved will not be known, as the exact movement required is a function of the reactive layer, not the higher-level planner.

Co-ordinated Planning

Perhaps the most relevant forms of planning for RoboCup would appear to be those for co-ordinating the actions of many agents, each with individual goals. These methods tend to assume each agent has its own goals and scheduler/planner, and provide a method for planning based on the actions of other agents as well as an agent's own actions. Examples of such methods include "Generalized Partial Global Planning" as proposed by Decker and Lesser [15], which describes a set of co-ordination mechanisms to enable agents to plan in co-operative teams.

This planning for co-ordination is the goal of the architecture presented in this project, although it aims to achieve this through a different method from conventional techniques. Nevertheless, some methods used elsewhere, particularly in the RoboCup domain, may still be relevant.

2.3.2 Planning in RoboCup

Planning is not as widely used in RoboCup as the reactive behaviours described in section 2.1. There are two main reasons for this - firstly the environment is generally non-deterministic, as the actions of the opposition team cannot be easily predicted, and secondly the world state changes quickly, and so there is little time to deliberate and plans may quickly fail and require repair or replanning.

Of those approaches that do use planning, it is generally used as a component of a layered architecture with a lower reactive level. Other layered approaches use a high-level deliberative layer, but this section only considers those which apply classical planning, as it is these approaches which are of interest to this project.

There are two broad applications of planners to RoboCup: planning to deliberate about actions, and planning to ensure co-ordination between team members.

Planning for Action

The use of planners to reason about action which is then carried out by a reactive layer is a common approach in robotics in general. Fraser et al. [18] describe some of the problems with applying this approach to RoboCup and propose a solution. The main problem they identify is the robustness of control to unreliable sensor readings - small fluctuations in observed values from sensors could have a big effect on the behaviour of an agent if the fluctuations are close to a boundary value for a control rule.

Their proposed solution maps sensor readings to qualitative predicates, such as "InReach(x)" to specify that an object is in range of the robot, regardless of its actual position. They then apply a classical STRIPS-style planner to reason about the best action based on the current world state.
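A qualitative predicate such as InReach(x) can be sketched as a thresholded distance test. A single threshold would still suffer from exactly the boundary fluctuation described above, so the version below adds hysteresis (separate entry and exit distances); this is an assumption added here for illustration, not part of the method in [18]. The distances are hypothetical values in metres.

```python
import math


class InReach:
    """Qualitative predicate InReach(x) derived from noisy distance readings.

    Hysteresis is an added assumption (not from [18]): the predicate becomes true
    below `enter` metres and only becomes false again above `leave` metres.
    """

    def __init__(self, enter: float = 0.30, leave: float = 0.35):
        self.enter, self.leave = enter, leave
        self.value = False

    def update(self, robot_xy, object_xy) -> bool:
        d = math.dist(robot_xy, object_xy)
        if self.value and d > self.leave:
            self.value = False
        elif not self.value and d < self.enter:
            self.value = True
        return self.value


def qualitative_state(robot_xy, objects, trackers):
    """Map numeric positions to a set of STRIPS-style facts such as 'InReach(ball)'."""
    return {f"InReach({name})" for name, xy in objects.items()
            if trackers[name].update(robot_xy, xy)}


# Readings fluctuating around 0.30 m do not make the predicate flicker.
ball = InReach()
print([ball.update((0, 0), (d, 0)) for d in [0.29, 0.31, 0.29, 0.31, 0.40]])
# [True, True, True, True, False]
```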

However, this introduces one of the main problems with planning - that of the level of abstraction used for the problem representation. As the work presented in [18] concerns itself mostly with the mapping from the real world to qualitative predicates, it does not consider the complexity of the planning problem, which is directly linked to the level of abstraction chosen for the representation.

An example from Ghallab et al. [20] serves to illustrate the relationship between the representation and the complexity of the planning problem. Using a simple example with dock-worker robots that move containers about, Ghallab et al. [20] note:

"suppose there are five locations, three piles of containers per location, three robots and one hundred containers. Then Σ has about $10^{277}$ states, which is about $10^{190}$ times as many states as even the largest estimates of the number of particles in the universe!"


Now, even though most planners pose the problem such that exhaustive search of all states is not required, it is still clear that the simpler the representation, the faster the planner will operate - which may be important in a real-time domain such as robotic soccer. This is taken into consideration by Jensen and Veloso [24], who propose a method which strays from the more usual vertical layering of reactive and deliberative control.

They identify the other main problem with planning for robot control, which is that of timeliness. As the world changes quickly, and planning can take a long time, planning is useless if it takes too long, or cannot provide a control action in time. Instead of always having a deliberation stage which uses a reactive layer to carry out a plan, the approach taken is to wait for the result of deliberation to decide an action if there is time, or rely on reactive control if a decision is needed quickly. This, however, requires that the time to calculate a plan is at least roughly known, and so the main feature of their approach is:

"The key idea is that we discretize the state as a function of the average planning time required." [24]

This also solves the problem of what level of abstraction to use, as it is now governed solely by the time for plan solving, but at the expense of two new problems. Firstly, experimentation is required to identify the time plans take to solve. Secondly, if the sole criterion is planning time, the level of abstraction has to be chosen so that enough time is given to allow useful plans to be generated, which provide a benefit over pure reactive control.
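The timeliness trade-off can be sketched as a simple gate which deliberates only when the time before a decision is due exceeds the estimated planning time. This is a deliberate simplification of [24], which discretises the state itself as a function of planning time; the running-mean estimate and the `safety_factor` parameter are hypothetical.

```python
class DeliberationGate:
    """Choose between planning and reactive control from an estimate of planning time."""

    def __init__(self, safety_factor: float = 1.5):
        self.safety_factor = safety_factor   # hypothetical margin over the mean planning time
        self.samples = 0
        self.mean_plan_time = 0.0            # seconds, learned by experiment

    def record(self, plan_time: float) -> None:
        """Update the running mean with one observed planning time."""
        self.samples += 1
        self.mean_plan_time += (plan_time - self.mean_plan_time) / self.samples

    def mode(self, time_to_deadline: float) -> str:
        if self.samples == 0:
            return "reactive"                # no estimate yet, so do not risk missing the deadline
        needed = self.safety_factor * self.mean_plan_time
        return "deliberate" if time_to_deadline >= needed else "reactive"


gate = DeliberationGate()
for t in [0.10, 0.30]:
    gate.record(t)
print(gate.mode(0.5), gate.mode(0.2))   # deliberate reactive
```

The need to call `record` with measured planning times mirrors the first of the two problems above: the estimate only exists after experimentation.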

The approach in this project has no such constraints on the level of abstraction used by the planner, but it may be useful to introduce them, as the above cited work illustrates both the difficulty and importance of choosing the correct level of abstraction.

Planning for Co-operation

Planners can also be used to ensure co-ordination between members of a team by allocating individual goals to achieve the team goal of scoring, or otherwise.

Within techniques for planning co-operation between RoboCup agents there are two general approaches - a single central planner for all robots, and individual planners which communicate to co-operate. In some part the approach is governed by the class of robot the solution is designed for, but there is nothing to prevent either approach being applied to any league.

Jensen and Veloso [24] use the technique of a single central planner, which fuses the (possibly incomplete) world state from each robot, then creates a multi-agent plan which is decomposed into a single-agent plan for each robot. This is illustrated in figure 2.6.

The main advantage of this is that it avoids problems in deciding when to communicate, what is communicated and ensuring that each agent knows their part to play in the plan. However, it does require a central agent to which the world (or at least the team) is fully observable, which may not be possible in many domains. In the environment chosen for simulation in this project this is not a problem - the world is fully observable. Also, the aim of this project is not to study inter-robot communication, but to propose a method to achieve collaboration, so a central planner is a legitimate approach.



Figure 2.6: A Central Multi-agent planner (diagram from [24])
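The two steps of the central approach, fusing partial observations and decomposing a team plan, can be sketched as below. The averaging fusion rule, the `(agent, action)` plan format and the names are assumptions for illustration, not the method of [24].

```python
from collections import defaultdict
from statistics import mean


def fuse(views):
    """Merge possibly incomplete per-robot views {object: (x, y)} by averaging each object."""
    observations = defaultdict(list)
    for view in views:
        for obj, xy in view.items():
            observations[obj].append(xy)
    return {obj: (mean(x for x, _ in xys), mean(y for _, y in xys))
            for obj, xys in observations.items()}


def decompose(team_plan):
    """Split a team plan [(agent, action), ...] into an ordered action list per agent."""
    per_agent = defaultdict(list)
    for agent, action in team_plan:
        per_agent[agent].append(action)
    return dict(per_agent)


world = fuse([{"ball": (1.0, 2.0)}, {"ball": (3.0, 2.0), "opp1": (0.0, 0.0)}])
print(world)   # {'ball': (2.0, 2.0), 'opp1': (0.0, 0.0)}
print(decompose([("r1", "fetch_ball"), ("r2", "move_upfield"),
                 ("r1", "pass_to_r2"), ("r2", "shoot")]))
# {'r1': ['fetch_ball', 'pass_to_r2'], 'r2': ['move_upfield', 'shoot']}
```

This decomposition keeps only each agent's own ordering; a real system would also need to preserve cross-agent dependencies, such as the receiver being in position before the pass is made.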

Individual agents collaborating to devise a group plan, as with the generalized partial global planning [15] mentioned above, is a less attractive proposition for RoboCup, as it adds complexity where it is not needed - normally a central planner is feasible. However, Pires et al. [29] still propose this for their future work, where they hope to have individuals propose team plans which are agreed on after negotiation between all the robots.

2.3.3 Advantages and Disadvantages of Planning

On the whole, planning in the classical sense is not widely used in RoboCup - mostly due to the complexity involved in creating worthwhile team plans that can be computed in time, and are robust to the non-deterministic soccer environment. As a result, most of the places where it would seem logical to apply planning - such as in high-level team plans - are currently hand-coded by humans, as in the "plays" described by Bowling et al. [11].

However, there are foreseeable advantages to exploiting planning techniques. Automated planners can reduce the complexity of an agent: instead of a finite state machine of great complexity to handle every possible eventuality, it is only necessary to specify the actions available, with their preconditions and effects, and then the appropriate plan is hopefully generated automatically - there is no need for a person to consider every eventuality. This gives rise to another advantage of automated planning over hand-coded "human reasoning" - a planner may come up with previously unseen or unconsidered plans, which may be more efficient than those devised by humans.

This project aims to exploit these benefits, whilst also dealing with the disadvantages in a robust fashion. Tightly coupling the planner to reactive behaviours aims to give simple plans the flexibility to deal with complex, changing situations. Keeping the plans simple and high-level is an approach that is shown to deal with the constraints of tight time bounds.


2.4 Review of Layered Architectures for Autonomous Agents

2.4.1 Overview of Layered Architectures

Layered architectures are common within the field of AI, and particularly in autonomous systems. Typically they exist to allow for abstraction of the world, so that high-level reasoning algorithms can be applied to what would otherwise be far too rich a data set from sensor input. Also, in many domains this high-level "world knowledge" is the result of the fusion of data from several different sources.

A brief discussion of layered architectures is relevant to this project, as the architecture proposed here is itself a layered architecture of sorts. There are two immediately distinct layers - that of the planner, and that of the reactive control mechanism. This represents another main benefit of layered architectures, that of allowing for the combination of both reactive and deliberative control (as discussed in sections 2.1 and 2.3 respectively).

A simple example of a common structure of vertically layered architectures is that of the "3T" architecture, described by Bonasso et al. [10]. It is based on three layers, as shown below in figure 2.7.

Figure 2.7: Layers in the 3T Architecture (diagram from [10])

The three layers forming the 3T architecture are as follows:

Reactive Skills: This is the lowest level of control, and represents reactive control of the sort covered in section 2.1.

Sequencing: This layer activates and deactivates sets of skills - it represents "sets of sequenced actions" [10].

Deliberation: This layer contains the high-level deliberative planner, which reasons about goals, resources and time constraints.

As can be seen from this description, 3T is primarily for use as a control architecture - rather than as levels of abstraction for reasoning and perception. However, it illustrates the nature of layered architectures in simple form.
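The division of labour between the three 3T layers can be sketched in a few lines: reactive skills map percepts to commands, the sequencer activates a set of skills per task, and the deliberative layer orders the tasks. Everything below (skill names, tasks, the fixed-priority "deliberation" stub) is hypothetical and far simpler than the 3T system described in [10].

```python
from typing import Callable

Percepts = dict

# Reactive skills layer: fast functions from percepts to motor commands.
SKILLS: dict[str, Callable[[Percepts], str]] = {
    "go_to_ball": lambda p: f"move towards {p['ball']}",
    "avoid": lambda p: "sidestep" if p["obstacle_near"] else "none",
    "kick": lambda p: "kick" if p["ball_near"] else "none",
}

# Sequencing layer: the set of skills each task activates.
SEQUENCES = {"get_ball": ["go_to_ball", "avoid"], "shoot": ["kick"]}


def deliberate(tasks: list[str]) -> list[str]:
    """Deliberative layer stub: order the tasks (here by a fixed priority)."""
    priority = ["get_ball", "shoot"]
    return sorted(tasks, key=priority.index)


def run_tick(task: str, percepts: Percepts) -> list[str]:
    """Sequencer: activate the skill set for `task`; each active skill emits a command."""
    return [SKILLS[name](percepts) for name in SEQUENCES[task]]


agenda = deliberate(["shoot", "get_ball"])
print(agenda)   # ['get_ball', 'shoot']
print(run_tick(agenda[0], {"ball": (1, 2), "obstacle_near": True, "ball_near": False}))
# ['move towards (1, 2)', 'sidestep']
```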



2.4.2 Combination of Deliberation and Reaction in Layered Architectures

For this project, which aims to combine deliberative reasoning (through use of a classical planner) and reactive control, it is useful to study how other control architectures manage this mapping between high- and low-level reasoning.

The main advantage of the three-layer architecture of the type described above (and in more detail by Alami et al. [4]) is that it provides the necessary abstraction to make complex reasoning at a high level practical, in the "Deliberative" layer. This is noted by Gat [19]:

"The use of a sequencing layer makes it possible (in fact, easy) to use trivial and uninteresting algorithms to control real robots performing complex tasks." [19]

Typically this is achieved by placing the burden of what to do in the deliberative layer, and of how to do it in the sequencing layer, with the lowest reactive level providing the means to actually control the robot. For example, a repair robot with several tasks to carry out may deliberate about the order in which to perform the repairs, based on their location and the likely time to complete them. The sequencer would then handle the task of selecting which reactive skills were appropriate for each repair, and when to activate them.

This seems like a sensible method of organisation, and indeed has seen some success (a summary is available in [19]); however, it is not without its flaws. Firstly, the boundaries are vague and ill-defined - e.g. if the goal is constant, then just a sequencing and reactive layer may be needed. But in this case the sequencer may be called the deliberative layer, any reasoning in the reactive part could be moved to sequencing, and so on. This introduces inconsistency between approaches, and although it is not vital in terms of implementation and the ability of a system to carry out a task, it is a weakness.

The CLARAty (Coupled Layer Autonomous Robot Architecture) architecture proposed by JPL ([37], [38]) is another layered architecture, proposed to solve this and other perceived problems in the three-level approach. The main issue that they address is that of access to the functional (reactive) layer from the deliberative planning layer. This is a weakness in the three-level schemes, as the planner is separated from information on system functionality. As a consequence, planners often have their own, separate, system model in addition to that used by lower levels.

The proposed solution is to merge the planner (deliberative) and executive (sequencing) layers, complete with a common database. Figure 2.8 below illustrates this structure.

The granularity dimension illustrates that a system could be composed of a detailed planner, with lots of functionality, and therefore require little in the way of an executive, or vice versa.

An alternative to the vertically layered architectures is the class of horizontally layered architectures ([39], section 1.4.4). In a horizontally layered architecture, the different levels of abstraction are all connected to both input and output, and then some form of mediating framework governs which is in control of the agent. This is in contrast to the vertical approaches described above, which only connect the lowest, reactive, layer to the hardware or agent percepts/control mechanisms.

An example of a horizontal architecture is the TouringMachines architecture described by Ferguson [17]. Figure 2.9 below illustrates the layers in such a system.


Figure 2.8: Layers in the CLARAty Architecture (diagram from [38])

Figure 2.9: Layers in the TouringMachines Architecture (diagram from [17])

The interaction between the reasoning of the planning layer and the reactive layer is not unlike the structure of the subsumption architecture ([12], see section 2.1). If the planner wishes to control the reactive layer it does this by inhibiting or adding sensor input, and effector output, through the mediating framework.

The need for a mediating framework is one of the disadvantages of this approach - it is necessary to consider all possible interactions between layers. This is also an issue with vertical layering, but there the number of possible interactions between layers is reduced. The chief advantage of horizontal layering lies in its simplicity: to add new types of behaviour it is only necessary to add another layer [39].



2.4.3 Applying A Layered Architecture

The approach in this project is not directly comparable to those described above. The reason for this is mainly that, whereas they were designed as complete methods for implementing an autonomous robot in the real world, the approach taken here implements reasoning and control for a more general "situated agent". As such, issues such as sensors and data fusion are not covered. However, there are similarities that can be drawn.

The general hierarchical approach taken by both vertical approaches - of low-level reactive control which a planner uses to achieve its goals - is a good one, and is used here. This enables modularity, easy testing and code re-use. Also, the idea of mediating between competing behaviours to create a composite output, as used in steering behaviours ([5], [30], see section 2.1), is similar to that of the horizontal architectures.

The higher-level deliberative reasoning is trickier to compare. The use of a classical planner implies that it fits in with the planner or deliberation layers in the architectures above. However, it could equally be argued that it represents merely a sequencer - the top-level goal of "score" is unchanging. In this respect it is perhaps more appropriate to consider my approach as taking the lessons that CLARAty learns from the conventional three-layer model, and the approach of the horizontal architectures, and applying them. The planner interfaces directly with percepts to gain information about the world state, as well as with the behaviours, to generate its plans - this fits well with what Volpe et al. [38] suggest is a necessary, but missing, feature of the three-layer architecture. It is further from the horizontal architectures, as the planner does not interface directly for output, but the notion that all levels of reasoning need direct access to the percepts is valid, and is used here.

However, it would be erroneous to describe my approach as an instance of the CLARAty architecture; it merely draws on ideas presented there (and elsewhere). For example, the structure of the code, and interactions between hierarchies of control algorithms, are discussed in [37] and [38], but not considered here.


3 Hybrid Architecture Design

This chapter covers the design of the hybrid architecture presented in this project, and the justification behind it. The goal of the project was to combine reactive behaviours and deliberative planning in a novel way, in an attempt to gain the benefits of both, with few disadvantages. To summarise from the previous review chapters, reactive control has the advantage of being fast and able to respond to a changing environment, whereas deliberative planning has the potential to achieve team co-ordination and more complex tactics.

RoboCup was chosen as a suitable domain to implement the proposed architecture due to the body of work available for reference specific to the domain, the fact that it represents a simple world for the agents, and the fact that a team of agents can co-ordinate for better performance. Another possibility would have been war-games or strategy games, where the technique could be used to control a squad of units. In part the choice was limited by freely available simulators, and RoboCup met this criterion as there was an existing simulator written by Matthew Grounds - a member of the department - available to me.

3.1 Overview

At the simplest, highest level, this architecture consists of a team-level planner which generates a sequence of actions that each map to sets of reactive behaviours for one or more agents. As the plans are executed, the behaviours on each agent are changed to the appropriate set. This aims to reduce the complexity of the reactive control (as few behaviours are in effect at one time) and yet provide complex team-level co-ordination, such as setting up passes, through a high-level planner. The problem of planning in a dynamic non-deterministic world (due to the opposition) is handled by the natural ability of the behaviours to react to the opponent's moves.

Figure 3.1 shows a high-level view of the system, with the planner mapping to a set of actions, and each of those mapping to sets of behaviours on one or more agents. A point to note is that there are a finite number of actions and behaviours, and it is merely different combinations of these that give rise to different team plays.
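The action-to-behaviour mapping described above can be sketched as a lookup table from each plan action to behaviour sets for the agents involved, with a default set for uninvolved agents. The action names, behaviour names and the default are hypothetical; the point is only that a finite table of actions and behaviours can be recombined into many different team plays.

```python
# Hypothetical mapping from plan actions to behaviour sets for the agents involved.
ACTION_BEHAVIOURS = {
    "fetch_ball": {"robot1": ["go_to_ball", "avoid_collisions"]},
    "set_up_pass": {"robot1": ["dribble", "face_team_mate"],
                    "robot2": ["go_to_open_space", "avoid_collisions"]},
    "pass": {"robot1": ["kick_to_team_mate"], "robot2": ["intercept_ball"]},
}
DEFAULT_BEHAVIOURS = ["hold_position", "avoid_collisions"]   # for uninvolved agents


def behaviours_for(action: str, team: list[str]) -> dict[str, list[str]]:
    """Behaviour set each agent should run while `action` is the current plan step."""
    assignment = ACTION_BEHAVIOURS[action]
    return {agent: assignment.get(agent, DEFAULT_BEHAVIOURS) for agent in team}


team = ["robot1", "robot2", "robot3"]
for step in ["fetch_ball", "set_up_pass", "pass"]:
    print(step, behaviours_for(step, team))
```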

In some respects the entire system can be considered as a vertically layered architecture (see chapter 2, section 2.4); however, each agent is fairly simple, with only a small selection of behaviours. Also, information about world state is accessed from all layers, not just through the bottom one - this is another difference from conventional layered approaches.

The system bears some resemblance to the "team plays" approach taken by Bowling et al. [11]. They too have a team plan that maps to sets of behaviours; however, they hand-code the plans and select from them based on a learning algorithm, whereas in this approach plan generation is automated.

Absent from this system design chapter are any test cases, which may be an expected part of any software engineering effort. The reason for this is the difficulty of quantitatively testing the code: most of the testing can only be done by subjective observation within the simulator. As the goal of this project is to evaluate an architecture design, rather than the quality of a code artefact, this is not so important. In-depth evaluation is carried out through the design and execution of several experiments, covered in chapter 5. A detailed commentary on the software engineering process used for this project is given in chapter 6, "Project Evaluation".

[Figure: the Planner produces Actions 1 to N, each with a pre-condition and a post-condition; Plan Execution maps these to behaviour sets on Robots 1 to N (Robot 1: Behaviours A and B; Robot 2: Behaviours B and C; Robot N: Behaviours A and C).]

Figure 3.1: Overview of the Control Architecture

3.2 Behavioural Layer Design


The reactive behaviours are solely responsible for the motion of an agent at any particular time, and are not required to be aware of the levels above them; they merely provide a service to be used. This design allows for easy testing, as their function is independent of other system components.

The basic reactive control method used is the steering approach presented by Reynolds [30], as reviewed in section 2.1.2. This is a particularly suitable approach for robotic soccer, as all an agent can do is move or kick. As the output from a steering behaviour is a vector, behaviours may be combined in various ways: through summation or weighting, as in [5], or by using a set of rules to select the correct response based on current percepts.

The observations available to the reactive layer are the positions and velocities of all robots and of the ball. The behaviours all use purely rule-based reasoning with very few rules, so that they are fast, since they are called once per simulation time-step. None of the behaviours maintains state; they all act only on current inputs. This keeps them simple, fast and predictable.



3.2.1 Description of Behaviours

This section covers all the behaviours used in the system implemented for experimentation. To devise the necessary behaviours, I first considered the obvious ones, such as "GoTo" and "Kick", and then added the others after designing the FSMs used for higher-level control (see section 3.3.1). Some refinement of the behaviours was also required when I considered the plan operators and the behaviour sets needed to implement them; as a result, new behaviours were designed throughout the implementation process.

Several "atomic" behaviours are used as the building blocks for others. These are:

Wandering: This is a non-parameterised function which returns a vector perpendicular to the agent's direction of travel, chosen randomly between left and right. This is added to the forward motion so that the agent appears to wheel around the pitch at random.

Seeking: Returns a vector to steer toward a point. This is calculated as the difference between the desired velocity (towards the point being sought) and the current velocity.

Constraint: Steers to stay within an area. This takes as a parameter a box bounding the area to constrain to, and predicts the agent's position a few time-steps ahead using linear trajectory estimation. If this takes the agent outside the area, a counteracting steering force is returned; otherwise a zero vector is returned.

Avoidance: Steers to avoid collisions with other agents or obstacles. This is parameterised by the objects to avoid and the number of time-steps over which to consider collisions. The agent's position is linearly extrapolated over that number of time-steps, and if it crosses any other agent's, or intersects an obstacle, a vector is returned to steer away from it.

Pursuit: Steers to intercept a moving target. The cross product of the target's velocity and the agent's velocity is computed to discern whether their paths are parallel, divergent or convergent. The vector returned steers to ensure that they converge.
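Three of the atomic steering behaviours above can be sketched in C++ as follows. This is a minimal sketch under stated assumptions, not the project's code: the `Vec2` and `Agent` types, steering back towards the box centre in `constrain`, and the `steps` and `dt` parameters are hypothetical, and `pursue` uses Reynolds' standard predict-and-seek formulation rather than the cross-product test described above. Wandering and avoidance are omitted for brevity.

```cpp
#include <cassert>
#include <cmath>

// Minimal 2D vector (hypothetical; OpenSteer itself works with a 3D Vec3 type).
struct Vec2 {
    double x = 0.0, y = 0.0;
};

inline Vec2 operator+(Vec2 a, Vec2 b) { return {a.x + b.x, a.y + b.y}; }
inline Vec2 operator-(Vec2 a, Vec2 b) { return {a.x - b.x, a.y - b.y}; }
inline Vec2 operator*(Vec2 a, double s) { return {a.x * s, a.y * s}; }
inline double length(Vec2 a) { return std::hypot(a.x, a.y); }
inline Vec2 normalise(Vec2 a) {
    double len = length(a);
    return len > 0.0 ? a * (1.0 / len) : Vec2{};
}

struct Agent {
    Vec2 position, velocity;
    double maxSpeed = 1.0;
};

// Seeking: desired velocity (full speed towards the target) minus current velocity.
Vec2 seek(const Agent& a, Vec2 target) {
    Vec2 desired = normalise(target - a.position) * a.maxSpeed;
    return desired - a.velocity;
}

// Axis-aligned bounding box for the constraint behaviour.
struct Box {
    double minX, minY, maxX, maxY;
};

// Constraint: linearly extrapolate the position `steps` time-steps of length dt ahead.
// Inside the box: zero vector. Outside: steer back towards the box centre
// (heading for the centre is an assumption; any inward force would do).
Vec2 constrain(const Agent& a, const Box& box, int steps, double dt) {
    Vec2 p = a.position + a.velocity * (steps * dt);
    bool inside = p.x >= box.minX && p.x <= box.maxX && p.y >= box.minY && p.y <= box.maxY;
    if (inside) return {};
    Vec2 centre{(box.minX + box.maxX) / 2.0, (box.minY + box.maxY) / 2.0};
    return seek(a, centre);
}

// Pursuit in Reynolds' predict-and-seek form: seek where the target will be
// after roughly the time needed to reach it at maximum speed.
Vec2 pursue(const Agent& a, Vec2 targetPos, Vec2 targetVel) {
    double t = length(targetPos - a.position) / a.maxSpeed;
    return seek(a, targetPos + targetVel * t);
}
```

Because each function returns a plain vector, the outputs can be summed, weighted or selected by an arbiter, as described at the start of section 3.2.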

The other behaviours used are listed in turn below. Many implement no new sensing or action themselves, but merely arbitrate between the outputs of others.



Wander and Don’t Collide

This aims to steer for a random wander without colliding with other robots. I chose to create this behaviour as it is useful for evaluating agents in an unpredictable environment (with robots wandering around) but with no malicious opponent attempting to go for the ball.

[Flowchart (uses Avoidance, Wander): from Start, if the avoidance vector is non-zero, steer to avoid; otherwise, steer for wandering.]

Figure 3.2: Wander Don’t Collide
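One way to express the flowchart in figure 3.2 is as a priority arbiter: the avoidance output is used whenever it is non-zero, and wandering otherwise. The sketch below assumes a unit heading vector and a hypothetical tuning weight `turn`; the random left/right perpendicular follows the description of Wandering in section 3.2.1, and the function names are illustrative.

```cpp
#include <cassert>
#include <random>

struct Vec2 {
    double x = 0.0, y = 0.0;
};

inline bool isZero(Vec2 v) { return v.x == 0.0 && v.y == 0.0; }

// Wandering: a vector perpendicular to the (unit) heading, randomly to the left or
// right, added to the forward direction. `turn` is a hypothetical tuning weight.
Vec2 wander(Vec2 heading, std::mt19937& rng, double turn = 0.5) {
    std::bernoulli_distribution goLeft(0.5);
    double side = goLeft(rng) ? 1.0 : -1.0;
    Vec2 perp{-heading.y * side, heading.x * side};  // heading rotated by +/-90 degrees
    return {heading.x + perp.x * turn, heading.y + perp.y * turn};
}

// "Wander and Don't Collide" as a priority arbiter: the avoidance vector wins
// whenever it is non-zero; otherwise the agent wanders.
Vec2 wanderDontCollide(Vec2 avoidance, Vec2 heading, std::mt19937& rng) {
    if (!isZero(avoidance)) return avoidance;
    return wander(heading, rng);
}
```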

Wait for ball

This behaviour causes the robot to wait in position, and to move to intercept the ball if it will pass within a specified radius. This gives suitable behaviour for defending, goal-keeping and receiving passes. If the ball's path crosses the radius, the robot seeks the point on the far side where the path leaves it, so that it manoeuvres behind the ball rather than chasing it as it passes through; as soon as the ball enters the area, the robot heads straight for it.

[Flowchart (uses Seeking), decisions in order: Is the robot in contact with a stopped ball? (yes: no steering). Is the robot outside the waiting radius? (yes: seek home position). Is the ball in the waiting radius? (yes: seek ball). Will the ball's path carry it into the waiting radius? (yes: seek the point on the radius where the ball's path leaves it; no: seek home position).]

Figure 3.3: Wait For Ball
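The geometric core of Wait for Ball is finding where the ball's straight-line path leaves the waiting circle. The sketch below solves the ray-circle intersection and takes the larger root, then follows the decision order of figure 3.3. It assumes the ball travels in a straight line at constant velocity; `WaitAction`, `waitForBall`, `contactDist` and the stopped-ball threshold are hypothetical names and values.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

struct Vec2 {
    double x = 0.0, y = 0.0;
};

inline Vec2 operator+(Vec2 a, Vec2 b) { return {a.x + b.x, a.y + b.y}; }
inline Vec2 operator-(Vec2 a, Vec2 b) { return {a.x - b.x, a.y - b.y}; }
inline Vec2 operator*(Vec2 a, double s) { return {a.x * s, a.y * s}; }
inline double dot(Vec2 a, Vec2 b) { return a.x * b.x + a.y * b.y; }
inline double dist(Vec2 a, Vec2 b) { return std::hypot(a.x - b.x, a.y - b.y); }

// Where the ball's straight-line path leaves the circle (centre c, radius r),
// or nothing if the ball is stationary or its path never reaches the circle.
std::optional<Vec2> exitPoint(Vec2 ball, Vec2 vel, Vec2 c, double r) {
    // Solve |ball + t*vel - c|^2 = r^2; the larger root is the exit point.
    Vec2 d = ball - c;
    double A = dot(vel, vel), B = 2.0 * dot(d, vel), C = dot(d, d) - r * r;
    if (A == 0.0) return std::nullopt;
    double disc = B * B - 4.0 * A * C;
    if (disc < 0.0) return std::nullopt;   // path misses the circle
    double tExit = (-B + std::sqrt(disc)) / (2.0 * A);
    if (tExit < 0.0) return std::nullopt;  // circle lies behind the ball
    return ball + vel * tExit;
}

enum class WaitAction { None, SeekHome, SeekBall, SeekExitPoint };

struct WaitDecision {
    WaitAction action;
    Vec2 target;
};

// Decision sequence of figure 3.3; `contactDist` and the stopped-ball
// threshold are hypothetical tuning values.
WaitDecision waitForBall(Vec2 robot, Vec2 home, double radius, Vec2 ball, Vec2 ballVel,
                         double contactDist) {
    bool ballStopped = dot(ballVel, ballVel) < 1e-6;
    if (ballStopped && dist(robot, ball) <= contactDist) return {WaitAction::None, robot};
    if (dist(robot, home) > radius) return {WaitAction::SeekHome, home};
    if (dist(ball, home) <= radius) return {WaitAction::SeekBall, ball};
    if (auto p = exitPoint(ball, ballVel, home, radius)) return {WaitAction::SeekExitPoint, *p};
    return {WaitAction::SeekHome, home};
}
```

For example, with home at the origin and a radius of 2, a ball at (-5, 0) rolling along +x yields the exit point (2, 0), so the robot moves round to meet the ball from behind.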



Defend

A behaviour that combines the constraint and waiting behaviours to produce defensive play. Like the Wait for Ball behaviour, this takes a radius to defend as a parameter. If the ball enters this radius, the waiting behaviour ensures that it will be intercepted. When the ball is trapped, the defending robot attempts to clear it to any other robot waiting in free space. This was designed to allow the planner-based teams to explicitly allocate a robot for defence.

[Flowchart (uses Constraint, Waiting, Kick to Free Robot): if the constraint vector is non-zero, steer to stay on the pitch; otherwise, if close to the ball, steer to kick to a free robot, else steer for waiting.]

Figure 3.4: Defend

Go to Location X

This steers the robot toward a location, much like the seek behaviour, but it also avoids collisions. This is the primary robot movement behaviour. As parameters it takes the point X to seek and a precision: how close to this point is "close enough".

[Flowchart (uses Constraint, Avoidance, Seeking): if the robot is close to point X, no steering; otherwise, if the avoidance vector is zero, seek point X, else steer to avoid.]

Figure 3.5: Go to X



Get Behind Ball

This aims to steer the robot towards a point touching the ball, directly opposite point X. It is mostly used by other, higher-level behaviours. It is necessary as preparation for kicking the ball to a point, and was introduced as a separate behaviour because it was envisaged as also being useful for other manoeuvres, such as defence.

[Flowchart (uses Seeking, Avoidance): calculate the equation of the line through the robot and the ball. Decisions: Is the robot touching the ball? Does the line pass within an acceptable distance of target X? Is the ball between the robot and the target? Is the avoidance vector zero? Outcomes: no steering (robot touching the ball and lined up on X); seek a point near the ball opposite target X; steer to avoid the ball.]

Figure 3.6: Get Behind Ball
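The two geometric tests behind Get Behind Ball can be sketched as follows: the target point touching the ball on the side opposite X, and whether the line through the robot and the ball passes close enough to X. The function names, the two radii and the `tolerance` parameter are hypothetical; the 2D cross product gives the perpendicular distance from X to that line.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 {
    double x = 0.0, y = 0.0;
};

inline Vec2 operator+(Vec2 a, Vec2 b) { return {a.x + b.x, a.y + b.y}; }
inline Vec2 operator-(Vec2 a, Vec2 b) { return {a.x - b.x, a.y - b.y}; }
inline Vec2 operator*(Vec2 a, double s) { return {a.x * s, a.y * s}; }
inline double dot(Vec2 a, Vec2 b) { return a.x * b.x + a.y * b.y; }
inline double cross(Vec2 a, Vec2 b) { return a.x * b.y - a.y * b.x; }
inline double length(Vec2 a) { return std::hypot(a.x, a.y); }

// The point touching the ball on the side directly opposite target X.
Vec2 behindBallPoint(Vec2 ball, Vec2 target, double ballRadius, double robotRadius) {
    Vec2 away = ball - target;
    double len = length(away);
    if (len == 0.0) return ball;  // degenerate: ball already at the target
    return ball + away * ((ballRadius + robotRadius) / len);
}

// Perpendicular distance from target X to the line through the robot and the ball.
double lineDistance(Vec2 robot, Vec2 ball, Vec2 target) {
    Vec2 dir = ball - robot;
    double len = length(dir);
    if (len == 0.0) return length(target - ball);
    return std::fabs(cross(dir, target - robot)) / len;
}

// Lined up: the target lies beyond the ball (not back past the robot) and the
// robot-ball line passes within `tolerance` of it.
bool linedUp(Vec2 robot, Vec2 ball, Vec2 target, double tolerance) {
    bool beyondBall = dot(ball - robot, target - ball) > 0.0;
    return beyondBall && lineDistance(robot, ball, target) <= tolerance;
}
```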

Get Ball

This steers the robot towards the ball, constraining the robot to the pitch. It does not avoid collisions, as that would prevent the robot from getting the ball off another. It takes no parameters, and was designed as one of the obvious atomic behaviours required to play football.

[Flowchart (uses Constraint, Pursuit): a single decision chooses between steering to stay on the pitch and steering to pursue the ball.]

Figure 3.7: Get Ball



Kick to location X

This aims to steer the robot towards a point touching the ball, directly opposite point X, and then kick it. This is a relatively simple behaviour, as most of the work is done by the "Get Behind Ball" sub-behaviour. As well as the location to direct the kick to, a second parameter is the accuracy of the kick. This allows the balance to be tuned between setting up the kick, which takes time, and getting the ball within a reasonable distance of the target.

[Flowchart (uses Get Behind Ball, Seeking): if the Get Behind Ball steering vector is non-zero, seek behind the ball; otherwise, if there is a clear line to point X, kick the ball, else steer to the side.]

Figure 3.8: Kick to X

Kick to free robot

This behaviour is used for situations such as corner kicks, where the goal is to pass the ball to a robot in free space. It uses the kick behaviour above, combined with a check for obstacles between the robot and its team-mates.

[Flowchart (uses Kick to X): if there is a clear line of sight to a team-mate, steer to kick to them; otherwise, no steering.]

Figure 3.9: Kick to free Robot
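The line-of-sight check used by Kick to Free Robot can be written as a point-to-segment distance test. In this hypothetical sketch, a team-mate counts as free if no obstacle comes within `clearance` of the segment from the kicker to that team-mate; the first such team-mate is chosen, and an empty result corresponds to the "No Steering" branch of figure 3.9. The choice of obstacles (for example, opponents) and the first-found selection rule are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

struct Vec2 {
    double x = 0.0, y = 0.0;
};

inline Vec2 operator+(Vec2 a, Vec2 b) { return {a.x + b.x, a.y + b.y}; }
inline Vec2 operator-(Vec2 a, Vec2 b) { return {a.x - b.x, a.y - b.y}; }
inline Vec2 operator*(Vec2 a, double s) { return {a.x * s, a.y * s}; }
inline double dot(Vec2 a, Vec2 b) { return a.x * b.x + a.y * b.y; }

// Shortest distance from point p to the segment a-b.
double segmentDistance(Vec2 p, Vec2 a, Vec2 b) {
    Vec2 ab = b - a;
    double len2 = dot(ab, ab);
    double t = len2 > 0.0 ? std::clamp(dot(p - a, ab) / len2, 0.0, 1.0) : 0.0;
    Vec2 closest = a + ab * t;
    return std::hypot(p.x - closest.x, p.y - closest.y);
}

// Clear line of sight: no obstacle comes within `clearance` of the segment.
bool clearLine(Vec2 from, Vec2 to, const std::vector<Vec2>& obstacles, double clearance) {
    for (const Vec2& o : obstacles)
        if (segmentDistance(o, from, to) < clearance) return false;
    return true;
}

// Index of the first team-mate the kicker can see; empty means "no steering".
std::optional<std::size_t> freeTeamMate(Vec2 kicker, const std::vector<Vec2>& mates,
                                        const std::vector<Vec2>& obstacles, double clearance) {
    for (std::size_t i = 0; i < mates.size(); ++i)
        if (clearLine(kicker, mates[i], obstacles, clearance)) return i;
    return std::nullopt;
}
```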



Make space

This behaviour makes a robot "jostle" for space: it tries to keep a clear line between itself and the ball. It is a combination of several lower-level behaviours, and an important behaviour to have where there may be malicious opponents trying to block passes.

[Flowchart (uses Constraint, Waiting, Wander). Decisions: Is the constraint vector zero? Is the Wait for Ball vector zero? Is there a clear line to the ball? Outcomes: steer to stay on the pitch; steer to intercept the ball; steer for a random wander; no steering.]

Figure 3.10: Make Space

Reactive Player

The diagram below describes a complete RoboCup player, based purely on reactive behaviours, with no deliberation or co-ordination. The aim of this behaviour is to allow a team using the new architecture to be tested against a purely reactive approach, to investigate the benefits of co-ordination. The basic goal the player is trying to achieve is to move to the ball such that kicking it will shoot for the opposition goal. The algorithm is based on that used in "Michael's Simple Soccer", provided in the OpenSteer library [2]. The performance of this reactive agent is tuned by altering the parameters used for the sub-behaviours; for example, the accuracy of kicks can be adjusted, trading speed of play against accuracy.



[Flowchart (uses Avoidance, Seeking). Decisions: Is the robot touching the ball? Is the ball between the robot and the goal? Is the avoidance vector zero? Outcomes: kick the ball; seek a point to intercept the ball opposite the goal; seek a point to the side of the ball; steer to avoid the ball.]

Figure 3.11: Reactive RoboCup Player

3.3 High-Level Co-ordination

This section covers the design of the high-level co-ordination mechanisms. As an intermediate step between a purely reactive team and a team carrying out generated team plans, I also designed team-level FSMs to implement set plays. These allow the effects of co-ordination to be tested, and introduce state to the agents without the complexity of a planner. In addition, as generated plans must be mapped to FSMs for execution, this provides an incremental route to a complete system, allowing for easier testing and debugging.

3.3.1 Finite State Machine based Co-ordination

The basic idea behind the team-level FSMs is simple. Each state has a set of behaviours for each agent, and on each time-step the post-condition of the current state is checked; if it is met, the machine moves to the next state and the agents' behaviours are updated accordingly.
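The mechanism just described, one behaviour set per state plus a post-condition checked on every time-step, can be sketched as a small data-driven class. `World`, `TeamState` and `TeamFSM` are hypothetical names, and behaviours are represented only by their names; the example states in the usage below mirror the Pass and Shoot set play that follows.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical world snapshot used by the post-condition tests.
struct World {
    bool r1AtX = false, r2HasBall = false, r1HasBall = false, goal = false;
};

// One team-level state: the behaviour each agent runs, and the post-condition
// that must hold before the machine moves on.
struct TeamState {
    std::map<int, std::string> behaviours;  // agent id -> behaviour name
    std::function<bool(const World&)> postCondition;
};

class TeamFSM {
public:
    explicit TeamFSM(std::vector<TeamState> states) : states_(std::move(states)) {}

    // Called once per time-step: advance if the current post-condition holds,
    // then return the behaviour assignment in force (empty once finished).
    const std::map<int, std::string>& step(const World& w) {
        if (!finished() && states_[current_].postCondition(w)) ++current_;
        return finished() ? none_ : states_[current_].behaviours;
    }

    bool finished() const { return current_ >= states_.size(); }

private:
    std::vector<TeamState> states_;
    std::size_t current_ = 0;
    std::map<int, std::string> none_;
};
```

In use, the three Pass and Shoot frames become three `TeamState` values, and the simulation loop calls `step` once per time-step, handing the returned behaviour names to the robots.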

The Pass and Shoot FSM

This FSM was designed to give a plausible scenario for a pass between two robots, as setting up a pass is one of the main benefits of co-ordination. The basic plan, illustrated in the storyboard in figure 3.12, is for one robot to move up the pitch to receive a pass and then shoot for goal. Each frame of the storyboard represents a state in the FSM, and the agents are annotated with their behaviours for that state. The post-conditions for each state are detailed beneath it.



[Storyboard, one frame per state:
Initial state: R1 Go to X, Avoid Collisions; R2 Get Ball. Post-condition: R1 at X, R2 has ball.
State 2: R1 Wait for Ball; R2 Kick to X. Post-condition: R1 has ball.
State 3: R1 Kick to X; R2 Wait. Post-condition: Goal.]

Figure 3.12: Pass and Shoot FSM


The Corner Kick FSM

Figure 3.13 below illustrates the states and behaviours for the corner kick set play. It is included because no direct equivalent is possible using the planner: after the corner kick is taken, it is not possible to reason about which robot received the ball, and so to plan subsequent moves. However, the corner kick FSM itself forms one of the actions available to the planner, and so allows it to handle both corners and throw-ins, which require the same behaviour.

The basic plan is for the robots on the pitch to manoeuvre for space; when one gets the ball, it attempts to shoot. The others then also attempt to shoot, but as they also avoid collisions, they are prevented from interfering with the robot in possession.

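[Storyboard, one frame per state:
State 1: the robot taking the corner, Kick to Free Robot; the three robots on the pitch, Make Space. Post-condition: a robot on the pitch has the ball.
State 2: the robot in possession, Kick to X (goal); the other robots, Kick to X, Avoid Collisions. Post-condition: Goal.]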

Figure 3.13: Corner Kick FSM

3.3.2 The Planner

The main design goal for the planner was to create something that took the current world state, then generated and executed a plan. As this was the last part of the system to be designed, and I knew that designing and testing possible plan operators would be a time-consuming process, the aim was also to keep the design simple. For this reason, I chose to use classical STRIPS-style planning, with none of the extensions discussed in chapter 2, section 2.3 (such as temporal planning).

As some actions can be executed concurrently but a temporal planner is not used, an extra step is required to post-process a plan when it is mapped to an FSM for execution. That the planner does not explicitly model uncertainty is not an issue, as the lower-level reactive control should see to the completion of plan actions even in the face of hostile opponents. This is a great advantage of the approach presented in this project, as planning under uncertainty is a difficult and time-consuming task. Conventional co-ordinated planning methods do not apply here: a principle of the design is to keep individual agents simple, so they do not have their own planners, and a single, simple, global planner should suffice.

As mentioned in the review of planning techniques, a major problem is the level of abstraction at which to plan. In the context of RoboCup this could correspond to representing positions on the pitch in various ways, e.g. grids of varying resolution, or robots' positions in relative terms. To test this, two similar sets of plan actions were created, differing in their level of abstraction.

The representation of world state is closely tied to the level of abstraction in general. It is certainly useful to consider the positions of opponent robots, but this will only ever be an initial position, as the robots are capable of moving very fast relative to the pitch size.

In the time available, I designed and tested two sets of plan operators; these are described in the following sections.

High Abstraction Planner Actions

The high-abstraction representation splits the pitch into quarters. The information passed to the planner is:

• Which quarter each robot is in

• Which quarter in the opponent’s half has the least opponents in it

• Which quarter the ball is in

• Which robot is in possession of the ball.

The mapping from the simulator to a representation for the planner allows for more precise reasoning than the planner is capable of, so it is possible to test for a robot being in possession even though the planner may be aware of several robots being in the same quarter as the ball. Finally, if the ball is out of play, this is recorded too. This is necessary so the planner can take appropriate action in the case of throw-ins or corner kicks.
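The quarter abstraction can be computed from simulator coordinates with a few comparisons. This sketch assumes a pitch centred on the origin with the team attacking towards positive x, and a hypothetical quarter numbering (0 and 1 in the team's own half, 2 and 3 in the opponent's half); the names are illustrative and the report's actual mapping may differ.

```cpp
#include <array>
#include <cassert>
#include <vector>

struct Vec2 {
    double x = 0.0, y = 0.0;
};

// Hypothetical convention: pitch centred on the origin, our team attacks towards +x.
// Quarters: 0 = own half, y < 0; 1 = own half, y >= 0;
//           2 = opponent's half, y < 0; 3 = opponent's half, y >= 0.
int quarterOf(Vec2 p) {
    int half = p.x >= 0.0 ? 1 : 0;
    int side = p.y >= 0.0 ? 1 : 0;
    return half * 2 + side;
}

// The quarter of the opponent's half (2 or 3) holding the fewest opponents; ties go to 2.
int emptiestAttackingQuarter(const std::vector<Vec2>& opponents) {
    std::array<int, 4> count{};
    for (const Vec2& o : opponents) ++count[quarterOf(o)];
    return count[3] < count[2] ? 3 : 2;
}
```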

The goal for the planner is to score, and to have some robots defending. This should allow a hybrid team to perform better than a purely behavioural one, as it explicitly co-ordinates the team for defence.

The operators were chosen by considering the moves that a team would have to make to score a goal, and also the level at which it is feasible to reason about future states. The problem is that, although the reactive control allows the robots to deal with non-determinism, it also effectively creates non-determinism itself. However, by mapping plan actions and their parameters to behaviour sets, restrictions can be imposed to make the behaviour as predictable as required, i.e. at least the quarter of the pitch the robot ends up in is known. The plan operators used are covered informally below; for formal definitions, see the actual PDDL specification in appendix A.



Pass: Robot1, Robot2
Pre-conditions: Robot1 has possession, Robot2 is waiting
Post-conditions: Robot2 has possession

Go and Wait: Robot1, Position X
Pre-conditions: Robot1 at position Y
Post-conditions: Robot1 at position X, Robot1 is waiting

Shoot: Robot1
Pre-conditions: Robot1 has possession, Robot1 is in opponent's half
Post-conditions: Goal

Solo-Shoot: Robot1
Pre-conditions: Robot1 has possession, Ball is in quarter of pitch with least opponents
Post-conditions: Goal

Get Ball: Robot1
Pre-conditions: No team-mate has possession
Post-conditions: Robot1 has possession

Defend: Robot1
Pre-conditions: A team-mate is already the goalkeeper
Post-conditions: Robot1 defending

Goalkeeper: Robot1
Pre-conditions: No goalkeeper
Post-conditions: Robot1 defending, Robot1 goalkeeper

Pass to Free Robot: All robots
Pre-conditions: Ball out of play
Post-conditions: Goal
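To make the STRIPS semantics concrete, the sketch below applies ground operators to a set of facts: an operator is applicable when all its pre-conditions hold, and applying it removes its delete-list and then adds its add-list. The predicate strings (`ball-loose`, `has-ball r1`, `at r2 q4` and so on) are hypothetical stand-ins for the PDDL in appendix A; in particular, because plain STRIPS has no negative pre-conditions, "no team-mate has possession" is encoded here as a positive fact, `ball-loose`.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

using State = std::set<std::string>;  // the set of facts currently true

// A ground STRIPS operator (all parameters already substituted).
struct Op {
    std::string name;
    State pre, add, del;
};

// Applicable when every pre-condition is in the state.
bool applicable(const Op& op, const State& s) {
    for (const auto& fact : op.pre)
        if (s.count(fact) == 0) return false;
    return true;
}

// Successor state: remove the delete-list, then add the add-list.
State successor(const Op& op, State s) {
    for (const auto& fact : op.del) s.erase(fact);
    for (const auto& fact : op.add) s.insert(fact);
    return s;
}

// Executes a plan in order; fails at the first step whose pre-conditions do not hold.
bool execute(const std::vector<Op>& plan, State& s) {
    for (const auto& op : plan) {
        if (!applicable(op, s)) return false;
        s = successor(op, s);
    }
    return true;
}
```

Running the four-step Pass and Shoot plan given under Plan Execution through `execute` reaches the `goal` fact, whereas a plan that passes before the receiver is waiting is rejected at the Pass step.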

Some of the operators and conditions used appear a bit odd. This is because, as well as allowing the planner to create good plans, the operators must also actively prevent it from generating bad plans. For example, without the condition that "no team-mate has possession", a perfectly valid plan would involve one robot attempting to take the ball off a team-mate, which is clearly not a good idea.

An operator that illustrates the problems with non-determinism is Solo-Shoot. At first glance this appears to be the same as Shoot, but the location-based condition is on the ball, not the robot. The reason for this is that a "Get Ball" action prevents any further reasoning about the robot's position, as it could end up anywhere while chasing the ball. By adding a shoot action that does not depend on the robot's position, it is possible for one robot to score on its own, rather than relying on gaining the ball as a result of a pass.

Lower Abstraction Planner Actions

The main difference between the low-abstraction and high-abstraction sets of plan operators is in the world state description. Where the high-abstraction scheme divides the pitch 2x2, the low-abstraction scheme subdivides it 13x9.

The plan operators used are mostly the same as for the high-abstraction planner (see appendix A). The main difference is that work previously done in the mapping from environment to state can now be done by the planner. For example, instead of requiring the mapping process to identify the exact location of a robot being passed to, the pass need only be aimed at the centre of the cell defined by the row and column specified by the planner.
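A sketch of the 13x9 cell mapping follows, including the cell centre that a pass would be aimed at. It assumes a pitch of `length` by `width` centred on the origin, with the 13 columns running along the pitch and the 9 rows across it; the `Grid` and `Cell` types are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

struct Vec2 {
    double x = 0.0, y = 0.0;
};

struct Cell {
    int col, row;
};

// Hypothetical 13x9 grid over a `length` x `width` pitch centred on the origin:
// 13 columns along the length (x) and 9 rows across the width (y).
struct Grid {
    double length, width;
    int cols = 13, rows = 9;

    Cell cellOf(Vec2 p) const {
        int c = static_cast<int>((p.x + length / 2.0) / length * cols);
        int r = static_cast<int>((p.y + width / 2.0) / width * rows);
        // Points on (or beyond) the pitch edges are clamped into the outermost cells.
        return {std::clamp(c, 0, cols - 1), std::clamp(r, 0, rows - 1)};
    }

    // Centre of a cell: the point a pass to that cell would be aimed at.
    Vec2 centreOf(Cell cell) const {
        return {(cell.col + 0.5) * length / cols - length / 2.0,
                (cell.row + 0.5) * width / rows - width / 2.0};
    }
};
```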

Plan Execution

Plans are executed by constructing an FSM from the operators, combining them to execute in parallel where possible. This involves mapping several plan actions to a single state where they involve different robots. Any action that requires more than one robot is mapped to its own state. Some actions that involve a robot changing state (such as the "Go and Wait" action) may have several states themselves, and so change the robot's own state before the overall state machine advances.

As each action has a post-condition test associated with it, during execution the FSM advances to the next state once the tests of all the actions in the current state are met.

For example, consider the plan representing the "Pass and Shoot" FSM:

1. Robot 1 Get Ball

2. Robot 2 Go and wait at X

3. Robot 1 Pass to X

4. Robot 2 Shoot

This is a valid plan at either level of plan abstraction; it would differ only in the parameters used for position X. The rules to create a parallel FSM to execute the plan are:

1. If an action involves a different robot, it may occur in the same state.

2. If an action involves more than one robot, it must have its own state.

Applying these rules to the above plan gives the FSM in figure 3.14. The transitions represent the post-conditions that must be met to advance the state.



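[FSM: State 1: Robot1 Get Ball, Robot2 Go & Wait at X; transition when Robot1 has possession and Robot2 is waiting at X. State 2: Robot1 Pass to X; transition when Robot2 has possession. State 3: Robot2 Shoot; transition on Goal to End.]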

Figure 3.14: An example generated FSM for plan execution
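The two rules can be applied greedily to a totally ordered plan, as in the sketch below. The `Action` type and `buildStages` function are hypothetical names; each action records which robots it involves, much as the ActionBase objects described in section 4.4.3 do. Applied literally, the rules consider only which robots an action involves, not causal dependencies between actions of different robots, and the sketch inherits that limitation.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// A plan action and the robots it involves (hypothetical representation).
struct Action {
    std::string name;
    std::set<int> robots;
};

using Stage = std::vector<Action>;  // actions run in parallel within one FSM state

// Greedy grouping of a totally ordered plan into FSM states:
//  rule 1: an action joins the current state if it uses robots not already busy there;
//  rule 2: an action involving more than one robot always gets a state of its own.
std::vector<Stage> buildStages(const std::vector<Action>& plan) {
    std::vector<Stage> stages;
    std::set<int> busy;            // robots already used in the current state
    bool currentIsShared = false;  // current state holds a multi-robot action
    for (const Action& a : plan) {
        bool multi = a.robots.size() > 1;
        bool clash = false;
        for (int r : a.robots) clash = clash || busy.count(r) > 0;
        if (stages.empty() || multi || currentIsShared || clash) {
            stages.emplace_back();
            busy.clear();
            currentIsShared = multi;
        }
        stages.back().push_back(a);
        busy.insert(a.robots.begin(), a.robots.end());
    }
    return stages;
}
```

Applied to the four-step plan above, this yields the three states of figure 3.14: {Robot1 Get Ball, Robot2 Go and Wait at X}, {Pass to X}, {Robot2 Shoot}.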


4 Implementation

This chapter presents an overview of the choices made in implementing the design, and covers some of the problems that came to light during the process. The process used for implementation was one of incremental development and testing. The design of the system was such that each layer (behavioural, FSM and planner) could be written and tested independently before progressing to the next.

The implementation is a large body of code, comprising over 10,000 lines for the simulator, architectural framework and control code, not counting the physics library. However, much of this code consists of fairly similar structures; e.g. the behaviours all follow a similar pattern, although not similar enough to generalise and gain the benefits of code re-use.

4.1 General Implementation Considerations

From the start of the project, the intention was always to use a simulator for implementation and evaluation of the architecture. This avoids the problems of debugging on, and the availability of, actual robots, and gives greater freedom for experimentation. The simulator chosen was one written by a PhD student in the department, Matthew Grounds, to support a single-robot machine learning experiment. This meant that the source code was available to me, so I was able to extend it to support multiple robots and the instrumentation I required for evaluation.

The simulator is written in C++, using OpenGL to provide a graphical interface, and uses the Open Dynamics Engine (ODE [32]) to provide a physics simulation modelling the robot and ball motion.

The language chosen to implement the project was C++, as the simulator was written in it and I had experience in its use from my placement year. Apart from this, it is a good choice as it supports object-oriented programming techniques which are well suited to the problems posed here; for example, individual behaviours can be represented by different objects sharing a common base class. This allows for easy code re-use and the benefits it provides, such as allowing modules to be tested once and then reused, saving time.

4.2 Implementation of Behaviours

As mentioned previously, the low-level steering behaviours use the OpenSteer library [2], which has the additional advantage of being written in C++. However, this on its own is not enough to meet the requirements of the architecture. Behaviours must be changed dynamically at run time, and may consist of anything from simple atomic behaviours to combinations of several behaviours that create a complete reactive player. In addition, it was anticipated that a great deal of experimentation would be required to fine-tune behaviours and to test different arbiters for combining them.

Figure 4.1: A screenshot of the simulator in action.

Figure 4.2 below shows the class hierarchy that meets all of the above criteria. The key feature is that a behaviour is not part of a robot: the robot class merely holds a reference to the current behaviour, so it can be changed dynamically. In addition, a robot holds a reference to a BehaviourBase class, and all the different behaviours are subclasses of this, so that they may be used interchangeably. Defining the interface in this way also allows any behaviour to be modified, even if it is used by others, without re-coding the dependants. For a robot to use multiple behaviours, an arbiter is coded as a subclass of BehaviourBase and presents the same interface to the robot as a single behaviour would.
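A minimal sketch of this hierarchy follows. Only `BehaviourBase`, `Robot` and the `GetVelocity()` method name come from figure 4.2; the `Vec2` type, `SetBehaviour`, `Steer`, the `Constant` stand-in behaviour and the first-non-zero `PriorityArbiter` are hypothetical, and the real `GetVelocity()` presumably also reads the robot's percepts.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct Vec2 {
    double x = 0.0, y = 0.0;
};

// Common interface: every behaviour, atomic or composite, yields a steering vector.
class BehaviourBase {
public:
    virtual ~BehaviourBase() = default;
    virtual Vec2 GetVelocity() = 0;
};

// A robot holds only a pointer to its current behaviour, so it can be swapped at run time.
class Robot {
public:
    void SetBehaviour(std::shared_ptr<BehaviourBase> b) { behaviour_ = std::move(b); }
    Vec2 Steer() { return behaviour_ ? behaviour_->GetVelocity() : Vec2{}; }

private:
    std::shared_ptr<BehaviourBase> behaviour_;
};

// Hypothetical stand-in for an atomic behaviour: always returns the same vector.
class Constant : public BehaviourBase {
public:
    explicit Constant(Vec2 v) : v_(v) {}
    Vec2 GetVelocity() override { return v_; }

private:
    Vec2 v_;
};

// An arbiter is itself a BehaviourBase: here the first non-zero child output wins.
class PriorityArbiter : public BehaviourBase {
public:
    explicit PriorityArbiter(std::vector<std::shared_ptr<BehaviourBase>> children)
        : children_(std::move(children)) {}

    Vec2 GetVelocity() override {
        for (auto& child : children_) {
            Vec2 v = child->GetVelocity();
            if (v.x != 0.0 || v.y != 0.0) return v;
        }
        return {};
    }

private:
    std::vector<std::shared_ptr<BehaviourBase>> children_;
};
```

Because the arbiter is itself a BehaviourBase, a Robot cannot tell whether it is running one behaviour or a combination, which is what allows whole behaviour sets to be swapped at run time.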

[UML: Robot holds a reference to BehaviourBase, which declares Vector::GetVelocity(); Constrain, Avoidance and ReactivePlayer are subclasses of BehaviourBase.]

Figure 4.2: UML illustration of the behaviour class hierarchy (only a few behaviours are shown, for clarity)


4.2.1 Problems Encountered

The chief difficulty in implementing the reactive behaviours was tuning the parameters. For example, the speed of a robot's kick affects its accuracy: too fast and it is inaccurate, too slow and the robot will not be able to kick usefully far. For these tuning problems the graphical interface to the simulator proved useful, as I could set up simple scenarios with single robots running only the behaviour being tested, and observe their performance and what went wrong. I also accepted that within the time-scale of the project I was never going to get things tuned perfectly, nor did it matter, as reactive control is not the main paradigm under test here.

There were also several challenges in creating a purely reactive soccer agent. My initial approach was to take an existing behavioural control scheme for robotic soccer and implement it in my simulator (and indeed my final solution was loosely based on one). This posed problems, however, as reactive rules that worked well for a particular simulator or robot would perform badly in my environment; the rules need to be tuned appropriately. Also, many existing RoboCup teams' approaches would be unfeasible within the scope of the project; for example, the FU Fighters team use a completely reactive approach, and their reactive control laws alone are 86,000 lines of code [16]!

Another problem that emerged was collision avoidance. Too conservative a setting would result in robots that never got the ball, due to being too "scared" of collisions with others, while too careless a setting would result in robots becoming "locked" together. This problem was never adequately solved for the purely behavioural agents; however, it disappears with co-ordination, as only one robot is tasked with retrieving the ball.

4.3 Implementation of FSMs

The implementation of the FSMs was a relatively simple task. As the behaviours can be swapped dynamically, all the FSM object need do is keep track of state and change the behaviours appropriately. To enable easy testing, the system was constructed so that a simulation has a generic PlannerBase object, and the various FSMs or planners can be defined as subclasses of this, providing the same interface. To give planners access to the system state, they are able to call the perception methods of the robot classes, which provide the positions of all robots and the ball.

There are two main techniques that may be employed to implement an FSM. One is to store the state in a variable and create the FSM as a large case statement which switches on this variable. Another, more sophisticated, method is to have a class of objects that represent the states of the machine, and to construct the machine from these objects. As the FSMs designed were only simple test cases, intended to test the principles of co-ordination, the first method was used, as it is simple and quick to implement. This is in contrast to the planner implementation, which uses a variant of the second technique to create FSMs dynamically from plans.
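As an illustration of the first technique, the Pass and Shoot machine can be written as a switch over a state variable. This is a hypothetical sketch: `Percepts`, `PassShootState` and `advance` are illustrative names, the "R1 at X" test uses the tolerant within-one-robot-radius check discussed in section 4.3.1, and the behaviour swap that accompanies each transition is left to the caller.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 {
    double x = 0.0, y = 0.0;
};

inline double dist(Vec2 a, Vec2 b) { return std::hypot(a.x - b.x, a.y - b.y); }

// Hypothetical perception snapshot.
struct Percepts {
    Vec2 r1Pos;
    int ballOwner = 0;  // 0 = nobody, otherwise the robot number
    bool goalScored = false;
};

enum class PassShootState { SetUp, Pass, Shoot, Done };

// The state lives in one variable; each case tests that state's post-condition.
// On a transition the caller would swap each robot's behaviour (not shown).
PassShootState advance(PassShootState s, const Percepts& p, Vec2 x, double robotRadius) {
    switch (s) {
    case PassShootState::SetUp:
        return (dist(p.r1Pos, x) <= robotRadius && p.ballOwner == 2) ? PassShootState::Pass : s;
    case PassShootState::Pass:
        return p.ballOwner == 1 ? PassShootState::Shoot : s;
    case PassShootState::Shoot:
        return p.goalScored ? PassShootState::Done : s;
    case PassShootState::Done:
        break;
    }
    return s;
}
```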

4.3.1 Problems Encountered

The FSM code itself is simple, and due to the iterative process of design, implementation and testing that was followed, the performance of the behaviours was well known and could be relied upon. Thus the only challenge was in the tests for the post-conditions of states. The issue is that completion of an action, e.g. "Go To X", is not precise: the robot may not get exactly to location X for a variety of reasons. To accommodate this, the tests need to be flexible, and succeed when the conditions are good enough. In the go-to case, this could be getting within a robot's radius of the position.

As with tuning the parameters for the behaviours, this process was one of trial and error, based on observation of the simulation running.

4.4 Planner Implementation

There were two key stages to the implementation of the planner: creating the actions that map plan operators to behaviour sets and their post-conditions, and the planner itself, including the choice of operators. Keeping the mapping separate from plan generation allowed me to evaluate different planner implementations independently.

Although the design of the plan operators is covered separately in this report (in section 3.3.2), during implementation the design was revised based on testing. This section also covers the issues faced in implementing the chosen plan operators.

4.4.1 Selection of a Planner

As my requirements for a planner were fairly standard, I chose to use an existing planner implementation and interface with it, as this would provide better performance in a shorter time than writing my own from scratch. In the interests of standardising planning research, many available implementations use the PDDL [3] file format for the input and output of the planner. As planning is a high-level control process in this architecture, it does not happen very often, so interfacing with an external planner via files is feasible. This provides many advantages other than speed of implementation. A major advantage is being able to swap in different planner implementations with minimal code changes; it also allows stand-alone testing of plan operators, and inspection of the plans produced, independently of the simulation environment.

The implementation I chose to test first was GraphPlan [9]. However, this proved to be very slow to execute, often failing to find plans after tens of minutes. It was not a requirement that the planner operate fast enough for real-time execution, as the simulator could be paused, but for practical experimentation a fast solution was required. The planner finally chosen was Fast-Forward by Hoffmann [21]. This worked much faster, often producing a plan within a fraction of a second for the same input files.

4.4.2 Implementation of Plan Operators

The choice of Fast-Forward as the planner required some careful consideration of the plan operators for it to perform efficiently. The main issue is that it heavily optimises the search for a plan by initially considering only the add-lists of operators. The add-list is the set of predicates made true by applying a plan operator. This can cause issues, as many of the operators perform vital actions in their delete-lists. For example, it is a crucial fact that when a robot goes to a location it is no longer at its previous location. When the delete-list becomes more important than the add-list, this efficiency assumption (that the solution is likely to be found among the plans formed by considering only add-lists) is destroyed.

The main problem this caused was not just poor performance, but that "silly" plans were being generated. For example, if the delete-list is not considered at first, the following plan is perfectly valid, and the preferred solution:

1. GoTo X

2. GetBall

The issue here is that a post-condition of GetBall is that the robot's position is no longer known, and the only thing in the add-list of GoTo is an updated position, so the GoTo move is useless. Also, this longer plan leads to a far longer search time. The solution I found was to replace the GoTo and Wait operators with a single GoAndWait operator, and to add a pre-condition to GetBall that the robot is not waiting. This prevents the above behaviour without depending on delete-list post-conditions. At first it appears overly restrictive, but there is no obvious sensible plan that requires a GoTo not to be followed by waiting, as any other action involving movement has it "built in".

A further efficiency-related issue was the number of parameters for plan operators, and specifically the number of possible values they could take. As the planner searches by instantiating possible values for the parameters, following some heuristic, the fewer possible values there are, the faster it will be. This means that care must be taken to reduce the number of parameters required by operators. It is this factor that makes the high-abstraction planner a lot faster than the low-abstraction one: there are fewer possible positions a robot could be in, so there are substantially fewer valid moves from any given world state.
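A rough count illustrates the effect. Assuming, as in the informal definition in section 3.3.2, that Go and Wait takes one robot parameter and one position parameter, then with $R$ robots and $P$ possible positions there are $R \cdot P$ ground instances of that operator alone:

```latex
N_{\text{GoAndWait}} = R \cdot P, \qquad
P_{\text{high}} = 2 \times 2 = 4, \qquad
P_{\text{low}} = 13 \times 9 = 117, \qquad
\frac{N_{\text{low}}}{N_{\text{high}}} = \frac{117R}{4R} = 29.25 .
```

So at every step of the search, the low-abstraction planner has roughly 29 times as many Go and Wait moves to consider, and the gap compounds with plan length.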

4.4.3 Implementation of Plan to Behaviour Mapping

The implementation challenges are mostly the same as with the FSMs, namely tuning the parameters for the post-condition tests. By the time I came to implement the action mappings, most of the problems had already been overcome.

As with the behaviours and the planners, the actions that represent the mappings from the plan operators are all defined as subclasses of a base class. This allows a generic routine to construct an FSM representing the plan as an array, which can be stepped through as the state advances.

In this way each state is contained within a separate object (based on the ActionBase class), combined with its post-condition test. Each action also stores which robots it applies to, so that when the FSM is constructed, actions which may occur concurrently are easily identified. This greatly simplifies the task of writing the top-level FSM to control plan execution. It also has the added benefit that each state may itself contain other states, effectively creating a hierarchy of FSMs.

5 Experimental Evaluation

Evaluation is as important as the design of a new architecture. This chapter covers the design, results and evaluation of several experiments. The overall goal was to investigate whether the new architecture could perform well, and in particular whether it could outperform purely reactive agents.

5.1 Experiment Design

The basic idea of all the experiments is to measure the performance of teams of one or more agents against varying levels of opposition. Robotic football makes this difficult, chiefly because there is no single easy measure of performance. For example, one might choose to evaluate the teams by the number of goals scored in a set time, much as human football teams play matches. However, what is the result in the case of a draw? Without watching the match in the simulator it is hard to say which team played better. Even then it would be a highly subjective judgement, whereas we wish to strive for quantitative and repeatable experiments.

It is desirable to ensure that the conditions for each test are the same. To ensure this, the start locations for all agents are fixed (using a configuration file), and after each goal is scored both teams are reset to their initial positions. This allows each goal to be treated as a separate match or experiment run.

Only one team needs to be tested at a time. To ensure an equal and adequate sample size, each team under test was played against its opponent in a match set to terminate when the team under test had scored 100 goals. The principal data studied for team comparison is the time to score, or more specifically its mean over 100 goals. To prevent bias toward computationally efficient solutions, time was measured in simulator time-steps rather than real time. Of course it is of interest if one method is very time consuming. However, in this prototype system only broad observations can be made, as the quality of the code and the degree of optimisation vary between methods.

Mean time to score is not the only interesting statistic. A team is no good if it scores very quickly on average, only to lose the match 500-100 because the other team is scoring even more quickly! To observe differences such as these, several other data sets were also stored from each experiment:

Time for opposition to score: as in the above example, how quickly the opposition scores may be of interest in certain cases.

Final score: how many goals the opposition scored in the time it took the team under test to score 100.

Total time: a less useful statistic, but one that may be worth considering along with the number of times the ball goes out of bounds.

Ball out-of-bounds: in some cases the combination of behaviours is such that the teams struggle to keep the ball on the pitch. This is obviously inefficient, and can be used to explain some results.

One problem that emerged during testing of this experimental set-up was that of incompatible teams. In some cases the behaviour was such that the teams would get stuck in a "loop". For example, the ball may go out of play, and the behaviour of the robots is such that every time it is replaced it is knocked out again, forever. To allow meaningful experiments to be carried out, if three million time-steps were reached in one play, the teams were assumed to have got stuck and were reset. The number of resets was also counted. These stuck plays would also produce very large time-to-score values, which would make the data hard to analyse, so the timers are reset in these cases as well.

5.2 Evaluating the Reactive Player

The first step in evaluating the new architecture was to establish a baseline against which to test it, and to consider the strengths and weaknesses of that baseline. As the basic goal was to compare the hybrid architecture against a purely reactive approach, this meant testing the purely reactive player first.

5.2.1 Varying Numbers of Reactive Players

The hypothesis for this experiment is that as the number of reactive players increases, the time to score a goal will also increase. The players have no co-ordination mechanism, so they will get in each other's way. To test this, the above experiment was run with 1-5 reactive players against no opponents, on an empty pitch.

The graph in figure 5.1 shows the mean values for each number of robots on the team, along with the error bars. It also shows the percentage probability, calculated with Student's t-test, that adjacent means are the same. Although there is some doubt about the difference between 3 and 4 robots, there is a clear trend that more reactive robots take longer to score, supporting the initial hypothesis.

To confirm this result, I re-ran the experiment with stationary opponents. The results also support the hypothesis, with a mean time to score of 148 time-steps for one robot and 1836 for five, both against 5 stationary opponents.

The result against randomly wandering opponents, shown in figure 5.2, is less clear. As the number of opponents increases, so does the time to score, which is to be expected. However, comparing teams of one and five robots against five wanderers gives means of 1136 and 1382 respectively. Applying a t-test to these results shows that there is a 43% chance they are the same. This suggests that the main factor here is the wandering opponents causing the time to score to increase, and not the number of robots on the team. This hypothesis is not borne out, however, when the mean times are compared with the results above against no opponents: the average time to score for a team of five robots against five wanderers is not significantly greater than against none.

Figure 5.1: Reactive players against an empty pitch

Figure 5.2: Reactive players against wandering opponents

As a final test, teams of reactive agents were pitted against each other in 1 v 5 and 5 v 5 scenarios. From the previous results, the single agent would be expected to score more quickly against the team of 5, and the 5 v 5 match would be expected to be a draw. The results (in figure 5.3) show a lower mean time to score for one agent than for a team of five, but the deviation in results is too great for this to be conclusive. Interestingly, the 5 v 5 match score was 100-103, as would be expected, but the lone robot lost 134-100 against the team of 5. This suggests that despite the lack of co-ordination, a team of reactive robots is still able to overwhelm a single opponent.

Figure 5.3: Reactive players against reactive opponents

5.3 Evaluating the FSMs

Evaluation of the performance of the FSMs is of particular importance, as they effectively represent hand-coded plans for particular plays. If good performance can be achieved with the FSM-based approach, then it is in theory also possible with an automated planner providing high-level control. It is then just a matter of implementing the right planner. Testing the FSMs also tests the hypothesis that the low-level behaviours will still allow completion of the plan in the face of a hostile adversary.

The experiments used for testing the FSMs were as described above, with the restriction that the FSMs are designed for specific numbers of robots, so only the number and type of the opposition can be varied.

5.3.1 Testing the Pass-and-Shoot FSM

The first test of the FSM is to check that it scores goals on an empty pitch, and to establish at what rate it can score in this situation. Once this is done, the number and type of opponents can be varied. The hypothesis is that as the number and complexity of the opponents increases, so does the mean time to score. It would also be expected that the FSM team would win by a smaller margin, or lose, in the face of a hostile team. As the pass-and-shoot move is widely used in the planner-based implementation, it would also be hoped that it is a successful behaviour against a reactive opponent.

The first set of results, in figure 5.4 below, shows the mean time to score against increasing numbers of static opponents. The time is constant, which suggests that the default start positions of the opposition do not hinder the execution of the plan.

Figure 5.4: Pass-and-Shoot FSM against static opponents

The results (below in figure 5.5) are less clear for the tests against wandering and reactive opponents. The expected increase in time over the static opponents is present. However, the time to score is unexpectedly quicker against reactive agents than against random wanderers.

The reason for this result becomes clearer when the hypothesis that "the more complex the opponent, the weaker the FSM's performance" is tested. Figure 5.6 shows the number of goals conceded by the FSM against increasing numbers of reactive players. The FSM defeats static or randomly wandering opponents, but this graph shows that it will lose against a reactive team. As suggested by the evaluation of the reactive player, a single opponent actually performs better than a team.

This may explain the unexpected results above, because against the more complex opponent the FSM either scores easily or not at all. This gives a relatively low mean time to score compared to the results against the wanderers. In those matches it may sometimes take longer to score, but the FSM scores most of the time.


Figure 5.5: Pass-and-Shoot FSM against wandering and reactive opponents

Figure 5.6: Goals conceded by Pass-and-Shoot FSM against reactive opponents



5.3.2 Testing the Corner-Kick FSM

The expectations for the corner-kick FSM are much the same as for the pass-and-shoot scenario, and it aims to test the same hypothesis. The difference here is that there are more robots on the team (a full five), so it would be expected to perform better. The start positions of the robots are also changed to a corner-kick scenario. Again more success would be expected, as the start position is in the opponents' half, and this should also lead to goals being scored more quickly. Figure 5.7 below presents the results.

Figure 5.7: Mean time-to-score for Corner-Kick FSM

These results show the expected increase in time to score against more complex opposition. They also show the expected decrease in time to score relative to the pass-and-shoot FSM: the maximum here is under 4,000 time-steps, compared to just under 100,000. The scoreline is also as expected. Against static or wandering opponents few goals are conceded, whereas against reactive opponents more goals are conceded, up to a maximum of 103 against 4 opponents.

5.4 Evaluating the Low Abstraction Planner

The low abstraction planner proved troublesome to evaluate for several reasons. The first was the execution time. Parameters had many possible values, and more parameters were needed to represent an X and Y location than a quarter-of-pitch designation. As a result I would often run the experiment for 15 hours or more, and yet only 20 or 30 goals would be scored. This was compounded by the second main problem: the planner favoured instantiating parameters with arguments such that the destination for moves or passes was at the edge of the pitch (and I could not solve this bug in the time available). Playing the ball close to the edge of the pitch means that it goes out of play a lot. This compounds the problem of the simulator getting stuck in a "loop", with the ball repeatedly going out of play.

For these reasons, the low-abstraction plan operators do not seem to be a good choice: they run slowly and perform badly. Indeed, it was initial observations of these results that prompted the experiments with a higher level of abstraction.

Figure 5.8: Mean time-to-score for Low Abstraction Planner

Figure 5.8 shows the results from the test runs of the planner. Because the tests took so long to execute, runs were only made against 1 or 5 opponents. As can be seen from the error bars, the results are not very conclusive. The wide spread of results and the large error are caused, as noted above, by the ball frequently going out of play. As expected, the time to score against reactive opponents is greater than against wanderers, but the results against static opponents are less clear.

The reason for the particularly poor performance against five static opponents is clear when the experimental run is observed. The plan produced in the start state is flawed. To avoid an opponent, it requires the ball to be passed at such an angle that it is likely to go off the pitch. This causes many repeated out-of-play manoeuvres, and hence the poor time to score.


5.5 Evaluating the High Abstraction Planner

It is reasonable to expect the best performance from the high abstraction plan operators, as they represent the peak of the proposed architecture's development. The evaluation process is the same as for the other control schemes, testing against stationary, wandering and reactive opponents. The hypotheses are also the same as in the other experiments: goals should be scored more quickly against simpler opponents, and fewer goals should be conceded.

In contrast to the previous experiments, the results against the static opponents are very clear cut. Figure 5.9 shows the results. The graph shows clear straight lines, with little error except in the 5-robot case. This is easily explained: each added robot costs the team time to get around it, unless it is out of the way. The positions of the 1st, 3rd and 4th robots are clearly no obstacle, but those of the 2nd and especially the 5th cause problems. The much higher time and greater error against the 5th robot arise because it is the goalkeeper, so shots have to be very accurate to get past it into the goal.

Figure 5.9: Mean time-to-score for High Abstraction Planner, Static Opponents

The picture against randomly wandering opponents (figure 5.10) is similar to that against static opponents. In particular, the planner deals well with low numbers of opponents, and the rate of scoring is only slightly slower than against static opponents. The one anomaly, the result against five opponents, is caused by a few very high times among the 100 measured, which skew the mean and the error. These presumably represent cases where the 5th robot (which starts in the goalkeeper position) does not stray far from the goal mouth, causing difficulties.

The graph of reactive results (figure 5.11) shows a trend for increasing time to score, as expected.


Figure 5.10: Mean time-to-score for High Abstraction Planner, Wandering Opponents

The number of goals conceded is also included for discussion. As expected, the mean times are higher than against simpler opponents, rising to a maximum of 84,000 time-steps. This is still less than the maximum for the low-abstraction approach.

The case of a single reactive opponent is as expected from evaluating the reactive player. The most goals are conceded here, because the reactive player performs at its best without hindrance from team mates. This case also has the planner's lowest mean time-to-score, for the reason given previously: the plan either succeeds and scores quickly, or not at all. Against two or three opponents the planner starts winning, but its goals take longer to score, so the time to score increases. With four and five opponents, the reactive players begin to match the planner's team in number. Despite hindering each other, they also obstruct the plan-based team, so more goals are conceded, up to a maximum of 108 against 5 opponents.


Figure 5.11: Mean time-to-score and Goals Conceded for High Abstraction Planner, Reactive Opponents

5.6 Summary and Comparisons

Overall the results are broadly as expected. The new architecture presented here is capable of performing better than a purely reactive approach. In particular, the high abstraction planner is predictable, efficient and able to outperform all the other teams implemented except a solo reactive agent.

That a solo reactive player should outperform a planner-based team is not that surprising when all the factors are considered. In theory, the reactive agent should suffer from being unable to exhibit behaviour as complex as that of the hybrid teams. In this implementation that is not a real limitation, because the hybrid teams are also simple, with only a basic set of plan operators allowing simple actions such as passes. For these experiments, then, the only advantage the hybrid teams could be expected to show is team co-ordination. A single reactive robot has no team mates to co-ordinate with, so the lack of co-ordination does not hinder it, and it performs at its best, as shown in the evaluation of the reactive case. This is a successful technique against the planner, which favours a passing technique that is over-complex for the situation and wastes time manoeuvring.

However, where it can be exploited, co-ordination can bring many benefits. Both the corner-kick FSM and the planners are capable of defeating a team of reactive agents. The pass-and-shoot FSM also performs better against a larger team of opponents, as it exploits the co-ordination advantage.


The choice of plan operators is clearly of key importance to this approach. Figure 5.12 shows the difference in the average number of goals conceded by the different planner-based approaches. The only difference between the teams is the level at which the pitch is described, and yet there is a large difference in performance by every measure. The lower abstraction operators concede more goals, score more slowly and take massively longer to execute.

Figure 5.12: Comparing Goals Conceded by Different Planners


6 Project Evaluation

The performance aspects of this project are evaluated in the previous chapter, and this chapter covers the other aspects. It is organised into sections on time management, software development, experimental evaluation and the novel aspects of the project. The merits of the design itself are not considered here, as they are evaluated in depth by the experiments of the previous chapter.

As a whole I was very happy with the way the project went. Everything went to plan, I achieved as much as I could have hoped to, and I was able to allocate plenty of time for everything, including writing the report.

6.1 Time Management and Planning

It is appropriate to cover time management and my plan for carrying out the work first, as the time allocated to the project represents the biggest constraint on its scope.

A big advantage of the way I structured the implementation of the test architecture was that it allowed for incremental development. I was able to work from the bottom up, with each layer building on the functionality already implemented, and even the base layer of reactive control provided a complete player. As long as I worked at a decent rate I was sure to have something of interest to test. The FSMs on their own provided interesting results, as they show that the general technique is feasible. That said, I did not have a detailed plan for carrying out the project, and tasks were not allocated to specific weeks, because it was not possible to identify all the required tasks in detail at the start.

It would perhaps have been interesting to spend more time testing different sets of plan operators, as even small changes turned out to have a great effect. The amount of execution time required by some of the planners, for example the low abstraction operators, was also unexpected, so a lot of time was spent on this. Starting on the planner implementation before the FSMs would have let experiments and testing begin earlier, which might have allowed more experiments. I chose instead to make sure that the FSMs worked as a concept first. This meant I could be sure the planner would also work, as it amounts to automatically generating FSMs.

6.2 Software Development

As a whole the software development side of the project went very well: I achieved everything I set out to do in the time available. Designing and implementing the simulator and structure for testing the architecture was a large part of the effort in this project. Getting the framework in place for implementing different behaviours and control techniques was not a lot of work, however, compared with implementing and testing the behaviours and techniques themselves, which was very time consuming. The main issue was the difficulty of testing the control algorithms rigorously. In practice, testing meant watching the robots in the simulator and seeing if they looked "right" under different conditions.

This is the main reason no formal development methodology was used. It was not practical to develop the test cases before implementation, only to consider rough requirements. This does not mean the entire system was hacked together. The general process was to work out the requirements on paper, sketch the class structure and then implement it. For the control algorithms, either flowcharts or story-boards were drawn to decide the behaviour, which was then implemented (as with the diagrams and story-boards in chapter 3). This worked well. I had no cause for any major redesign or re-implementation work during the project, and knowing what I expected of a behaviour before implementing it made assessing its performance straightforward.

As mentioned elsewhere, I knew from the start the broad components that I would require. This let me schedule implementation tasks in an order that allowed incremental design and testing, with each task building on the last. This fitted well with my development style of design, implement and test.

C++ proved to be a very good choice of implementation language. The simulator was written in it and my code was tightly integrated with the simulator, so it was a natural choice, and my experience with C++ would have made it my first choice anyway. A bonus was that a C++ library for my chosen reactive control technique, OpenSteer [2], was available. The object-orientated nature of C++ also allowed for code reuse and for easy combination of behaviours to form others. Another high-level language such as Java may have been appropriate, as I have some experience with it and its many built-in libraries can speed development, but the above reasons made C++ a compelling choice.

6.3 Experimental Evaluation

Overall the experimental evaluation was complete. There was time to run a wide enough range of tests to show that the proposed architecture offers many benefits and is, as a whole, a success. However, this does not mean there is no room for improvement.

Given more time, a more thorough scientific analysis of the systems implemented could have been carried out. Many of the results have large errors, and these could be reduced by more simulation runs. However, this was not possible within the time available, because the experiments with the greatest error were typically those that took the longest to run.

The difficulty of establishing a quantitative measure of a team's performance is apparent in the analysis of the results. A team may appear to perform badly by one measure, yet observation often reveals that it works well, and vice versa. More time to experiment could have allowed investigation of other aspects that could serve as measures of performance. For example, there is no analysis of the time taken for the opposition to score, which would indicate a team's defensive performance. Other measures not considered include robot-ball and robot-robot collisions and out-of-play events.

Some of the evaluation could have been supported by more experimentation. For example, a low mean time-to-score was observed in cases where many goals are conceded, and it was hypothesised that this is because goals are scored either quickly or not at all. Some of the results, and some observation, support this theory, but more extensive testing may be instructive. The theory could be tested further by designing experiments in which many goals are conceded, perhaps by handicapping a team so that it loses, in order to observe the phenomenon directly.

A final issue with the experimental evaluation was the non-determinism of the simulator. In theory it should not be necessary to play more than one goal between teams, as they all behave the same way each time and the start positions are the same. However, it emerged during testing that the physics simulator was not entirely deterministic. Slight changes would creep in, causing passes occasionally to miss, which in turn brought different control laws into effect, and so on. Nothing could be done to prevent this in the time available, as it would have involved a major re-write, and the issue only came to prominence in the later stages of the project. By running each test to 100 goals, working with the averages and considering the standard error, the analysis accounts for the variance between runs.

6.4 Novel Aspects of the Project

The main novel aspect of this project is the design, implementation and testing of the hybrid architecture itself, which combines a planner with mappings to sets of behaviours. As far as I am aware this approach has not been tested before, and my results show that it holds some promise. Bowling et al. [11] describe an approach in which team plays are executed much as the plans are here, but the plays are hand-coded rather than computed. Pires et al. [29] propose an "ALBU" as part of their architecture:

"The Advanced Logic Based Unit is the next step of our work. The ALBU is based on situational calculus, and on a planning system. The goal of this component is to determine plans (sequences of behaviours) that allow the team to achieve something (like scoring on the opposite goal)." [29]

This proposal sounds very similar to the system implemented in my project, but I was unable to find further details or any published work on it.

The creation of the FSMs, and particularly the technique by which the synthesised plans are translated into an FSM for execution, is unique to this project. Although this is an artefact of the technique itself being new, it is a necessary part of the proposed architecture.

7 Project Conclusions and Further Work

This chapter covers what this project has achieved, and discusses possible extensions arising from the work and experiments carried out.

7.1 Conclusions

The results from this project show that combining classical planning and reactive control can yield a viable control architecture for a team of reactive agents. That said, the choice of plan operators is crucial. The results suggest that keeping the planner high-level and based on co-ordination techniques gives the biggest advantages for the least execution time penalty.

The evaluation of a purely reactive team found that the greater the number of players, the worse the performance, which shows that some form of co-ordination is required. The results from the FSM-based systems show that even simple hard-coded co-ordination routines can improve on this baseline, and that FSMs in general are a valid method for executing team plans.

Although this implementation and evaluation is purely within the RoboCup domain, the architecture developed here has potential for other applications where teams of agents must be co-ordinated. One area where the method shows particular promise is strategy games, where computing resources are limited and it may be necessary to control many units at once. The method would be of great advantage here, because each unit only ever uses a few simple rules at a time, so it scales well computationally to large numbers of units. In addition, so long as the plan operators are high-level enough and carefully chosen, good plans can be generated efficiently, giving an AI player the potential for good performance.

7.2 Further Work

The scope of the project was to investigate the hybrid architecture and assess its potential. With this carried out, and with promising results, there are new possibilities for investigation.

7.2.1 Continued Investigation of the Existing Solution

There are several potential avenues of continued investigation into the implementation I created that time constraints did not permit me to pursue. During testing it was common for the simulation to get stuck in a "loop", with the ball going out of play and being knocked back out as soon as it was replaced. Investigating how to detect and correct for these situations would greatly improve the performance of some of the planner implementations, or at least reduce the variance in the results. This is just a small part of a larger area of study: plan execution monitoring and recovery or repair. Ideally, plan execution should include monitoring of plan progress. If things go wrong, or not to plan, it is then necessary either to re-plan for the new state or to repair the plan in some way. This raises many questions about how to detect failure, because plans are not expected to go exactly as intended; the behaviours are meant to compensate for minor deviations. Perhaps certain plan actions could have time-limits associated with them, but determining these times would require an empirical process, with much experimentation.

The two sets of plan operators presented here are only the simplest that could provide interesting behaviour to study. There is great scope for investigating more complex sets of operators, and perhaps even different types of planner, such as metric planners to minimise agent movement. More complex plays could be introduced by creating operators that make robots perform a feint or engage in more intricate passes or plays.

Further testing could be carried out on the current solution, for example against other existing RoboCup teams. This is a non-trivial task, however. It would involve either re-implementing another team's set of behaviour rules in this simulator, or re-implementing this scheme to work with the RoboCup simulation league, and both approaches present several difficulties. The main problems in implementing an existing team in this simulator are obtaining the control rules used and, more importantly, tuning them for the environment. Implementing this scheme in the simulation league would be more practical, but would require a nearly complete re-write of the system. The behaviour parameters would also have to be re-tuned for the league simulator, which is far more complex than the simple one used for proof-of-concept here.

This project also did not investigate whether a generated plan can outperform a hand-coded one. This would require testing automated plans against best-effort hand-coded plans using the same operators. The planner used in this project considers a shorter plan to be better, but this may not always be the case, and hand-coding exemplar plans would allow this to be investigated.

7.2.2 Extensions to the Architecture

There are also possible extensions to the overall solution that would provide interesting avenues for investigation and perhaps increased performance. One potential improvement would be to apply machine learning techniques to parts of the architecture.

For example, some plans may be more successful than others against certain opponents or in certain scenarios, so it may be possible to learn which, generate several plans for a given state, and use the learned classifier to select the best one. Similarly, learning and opponent modelling techniques could be applied to adjust the plan operators used, or the mappings to behaviours, depending on observations of the opposition.
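A deliberately simple version of the plan-selection idea is sketched below in Python. Instead of a trained classifier it keeps a smoothed success rate per (scenario, plan) pair, where the scenario is assumed to be some discrete label for the opponent or game situation. `PlanSelector`, the scenario labels and the plan identifiers are hypothetical.

```python
from typing import Dict, List, Tuple


class PlanSelector:
    """Picks one of several candidate plans for the current scenario, using
    success rates observed in earlier games (a stand-in for a learned classifier)."""

    def __init__(self) -> None:
        # (scenario, plan id) -> [successes, attempts]
        self.stats: Dict[Tuple[str, str], List[int]] = {}

    def record(self, scenario: str, plan_id: str, success: bool) -> None:
        entry = self.stats.setdefault((scenario, plan_id), [0, 0])
        entry[0] += int(success)
        entry[1] += 1

    def rate(self, scenario: str, plan_id: str) -> float:
        successes, attempts = self.stats.get((scenario, plan_id), [0, 0])
        # Laplace smoothing: an untried plan scores 0.5 instead of 0 or undefined.
        return (successes + 1) / (attempts + 2)

    def choose(self, scenario: str, candidates: List[str]) -> str:
        # max() keeps the first candidate on ties, i.e. the order they were supplied in.
        return max(candidates, key=lambda p: self.rate(scenario, p))


selector = PlanSelector()
for _ in range(3):
    selector.record("vs-defensive", "pass-plan", True)
for _ in range(2):
    selector.record("vs-defensive", "solo-plan", False)
print(selector.choose("vs-defensive", ["solo-plan", "pass-plan"]))  # pass-plan
```

A real classifier would generalise to unseen states from features of the world model; a lookup table only helps when scenarios recur, which is what it gives up in exchange for simplicity.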

One of the most time-consuming phases of the project was tuning various parameters, for both the behaviours and the post-conditions of plan actions. These parameters interact in complex ways and so require time-consuming trial-and-error design. It would be feasible to set up the system as part of a genetic algorithm so that good parameters could be evolved, or to go a step further and evolve the behaviours themselves.
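The genetic-algorithm idea could start from something like the Python sketch below. Everything in it is an assumption for illustration: the function `evolve`, its default settings, and the encoding of each behaviour or post-condition parameter as a value normalised into [0, 1]. In the real system the fitness function would have to run simulated matches (for example, scoring by goal difference), which is slow and noisy; the toy fitness in the usage lines simply rewards closeness to a made-up target vector.

```python
import random
from typing import Callable, List

Genome = List[float]  # one value in [0, 1] per tunable parameter


def evolve(fitness: Callable[[Genome], float], n_params: int, pop_size: int = 30,
           generations: int = 80, mutation_sd: float = 0.1, seed: int = 0) -> Genome:
    """Maximise `fitness` with a simple generational GA: elitist truncation
    selection, uniform crossover and clipped Gaussian mutation."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_params)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # best half survives unchanged
        children: List[Genome] = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            # Gaussian mutation, clipped so every gene stays in [0, 1]
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, mutation_sd))) for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


target = [0.3, 0.7, 0.5]  # made-up "ideal" parameter values for the toy fitness
best = evolve(lambda g: -sum((x - t) ** 2 for x, t in zip(g, target)), n_params=3)
print([round(x, 2) for x in best])
```

Keeping the best half unchanged each generation means that, with a noisy match-based fitness, a genome that scored well by luck can survive; re-evaluating survivors or averaging over several matches would be one way to counter that.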

The architecture as it currently stands assumes a single global planner with complete knowledge of all the agents. If it were to be applied to real robots in different scenarios, it would require adaptation to share information before the plan was generated, and then further communication of the plan. This would involve a significant rethink of how the planner was implemented, but the same basic principle of mapping plan actions to behaviour sets should still be possible.


Bibliography

[1] URL http://www.fu-fighters.de/.

[2] OpenSteer: Steering behaviours for autonomous characters. URL http://opensteer.sourceforge.net/.

[3] PDDL 2.1. URL http://planning.cis.strath.ac.uk/competition/pddl.html.

[4] R. Alami, R. Chatila, S. Fleury, M. Ghallab, and F. Ingrand. An architecture for autonomy. International Journal of Robotics Research, 17(4):315–337, April 1998.

[5] Heni Ben Amor, Jan Murray, and Oliver Obst. Fast, neat and under control: Inverse steering behaviors for physical autonomous agents. Technical Report 12-2003, Institut für Informatik, Universität Koblenz-Landau, 2003.

[6] Heni Ben Amor, Jan Murray, Oliver Obst, and Christoph Ringelstein. RoboLog Koblenz 2003 - Team description. In LNAI: RoboCup 2003: Robot Soccer World Cup VII, volume 3020, 2004.

[7] Sven Behnke and Raúl Rojas. A hierarchy of reactive behaviours handles complexity. In LNCS: Balancing Reactivity and Social Deliberation in Multi-Agent Systems, pages 125–136, 2001.

[8] Sven Behnke, Bernhard Frötschl, Raúl Rojas, Peter Ackers, Wolf Lindstrot, Manuel de Melo, Mark Preier, Andreas Schebesch, Mark Simon, Martin Sprengel, and Oliver Tenchio. Using hierarchical dynamical systems to control reactive behavior. Lecture Notes in Computer Science, 1856:186–195, 2000.

[9] Avrim Blum, Merrick Furst, and John Langford. Graphplan home page. URL http://www.cs.cmu.edu/~avrim/graphplan.html.

[10] R. Peter Bonasso, James Firby, Erann Gat, David Kortenkamp, David P. Miller, and Marc G. Slack. Experiences with an architecture for intelligent, reactive agents. Journal of Experimental & Theoretical Artificial Intelligence, 9(2/3):237–256, 1997.

[11] Michael Bowling, Brett Browning, Allen Chang, and Manuela Veloso. Plays as team plans for coordination and adaptation. RoboCup 2003, LNAI 3020:686–693, 2004.

[12] Rodney A. Brooks. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation, 2(1):14–23, March 1986. URL http://people.csail.mit.edu/brooks/papers/AIM-864.pdf.

[13] Z. Crisman, E. Curre, C.T. Kwok, L. Meyers, N. Ratliff, L. Tsybert, and D. Fox. Team description: UW Huskies-02. RoboCup-2002: Robot Soccer World Cup VI, 2003. URL http://www.cs.washington.edu/ai/Mobile_Robotics/Aibo-2002/postscripts/UWHuskies02.pdf.

[14] Mathijs de Weerdt, Adriaan ter Mors, and Cees Witteveen. Multi-agent planning: An introduction to planning and coordination. In Handouts of the European Agent Summer School, pages 1–32, 2005. URL http://www.st.ewi.tudelft.nl/~mathijs/publications/easss05.pdf.


[15] Keith S. Decker and Victor R. Lesser. Designing a family of coordination algorithms. Technical Report 94-14, UMass Computer Science Department, 1995.

[16] Anna Egorova. MAAT - Multi Agent Authoring Tool for Programming Autonomous Mobile Robots. PhD thesis, Fachbereich Mathematik und Informatik, Freie Universität Berlin, 2004.

[17] Innes A. Ferguson. TouringMachines: Autonomous agents with attitudes. Technical report, Computer Laboratory, University of Cambridge, 1992.

[18] Gordon Fraser, Gerald Steinbauer, and Franz Wotawa. Application of qualitative reasoning to robotic soccer. In Proceedings of the 18th International Workshop on Qualitative Reasoning, 2004.

[19] E. Gat. On three-layer architectures. In Artificial Intelligence and Mobile Robots. MIT/AAAI Press, 1997.

[20] Malik Ghallab, Dana Nau, and Paolo Traverso. Automated Planning: Theory and Practice. Morgan Kaufmann, 2004.

[21] Jörg Hoffmann. Fast-Forward. URL http://www.mpi-sb.mpg.de/~hoffmann/ff.html.

[22] Vincent Hugel, Guillame Amourous, Thomas Costis, Patrick Bonnin, and Pierre Blazevic. Specifications and design of graphical interface for hierarchical finite state machines. Technical report, Laboratoire de Mécatronique et de Robotique de Versailles (LMRV), 2005. URL http://www.lrv.uvsq.fr/research/legged/papers/tech_reports/2005/hugel.pdf.

[23] Herbert Jaeger and Thomas Christaller. Dual dynamics: Designing behaviour systems for autonomous robots. Artificial Life and Robotics, 2:108–112, 1998.

[24] Rune M. Jensen and Manuela M. Veloso. Interleaving deliberative and reactive planning in dynamic multi-agent domains. In Proceedings of the AAAI Fall Symposium on Integrated Planning for Autonomous Agent Architectures, 1998.

[25] Hans Lausen, Jakob Nielsen, Michael Nielson, and Pedro Lima. Model and behavior-based robotic goalkeeper. RoboCup 2003, LNAI 3020, 2004.

[26] Scott Lenser, James Bruce, and Manuela Veloso. A modular hierarchical behavior-based architecture. LNCS RoboCup 2001: Robot Soccer World Cup V, 2377:423, 2002.

[27] Peter Linz. An Introduction to Formal Languages and Automata. Jones and Bartlett, 3rd edition, 2001.

[28] John Martin. Introduction to Languages and the Theory of Computation. McGraw-Hill, 3rd edition, 2003.

[29] Vasco Pires, Miguel Arroz, and Luis Custódio. Logic based hybrid decision system for a multi-robot team. In Proceedings of the 8th Conference on Intelligent Autonomous Systems, 2004.

[30] Craig W. Reynolds. Steering behaviors for autonomous characters. In Proceedings of Game Developers Conference, pages 763–782, 1999.

[31] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, 2nd edition, 2003.

[32] Russell Smith. The Open Dynamics Engine. URL http://www.ode.org/.

[33] The RoboCup Federation. URL http://www.robocup.org.


[34] William Uther, Scott Lenser, James Bruce, Martin Hock, and Manuela Veloso. CM-Pack’01: Fast legged robot walking, robust localization, and team behaviors. In RoboCup 2001: The Fifth RoboCup Competitions and Conferences, 2002.

[35] M. Veloso and P. Stone. Individual and collaborative behaviours in a team of homogeneous robotic soccer agents. In Proceedings of the Third International Conference on Multi-Agent Systems, pages 309–316, 1998.

[36] Manuela Veloso, Paul E. Rybski, Sonia Chernova, Colin McMillen, Juan Fasola, Felix von Hundelshausen, Douglas Vail, Alex Trevor, Sabine Hauert, and Raquel Ros Espinoza. CMDash’05: Team report. Technical report, School of Computer Science, Carnegie Mellon University, 2005. URL http://www.cs.cmu.edu/~robosoccer/legged/reports/CMDash05-report.pdf.

[37] R. Volpe, I.A.D. Nesnas, T. Estlin, D. Mutz, R. Petras, and H. Das. CLARAty: Coupled layer architecture for robotic autonomy. Technical Report D-19975, NASA JPL, 2000. URL http://www-robotics.jpl.nasa.gov/publications/Richard_Volpe/CLARAty.pdf.

[38] R. Volpe, I. Nesnas, T. Estlin, D. Mutz, R. Petras, and H. Das. The CLARAty architecture for robotic autonomy. In Proceedings of the 2001 IEEE Aerospace Conference, 2001.

[39] Gerhard Weiss. Multi-Agent Systems, A Modern Approach to Distributed Artificial Intelligence. MIT Press, 1999.


A Plan Operators

This appendix contains the actual plan operators used, specified in PDDL [3].

A.1 Low Abstraction Plan Operators

(define (domain robocup13x9)
  (:requirements :adl)
  (:types ROBOT COLUMN ROW QUARTER)
  (:predicates
    (robot-at ?r - ROBOT ?ca - COLUMN ?ra - ROW)
    (ball-at ?ca - COLUMN ?ra - ROW)
    (yellow ?r - ROBOT)
    (blue ?r - ROBOT)
    (possesion ?r - ROBOT)
    (yellow-half ?row - ROW)
    (blue-half ?row - ROW)
    (left ?col - COLUMN)
    (right ?col - COLUMN)
    (waiting ?robot - ROBOT)
    (score)
    (defending ?robot - ROBOT)
    (least-blue ?quarter - QUARTER)
    (in-quarter ?quarter - QUARTER ?col - COLUMN ?row - ROW)
    (out-of-play)
  )

  ;;A pass, implement as kicktoX
  (:action PassToX
    :parameters (?r - ROBOT ?r2 - ROBOT ?ca - COLUMN ?ra - ROW ?cb - COLUMN ?rb - ROW)
    :precondition (and (yellow ?r) (ball-at ?cb ?rb) (possesion ?r)
                       (yellow ?r2) (robot-at ?r2 ?ca ?ra) (waiting ?r2))
    :effect (and (not (possesion ?r)) (possesion ?r2)
                 (not (ball-at ?cb ?rb)) (ball-at ?ca ?ra)
                 (when (waiting ?r) (not (waiting ?r)))
                 (when (defending ?r) (not (defending ?r))))
  )

  ;;move and wait for a pass
  ;;combined as one action to decrease plan depth.
  ;;only wait for a pass in a quiet quarter
  (:action GoAndWait
    :parameters (?r - ROBOT ?ca - COLUMN ?ra - ROW ?cb - COLUMN ?rb - ROW ?q - QUARTER)
    :precondition (and (yellow ?r) (robot-at ?r ?cb ?rb)
                       (in-quarter ?q ?ca ?ra) (least-blue ?q))
    :effect (and (robot-at ?r ?ca ?ra) (not (robot-at ?r ?cb ?rb))
                 (when (possesion ?r) (not (possesion ?r)))
                 (waiting ?r)
                 (when (defending ?r) (not (defending ?r))))
  )

  ;;shoot for the goal
  ;;only shoot if in their half
  (:action Shoot
    :parameters (?r - ROBOT ?ca - COLUMN ?ra - ROW)
    :precondition (and (yellow ?r) (possesion ?r) (robot-at ?r ?ca ?ra) (blue-half ?ra))
    :effect (and (score) (not (possesion ?r)))
  )

  ;;A shoot for a lone robot, but this should only be used when passing IS NOT the best option
  (:action SoloShoot
    :parameters (?r - ROBOT ?col - COLUMN ?row - ROW ?q - QUARTER)
    :precondition (and (yellow ?r) (possesion ?r) (ball-at ?col ?row)
                       (in-quarter ?q ?col ?row) (least-blue ?q))
    :effect (and (not (possesion ?r)) (score))
  )

  ;;removes robot’s location, means the only thing that can be done is pass
  ;;also invalidates ball position, but then so do so many other actions...
  (:action GetBall
    :parameters (?r - ROBOT ?ca - COLUMN ?ra - ROW)
    :precondition (and (yellow ?r) (robot-at ?r ?ca ?ra) (not (waiting ?r)))
    :effect (and (possesion ?r)
                 (when (waiting ?r) (not (waiting ?r)))
                 (not (robot-at ?r ?ca ?ra))
                 (when (defending ?r) (not (defending ?r))))
  )

  ;;sets up a robot as a defender
  ;;requires it’s in the home half and sets state to defending in a position
  ;;general purpose defender requires there is already a goalie - a goalie sits in square c5 r02 (in front of goal)
  ;;Do I need to add the requirement that goalie is not the current robot?
  (:action Defend
    :parameters (?r - ROBOT ?cb - COLUMN ?rb - ROW ?g - ROBOT)
    :precondition (and (yellow ?r) (robot-at ?r ?cb ?rb) (yellow-half ?rb)
                       (yellow ?g) (robot-at ?g c5 r02) (defending ?g))
    :effect (and (when (possesion ?r) (not (possesion ?r)))
                 (when (waiting ?r) (not (waiting ?r)))
                 (defending ?r))
  )

  ;;mustn’t already be one
  (:action Goalie
    :parameters (?r - ROBOT ?cb - COLUMN ?rb - ROW)
    :precondition (and (yellow ?r) (robot-at ?r ?cb ?rb)
                       (not (exists (?g - ROBOT)
                                    (and (yellow ?g) (robot-at ?g c5 r02) (defending ?g)))))
    :effect (and (robot-at ?r c5 r02) (not (robot-at ?r ?cb ?rb))
                 (when (possesion ?r) (not (possesion ?r)))
                 (when (waiting ?r) (not (waiting ?r)))
                 (defending ?r))
  )

  ;;actions to handle (out-of-play) events
  (:action OPassToFreeRobot
    :parameters ()
    :precondition (out-of-play)
    :effect (score)
  )
)

A.2 High Abstraction Plan Operators

(define (domain robocup2x2)
  (:requirements :adl)
  (:types ROBOT QUARTER)
  (:predicates
    (robot-at ?r - ROBOT ?q - QUARTER)
    (ball-at ?quarter)
    (yellow ?r - ROBOT)
    (blue ?r - ROBOT)
    (possesion ?r - ROBOT)
    (yellow-half ?q - QUARTER)
    (blue-half ?q - QUARTER)
    (left ?q - QUARTER)
    (right ?q - QUARTER)
    (waiting ?robot - ROBOT)
    (score)
    (defending ?robot - ROBOT)
    (least-blue ?quarter - QUARTER)
    (out-of-play)
    (goalie ?r - ROBOT)
  )

  ;;A pass, implement as kicktoX
  (:action PassToX
    :parameters (?r - ROBOT ?r2 - ROBOT ?qrob - QUARTER ?qball - QUARTER)
    :precondition (and (yellow ?r) (ball-at ?qball) (possesion ?r)
                       (yellow ?r2) (robot-at ?r2 ?qrob) (waiting ?r2))
    :effect (and (not (possesion ?r)) (possesion ?r2)
                 (not (ball-at ?qball)) (ball-at ?qrob)
                 (when (waiting ?r) (not (waiting ?r)))
                 (when (defending ?r) (not (defending ?r)))
                 (when (goalie ?r) (not (goalie ?r))))
  )

  ;;move and wait for a pass
  ;;combined as one action to decrease plan depth.
  ;;only wait for a pass in a quiet quarter
  ;;don’t move if you have possesion, cos that’s silly.
  (:action GoAndWait
    :parameters (?r - ROBOT ?qAt - QUARTER ?qTo - QUARTER)
    :precondition (and (yellow ?r) (robot-at ?r ?qAt) (least-blue ?qTo) (not (possesion ?r)))
    :effect (and (robot-at ?r ?qTo) (not (robot-at ?r ?qAt)) (waiting ?r)
                 (when (defending ?r) (not (defending ?r)))
                 (when (goalie ?r) (not (goalie ?r))))
  )

  ;;shoot for the goal
  ;;only shoot if in their half
  (:action Shoot
    :parameters (?r - ROBOT ?q - QUARTER)
    :precondition (and (yellow ?r) (possesion ?r) (robot-at ?r ?q) (blue-half ?q))
    :effect (and (score) (not (possesion ?r)))
  )

  ;;A shoot for a lone robot, but this should only be used when passing IS NOT the best option
  ;;..whatever that means ;-)
  ;;cannot depend on robot’s position! but could depend on balls? (dubious)
  ;;requires ball is in a quiet quarter (kinda assumes the robot did a getball first
  (:action SoloShoot
    :parameters (?r - ROBOT ?q - QUARTER)
    :precondition (and (yellow ?r) (possesion ?r) (ball-at ?q) (least-blue ?q))
    :effect (and (not (possesion ?r)) (score))
  )

  ;;removes robot’s location, means the only thing that can be done is pass
  (:action GetBall
    :parameters (?r - ROBOT ?qRob - QUARTER ?qBall - QUARTER)
    :precondition (and (yellow ?r) (robot-at ?r ?qRob) (ball-at ?qBall) (not (waiting ?r))
                       (not (exists (?r2 - ROBOT) (and (possesion ?r2) (yellow ?r2)))))
    :effect (and (possesion ?r)
                 (when (waiting ?r) (not (waiting ?r)))
                 (not (robot-at ?r ?qRob)) (robot-at ?r ?qBall)
                 (when (defending ?r) (not (defending ?r)))
                 (when (goalie ?r) (not (goalie ?r))))
  )

  ;;sets up a robot as a defender
  ;;requires it’s in the home half and sets state to defending in a position
  ;;general purpose defender requires there is already a goalie
  (:action Defend
    :parameters (?r - ROBOT ?q - QUARTER ?g - ROBOT)
    :precondition (and (yellow ?r) (robot-at ?r ?q) (yellow-half ?q)
                       (yellow ?g) (goalie ?g) (not (score)))
    :effect (and (when (possesion ?r) (not (possesion ?r)))
                 (when (waiting ?r) (not (waiting ?r)))
                 (defending ?r)
                 (when (goalie ?r) (not (goalie ?r))))
  )

  ;;mustn’t already be one
  ;;rounds goalie position to q3
  (:action Goalie
    :parameters (?r - ROBOT ?q - QUARTER)
    :precondition (and (yellow ?r) (robot-at ?r ?q)
                       (not (exists (?g - ROBOT) (goalie ?g)))
                       (not (score)))
    :effect (and (robot-at ?r q3) (not (robot-at ?r ?q))
                 (when (possesion ?r) (not (possesion ?r)))
                 (when (waiting ?r) (not (waiting ?r)))
                 (defending ?r) (goalie ?r))
  )

  ;;actions to handle (out-of-play) events
  (:action OPassToFreeRobot
    :parameters ()
    :precondition (out-of-play)
    :effect (score)
  )
)
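To make the ADL semantics of the high-abstraction (2x2) operators concrete, the following minimal Python sketch hand-applies four of them (GetBall, GoAndWait, PassToX, Shoot) to a set of ground facts. It is not a planner and not the project's implementation: the robot and quarter names and the initial state (which quarters are in the blue half, which quarter is least-blue) are hypothetical. It assumes the usual PDDL transition rule of removing delete effects before adding add effects.

```python
from typing import FrozenSet, Iterable, Tuple

Fact = Tuple[str, ...]
State = FrozenSet[Fact]


def apply(state: State, pre: Iterable[Fact], pre_not: Iterable[Fact],
          dels: Iterable[Fact], adds: Iterable[Fact]) -> State:
    """Apply one ground action: check preconditions, then delete, then add."""
    missing = [f for f in pre if f not in state]
    forbidden = [f for f in pre_not if f in state]
    if missing or forbidden:
        raise ValueError(f"precondition failed: missing={missing}, forbidden={forbidden}")
    return frozenset((set(state) - set(dels)) | set(adds))


# Every conditional effect used here has the form (when p (not p)). Deleting a
# fact that is already false changes nothing, so each is modelled as a plain delete.
def get_ball(s: State, r: str, q_rob: str, q_ball: str) -> State:
    if any(f[0] == "possesion" and ("yellow", f[1]) in s for f in s):
        raise ValueError("precondition failed: a yellow robot already has the ball")
    return apply(s, [("yellow", r), ("robot-at", r, q_rob), ("ball-at", q_ball)],
                 [("waiting", r)],
                 [("waiting", r), ("robot-at", r, q_rob), ("defending", r), ("goalie", r)],
                 [("possesion", r), ("robot-at", r, q_ball)])


def go_and_wait(s: State, r: str, q_at: str, q_to: str) -> State:
    return apply(s, [("yellow", r), ("robot-at", r, q_at), ("least-blue", q_to)],
                 [("possesion", r)],
                 [("robot-at", r, q_at), ("defending", r), ("goalie", r)],
                 [("robot-at", r, q_to), ("waiting", r)])


def pass_to_x(s: State, r: str, r2: str, q_rob: str, q_ball: str) -> State:
    return apply(s, [("yellow", r), ("ball-at", q_ball), ("possesion", r),
                     ("yellow", r2), ("robot-at", r2, q_rob), ("waiting", r2)],
                 [],
                 [("possesion", r), ("ball-at", q_ball), ("waiting", r),
                  ("defending", r), ("goalie", r)],
                 [("possesion", r2), ("ball-at", q_rob)])


def shoot(s: State, r: str, q: str) -> State:
    return apply(s, [("yellow", r), ("possesion", r), ("robot-at", r, q), ("blue-half", q)],
                 [], [("possesion", r)], [("score",)])


INIT: State = frozenset({
    ("yellow", "r1"), ("yellow", "r2"),
    ("robot-at", "r1", "q1"), ("robot-at", "r2", "q3"),
    ("ball-at", "q2"),
    ("blue-half", "q1"), ("blue-half", "q2"),      # hypothetical: opponents' half
    ("yellow-half", "q3"), ("yellow-half", "q4"),
    ("least-blue", "q1"),                           # hypothetical: the quiet quarter
})

s = get_ball(INIT, "r1", "q1", "q2")        # r1 runs to the ball in q2
s = go_and_wait(s, "r2", "q3", "q1")        # r2 moves to quiet q1 and waits
s = pass_to_x(s, "r1", "r2", "q1", "q2")    # ball goes from q2 to r2 in q1
final = shoot(s, "r2", "q1")                # q1 is in the blue half, so Shoot applies
print(("score",) in final)  # True
```

Calling `shoot(INIT, "r1", "q1")` instead raises `ValueError`, because no robot has possession yet; a planner working from the same operators would likewise never select Shoot in that state.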
