High Performance Multi-agent System based Simulations
Thesis submitted in partial fulfillment
of the requirements for the degree of
Master of Science (by Research)
in
Computer Science and Engineering
by
Prashant Sethia
200602017
prashant [email protected]
Center for Data Engineering
International Institute of Information Technology
Hyderabad - 500 032, INDIA
June 2011
Copyright © Prashant Sethia, 2011
All Rights Reserved
International Institute of Information Technology
Hyderabad, India
CERTIFICATE
It is certified that the work contained in this thesis, titled “High Performance Multi-agent
System based Simulations” by Prashant Sethia, has been carried out under my supervision
and is not submitted elsewhere for a degree.
Date Advisor: Dr. Kamalakar Karlapalem
To my father, Mr. Moti Lal Sethia, and my mother, Mrs. Saroj Sethia
Sincere gratitude to
My grandfather, Shri Santok Chand Sethia. His blessings are always with me, taking
me through all ups and downs.
My father and mother, for their endless love, care and the sacrifices made for me.
My guide, Dr. Kamalakar Karlapalem, for inspiring and guiding me throughout my work.
He will always be the first one I turn to whenever I need any advice.
And my sisters and friends, for making my life beautiful and full of joy.
Abstract
Real-life city-traffic simulation presents a good example of multi-agent simulations in-
volving a large number of agents (each human modelled as an individual agent). Analysis of
emergent behaviors in social simulations largely depends on the number of agents involved
(at least 100,000 agents). Due to the large number of agents involved, it can take several
seconds or even minutes to simulate a single second of real life. Hence, we resort to
distributed computing to speed up the simulations. We used Hadoop to manage and execute
multi-agent applications on a small cloud of computers, and used the CUDA framework to develop
an agent-simulation model on GPUs.
Hadoop is known for efficiently supporting massive workloads on a large number of sys-
tems with the help of its novel task distribution and the MapReduce model. It provides a
fault-tolerant and failure-resilient underlying framework, and these properties propagate
to the multi-agent simulation solution developed on top of it. Further, when some of the sys-
tems fail, Hadoop dynamically re-balances the workload on the remaining systems, and it
provides the capability to add new systems on the fly while the simulation is running. In our
solution, agents are executed as MapReduce tasks, and one execution of all the agents in the
simulation becomes a MapReduce job. Agents need to communicate with each other to make
autonomous decisions. Agent communication may involve inter-node exchange of data,
causing network I/O to increase and become the bottleneck in multi-agent simulations. Fur-
ther, Hadoop does not process small files efficiently. We present an algorithm for grouping
agents on the cluster of machines such that frequently communicating agents are brought to-
gether in a single file (as much as possible), avoiding inter-node communication. The speed-up
achieved using Hadoop was limited by the overheads of a large number of MapReduce tasks
running to execute agents, and by the steps Hadoop takes to provide failure re-
silience. Moreover, HDFS-based files are slow to access and hard to append to. Hence, we use
a Lucene-index-based mechanism to store agent data and implement agent messaging. With
our implementation, we achieve better performance and scalability in comparison
to currently available simulation frameworks.
We experimented with GPUs for speeding up massive complex simulations, the GPUs be-
ing managed with the help of the CUDA framework. CUDA provides the capability to create and
manage a very large number (on the order of 10^12) of light-weight GPU threads. It follows the Single Instruction Mul-
tiple Threads (SIMT) architecture, the single instruction being modelled as a kernel function. Agent
strategies are hence developed as kernel functions, with each agent running as a separate GPU
thread. Consistency of agent states is maintained by using a four-state agent execution
model. The model decentralizes the problem of consistency and helps avoid bottle-
necks that could occur with a single centralized system.
We tested the performance of the two developed frameworks and compared them with
existing solutions like NetLogo, MASON and DMASF. The experiments involved both sim-
ulations with heavy computation and simulations with a large number of messages amongst
agents. The framework developed on Hadoop provided a linear scale-up in the number of
running agents with the increase in the number of machines in the cluster. The time taken to exe-
cute a simulation for a fixed number of agents decreased inversely as the number of machines
increased. Hadoop rebalanced the simulation, with very little overhead in time, when some
machines failed and new ones were added dynamically. However, the execution time was
up to 6-8 times more than that of DMASF under the same experimental settings. With
the help of Lucene indexing along with Hadoop, the execution time improved, and for larger
numbers of agents our system outperformed DMASF. The GPU solution, on the other hand, outper-
formed all the existing CPU solutions in terms of execution time. The speed-up achieved was
as much as 10,000 times for some simulations when compared with DMASF running on 2
CPUs for the same experiments.
Contents
1 Introduction
  1.1 Motivation
  1.2 Contribution and Organization

2 Related Work
  2.1 AB-Example

3 A Multi-agent Simulation Framework on Small Hadoop Clusters
  3.1 Hadoop Architecture and Map-Reduce Model
    3.1.1 Hadoop Distributed File System (HDFS)
    3.1.2 Map-Reduce Paradigm
    3.1.3 MapReduce Job Execution by Hadoop
  3.2 Multi-agent Simulation Framework on Hadoop
    3.2.1 Handling Failures
      3.2.1.1 Namenode Failure
      3.2.1.2 Secondary Namenode Failure
      3.2.1.3 Datanode Failure
    3.2.2 Dynamic Addition of New Nodes
  3.3 Implementation Issues for MAS framework on Hadoop
    3.3.1 Small-Files-Problem
      3.3.1.1 Solution to the Small-Files-Problem
    3.3.2 Agent Communication
      3.3.2.1 Agent Clustering algorithm based on agent-communication
      3.3.2.2 Implementing Greedy Agent-Redistribution Algorithm
        3.3.2.2.1 In MAP-1 phase
        3.3.2.2.2 In REDUCE-1 phase
        3.3.2.2.3 In MAP-2 phase
        3.3.2.2.4 In REDUCE-2 phase
      3.3.2.3 Placing the agent-groups into Files
    3.3.3 Queries in Agent-State Updates
  3.4 Agent Execution using Lucene/Solr Indexing
    3.4.1 Introduction to Lucene and Solr
    3.4.2 Implementation Strategy
  3.5 Experimental Results
    3.5.1 Circle Simulation
    3.5.2 Standing Ovation Problem (SOP)
    3.5.3 Sand-pile Simulation
    3.5.4 KP Simulation
    3.5.5 Dynamic Nodes Addition
    3.5.6 Scalability Tests
    3.5.7 Experiments and comparison with Lucene

4 Efficient Multi-Agent Simulation using Four State Agent Execution Model on GPUs
  4.1 Outline of nVidia Compute Unified Device Architecture
  4.2 Agent Execution Model
    4.2.1 Evaluating ‘decision-lag’
    4.2.2 Utility of ‘perceive’ state
  4.3 FSAM-framework Architecture
    4.3.1 Distribution of agents
    4.3.2 Event-driven approach of agent-state updates
    4.3.3 Messaging
    4.3.4 Warp management
  4.4 Experimental Results
    4.4.1 Experiments on performance
      4.4.1.1 Computationally intense with no messaging (Circle simulation)
      4.4.1.2 Communication intensive (Hand-shake simulation)
      4.4.1.3 Messaging and computationally balanced (Sand-pile simulation)
    4.4.2 Evaluating FSAM-framework Architecture
      4.4.2.1 Agent distribution algorithm
      4.4.2.2 Warp management
      4.4.2.3 Stress Testing

5 Conclusions
  5.1 Future Work

Bibliography

Appendix A: Example codes for agent simulation on Hadoop simulation framework
  A.1 Circle Simulation
  A.2 Standing Ovation Simulation
  A.3 Sand-pile Simulation

Appendix B: Example codes for agent simulation on FSAM
  B.1 Circle Simulation
  B.2 Hand-shake Simulation
  B.3 Sand-pile Simulation
List of Figures

2.1 AB-Example
3.1 Agent distribution and communication
3.2 Comparing iteration-time for caching On and Off for Circle Simulation on Hadoop
3.3 Comparing number of inter-node messages for clustering On and Off for Standing Ovation Simulation
3.4 Comparing iteration-time for clustering On and Off for Standing Ovation Simulation
3.5 Comparing iteration-time for clustering On and Off for Sand-pile simulation
3.6 Iteration times obtained for KP-Simulation
3.7 Iteration times obtained when removing and adding machines dynamically
3.8 Scalability test with 200,000 agents
3.9 Scalability test with varying number of datanodes
3.10 Comparing iteration-times between Hadoop with HDFS, Hadoop with Lucene, and DMASF for Circle Simulation
3.11 Comparing iteration-times between Hadoop with HDFS, Hadoop with Lucene, and DMASF for Standing Ovation Simulation
3.12 Comparing iteration-times between Hadoop with HDFS, Hadoop with Lucene, and DMASF for Sand-pile Simulation
4.1 Agent state change diagram
4.2 FSAM-framework Architecture
4.3 Decision-lag obtained for different scenarios on GPU-based FSAM
4.4 Cycle-time comparison between GPU-based FSAM and CPU-based DMASF: Circle simulation
4.5 Idle-time comparison between GPU-based FSAM and CPU-based DMASF: Circle simulation
4.6 Cycle-time comparison between GPU-based FSAM and CPU-based DMASF: Hand-shake simulation
4.7 Idle-time comparison between GPU-based FSAM and CPU-based DMASF: Hand-shake simulation
4.8 Cycle-time comparison between GPU-based FSAM and CPU-based DMASF: Sand-pile simulation
4.9 Idle-time comparison between GPU-based FSAM and CPU-based DMASF: Sand-pile simulation
List of Tables

3.1 Map-reduce solution for word-count as it occurs in [2]
3.2 Framework Supplied Code Classes
3.3 Classes for which code is supplied by user
3.4 ALGORITHM: Greedy Agent-Redistribution
3.5 ALGORITHM: Agent-Allocation
3.6 API provided to handle cached results
4.1 ALGORITHM: Agent-distribution on GPUs
Chapter 1
Introduction
In recent decades, multi-agent systems have emerged as an important research field. They
have found several important applications in areas such as distributed problem solving, robotics,
and the simulation and construction of synthetic worlds. In the area of distributed problem
solving, multi-agent technology has helped in developing robust systems and novel soft-
ware architectures by bringing in modularity and flexible decoupling of components. In
robotics, a complex task can be carried out by dividing it into simpler tasks and allocating
each robot (which represents an agent in the system) specific goals.
Another important contribution of multi-agent systems comes in simulations. Simulations
are widely used to enhance knowledge in biology, in the social sciences and in several other fields
through the testing of developed theories. Simulations help in creating and modeling virtual
cities and humans. One can then simulate disasters and test several rescue strategies;
an example is the RoboCup Rescue Competition. Another classic example is
simulating the traffic of a city with millions of humans and thousands of vehicles such as trains
and buses. Such experiments then enable us to construct better road layouts and plan an
effective traffic system. Analyzing emergent behaviors in social simulations prompts us to
run the theories for a large number of agents and see how the results vary. Similarly, to get a
more realistic model of cities, one would want to simulate a huge population with millions
of humans, thus increasing the number of human agents. Due to the large number of agents, the time
taken for such simulations becomes large.
A simulation cycle can be seen as one execution of all agents (a step or decision taken
by each agent). In simulations involving millions of agents, the running time for a simulation
cycle can be several seconds or even minutes, and when run for a large number of cycles,
the total simulation time can be several hours or days. Similarly, if we want to run multiple
experiments, we would want the simulation runs to take as little time as possible.
Hence, we resort to distributed computing to achieve speed-up.
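The cycle structure described above can be sketched as follows (a minimal illustration; the Agent class and its step logic are hypothetical and not part of any framework discussed in this thesis):

```python
# One simulation cycle = one execution of every agent in the simulation.
class Agent:
    def __init__(self, aid):
        self.aid = aid
        self.steps_taken = 0

    def step(self):
        # A 'step' stands for one decision or action taken by the agent.
        self.steps_taken += 1

def run_simulation(agents, cycles):
    for _ in range(cycles):      # total time grows with the number of cycles ...
        for agent in agents:     # ... and with the number of agents per cycle
            agent.step()

agents = [Agent(i) for i in range(1000)]
run_simulation(agents, 10)
```

With millions of agents, the inner loop over agents is exactly what distributed execution parallelizes.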
To fully exploit the capabilities of any distributed system, we must be able to dynamically
balance the work load on the running systems. Further, hardware failures are not rare. If
some of the machines fail at run time, then in the common case the entire simulation
needs to be restarted. If we can somehow dynamically re-balance the workload on the
remaining machines and maintain logs of the simulation progress, we can continue
the simulation from the point of failure. So, we require a fault-tolerant and failure-resilient
system that runs on a large number of processors and agents. Moreover, the system should be
easily extensible so that new machines can be added dynamically to scale up the simulations.
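The idea of resuming from the point of failure can be illustrated with a toy checkpointing loop (a hypothetical sketch; in practice Hadoop performs such logging and recovery internally):

```python
import json
import os
import tempfile

def run_with_checkpoints(state, total_cycles, ckpt_path):
    # Resume from the last completed cycle if a checkpoint log exists.
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)
    while state["cycle"] < total_cycles:
        state["value"] += 1          # stand-in for one simulation cycle of work
        state["cycle"] += 1
        with open(ckpt_path, "w") as f:
            json.dump(state, f)      # log progress after every completed cycle
    return state

path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
run_with_checkpoints({"cycle": 0, "value": 0}, 5, path)
# Simulate a restart: the run continues from cycle 5, not from scratch.
s = run_with_checkpoints({"cycle": 0, "value": 0}, 8, path)
```

Only the three cycles after the "failure" are re-executed; without the checkpoint log, all eight would be.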
Furthermore, in multi-agent based simulations, agents perceive the environment and other
agents’ states. Based on their perceptions, they frame decisions. Hence, it is important for
an agent to get the most current information from other agents and the environment. We term
the issue of getting the latest perceptions of other agents and the environment the data currency
problem for an agent.
Development of multi-agent based simulation systems has seen substantial research in
recent years. Numerous studies have addressed designing new programming paradigms
(NetLogo [12]) and architectures (MASON [16]), scale-up of simulations (ZASE [13],
DMASF [14]), and developing mechanisms for recovery from agent failures ([24]).
Hadoop [5] and CUDA [9] present emerging trends for distributed and parallel computing.
Hadoop provides a scalable, fault-tolerant, failure-resilient distributed computing platform
and is capable of dynamically balancing the workload. In case of system failures, or when
new machines are added to the cluster, Hadoop dynamically re-balances the work
load without stopping an ongoing job. The multi-agent simulation framework developer only
needs to develop a layer for agent-based simulation on top of it to inherit the afore-mentioned
advantages.
GPUs, with hundreds of ALUs and several thousand registers, provide a faster alternative to
CPUs. With their light-weight threads, GPUs provide immense parallelism, thus speeding up
the execution of instructions. Further, the GPUs available today are general-purpose parallel pro-
cessors with support for accessible programming interfaces and industry-standard languages
such as ‘C’. The nVidia Compute Unified Device Architecture (CUDA) [9] enables programmers
and developers to write software that solves complex computational problems by utilizing the
many-core parallel processing power of GPUs. Hence, an agent simulation framework de-
veloped on top of CUDA would naturally be faster than its equivalent implementation on
CPUs.
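The kernel-function idea can be mimicked on a CPU as a sketch: one function applied uniformly to every agent's state, with each invocation independent of the others so that, on a GPU, each could run in its own light-weight thread (an illustration only; real CUDA kernels are written in C, and the update rule here is invented):

```python
def move_kernel(state):
    # The 'single instruction' applied to each agent's state; on a GPU,
    # each invocation would execute in a separate light-weight thread.
    x, y = state
    return (x + 1, y - 1)

agent_states = [(i, i) for i in range(10000)]
# A data-parallel map: no invocation depends on any other invocation.
agent_states = [move_kernel(s) for s in agent_states]
```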
1.1 Motivation
We studied the architectures of Hadoop and CUDA in depth and identified the features
which can help in advancing the state of the art for multi-agent simulation frameworks. In this
work, our main motivation is to devise agent frameworks which efficiently (i) provide data
currency with minimum possible overhead to the execution time of the simulation (work done
on GPUs using CUDA); and (ii) scale with a large number of agents, allow dynamic addition
of new machines, and handle system failures without affecting the ongoing simulation (work
done on Hadoop).
One of the major issues with simulations involving a huge number of agents is scalability.
One such agent application would be creating a virtual city with ten million citizens and
simulating the traffic and transport system for that city. The data for the city is detailed enough to
take into account the railway network and road routes, and has multiple modes of transport, such
as buses, trains and cars, running on it. Each human has some information about the roads and routes,
and a set of movement activities which involves going from one place to another. Executing
such a simulation requires a lot of space to store the city map data, each human’s personal
knowledge about the routes, information about the vehicles running in the city,
and other details; along with space, a lot of computation is also required, as the number of
agents is huge. Moreover, we may want to store the simulation data being generated, which
can be huge. Similarly, if we want to run multiple experiments, we would want the simulation
runs to take as little time as possible. Thus, scalability becomes a major issue in
such simulations. The main motivation behind developing a simulation platform over Hadoop
is scalability. Hadoop (along with Lucene [31], see Section 3.4) provides scalability and speed
to the simulations run on top of it.
The framework developed on GPUs using CUDA has a very low idle time for agents for
a wide class of applications, as demonstrated by the results obtained for several experiments
(see Section 4.4). As such, the framework gives agents the ability to perceive the envi-
ronment and receive information at a high rate. Now consider a class of applications in which
the environment keeps getting updated at a high rate (on the order of a few seconds) and the
agents need to keep track of the changes in the environment and take quick actions
accordingly. If the number of agents is large, providing quick updates to all is a challenge.
A stock-market simulation is one such example. There are millions of stake-holders in a stock
market, who are modeled as agents. The environment contains the list of quotes for the stocks of vari-
ous firms. The list gets updated as the prices change rapidly. Real-life transactions
suggest that stock traders usually try to profit from short-term price volatility, with trades
lasting as little as several seconds. Stake-holders would be able to make more profitable
trades if they remain constantly aware of the stock prices and trade at appropriate times.
When simulating ten million agents with GPUs, the average idle time obtained for an agent
is less than a millisecond, while that obtained on CPUs is of the order of tens of thousands
of seconds. This implies that while the GPU agents are aware of the mar-
ket dynamics almost all the time, the time between consecutive updates for CPU agents is of the
order of hours. Clearly, if we had stock quote prices streamed into the system from a
real-life source, GPU-based agents would have a better idea of the market than
CPU-based agents.
1.2 Contribution and Organization
The main contributions of our work include the development of: (i) a scalable, dynami-
cally extensible, fast, fault-tolerant and failure-resilient agent-based simulation framework
on Hadoop; (ii) optimization techniques for the implementation of a simulation framework on
a Hadoop cluster, namely caching of intermediate results, using Lucene indexing for fast
agent-data retrieval and messaging, and an algorithm for clustering frequently communicat-
ing agents (to reduce inter-processor communication); (iii) a four-state agent execution
model which ensures that every agent gets the latest perceptions from the agent environment;
and (iv) a fast multi-agent simulation framework which utilizes the parallel architecture of GPUs
to efficiently implement the four-state agent execution model.

Chapter 2 presents the related research on agent-simulation frameworks. Chapter 3 presents
the architecture of the simulation framework developed on Hadoop, along with the optimization
techniques that were developed; some experimental results are included. Chapter 4 presents
the four-state agent execution model; the architecture of the agent-simulation framework developed
on GPUs, along with experimental results, is included in this chapter. Finally, in Chapter 5,
we present some conclusions and the future work that can be done to extend the functionality
and efficiency of the agent execution models and frameworks presented in this work.
Chapter 2
Related Work
Developing tools for multi-agent simulations has always been an active area of research,
with emphasis laid on different aspects: architecture, scalability, efficiency, fault-
tolerance and effectiveness of the system. A number of frameworks have been developed.
Swarm [17], NetLogo [12] and RePast [18] are frameworks managing simulations on
a single CPU core; ZASE [13], DMASF [14] and MASON [16] are the ones which scale
simulations to multiple cores.
SWARM [17] and RePast [18] are widely used frameworks for studying emergent agent
behaviors through agent-based social simulations. SWARM was developed to help users
implement agent models and conduct experiments on those models. It has two versions
available: one developed in Objective-C and the other in Java. It conceptualized agent-based
simulation models as a hierarchy of “swarms”, a swarm being a group of objects and a sched-
ule of actions that the objects execute. A primary purpose of agent-based modelling platforms
is to control which specific actions are executed when, in simulated time. Models often use
discrete, fixed time steps but sometimes use “dynamic scheduling”, in which new actions can
be generated as the model executes and scheduled for execution at a specific future time.
Swarm provides explicit methods for scheduling actions, both in fixed time steps and dynam-
ically.
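Dynamic scheduling of the kind Swarm supports can be sketched with a priority queue keyed on simulated time, where an executing action may schedule further actions at future times (an illustration only, not Swarm's actual API; the action functions are invented):

```python
import heapq

def run(initial_schedule):
    # Each entry is a (time, sequence-number, action) triple; the sequence
    # number breaks ties so actions at equal times execute in insertion order.
    queue = list(initial_schedule)
    heapq.heapify(queue)
    executed, seq = [], len(queue)
    while queue:
        t, _, action = heapq.heappop(queue)
        executed.append(t)
        for dt, act in action(t):          # an action may schedule new actions
            heapq.heappush(queue, (t + dt, seq, act))
            seq += 1
    return executed

def noop(t):
    return []

def spawn_once(t):
    # Dynamically schedule a follow-up action 5 time units in the future.
    return [(5, noop)] if t == 0 else []

times = run([(0, 0, spawn_once), (3, 1, noop)])
```

Fixed time steps are the special case where every action reschedules itself with the same constant dt.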
RePast [18] was developed with the objective of providing functionality equivalent to SWARM
in Java, and has been designed to make it easier for inexperienced users to build and test agent
models. Repast Simphony (Repast S) is the latest version of Repast. The Repast S agent
model designer is being developed to allow users to visually specify the logical structure of
their models, the spatial (e.g., geographic maps and networks) structure of their models, the
kinds of agents in their models, and the behaviors of the agents themselves. Users can then
execute model runs as well as visualize and store results. In addition, the Repast S runtime
environment includes automated results analysis and connections to a variety of spreadsheet,
visualization, data mining, and statistical analysis tools.
Though SWARM and RePast provide a lot of functionality, they both lack the capability
to manage more than one system. As such, the number of agents that can be executed using
them is limited by the configuration of the machine. Hence, the two frameworks are not
scalable.
NetLogo [12] introduced a novel programming model for implementing agent-based sim-
ulations, which eased the development of complex agent models and scenarios. It manages
all the agents in a single thread of execution, switching execution between different agents
deterministically, not randomly, after each agent has done some minimal amount of work
(simulated parallelism). The simulated parallelism provides deterministic reproducibility of
the agent-based simulation every time it is run with the same seed for the random number generator,
which was one of the implementation goals of NetLogo. Further, a visualization module pro-
vides 2D/3D visuals of the ongoing simulation. However, NetLogo is not able to distribute the
computation over a cluster of computers and hence is not scalable.
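Simulated parallelism of this kind can be sketched as deterministic round-robin switching driven by a single seeded random number generator, which is what makes every run with the same seed reproducible (a sketch of the idea, not NetLogo's implementation; the per-agent work is invented):

```python
import random

def simulate(num_agents, cycles, seed):
    rng = random.Random(seed)             # one seeded generator for the whole run
    positions = [0.0] * num_agents
    for _ in range(cycles):
        for i in range(num_agents):       # deterministic, fixed agent order
            positions[i] += rng.random()  # minimal unit of work per agent
    return positions

run1 = simulate(100, 10, seed=42)
run2 = simulate(100, 10, seed=42)         # identical to run1, by construction
```

Because both the switching order and the random draws are fixed by the seed, the two runs produce bit-identical results.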
MASON [16], developed in Java, provides a platform for running massive simulations
over a cluster of computers. It exploits several features of Java: its platform independence to
make the framework portable; strict math and type definitions to obtain duplicatable results;
and its efficient object serialization for check-pointing simulations. It has a layered architecture
with separate layers for agent modelling and visualization, which makes decoupling the visu-
alization part easier. It has the capability to support millions of agents (without visualization).
Checkpoints of agent data are created on disk for offline visualization. Platform independence
and check-pointing together allow simulations to be migrated from one platform to another in the
middle of a run.
ZASE [13] (developed in Java) is another scalable platform for running billions of agents.
It divides the simulation into several smaller agent runtimes, each runtime controlling hun-
dreds of thousands of agents and running on a separate machine. A thread-pool architecture
is followed, with several agents sharing a single thread of execution. It keeps all agents in
main memory without the need to access the disk. As such, scaling simulations requires either
increasing the main memory or adding new processors altogether, both of which are more
expensive than adding secondary storage.
DMASF [14] (developed in Python) has an architecture similar to MASON and ZASE.
Like ZASE, it divides the simulation into several smaller runtimes executing on different com-
puters; furthermore, several agents are executed in a shared single thread of execution. How-
ever, it uses a MySQL database to provide scalability with the help of secondary storage
rather than being bounded by the limited main memory. Similar to MASON, it has a modu-
lar architecture. It separates agent modelling from visualization and makes the decoupling of
the two easier. Further, it dynamically balances the agent execution load among the machines
running the simulation.
However, the ability to handle hardware failures is lacking in all three (MASON, ZASE,
DMASF). If some of the systems using these frameworks fail during the simulation run, then
the simulation needs to be restarted from the beginning.
Further, there has been some work on developing agent simulation frameworks on GPUs.
In [22], the authors present an architecture for a 3D simulation framework, ABGPU, which simu-
lated up to 65,536 agents at 60 frames/sec (with visualization) and 1 million agents at 5 fps
(without visualization). The parallel nature of processing offers significant and scalable per-
formance increases, with the added benefit of avoiding data transfer between the simulation
and rendering stages.
In [23], the authors use the GPU’s texture memory to store information about the agents and
environment, providing mechanisms to simulate births and deaths of agents. State-update
functions that operate on the agent-state data are programmed as kernels [11]. Therefore,
different update functions have different kernels. Kernels operate one at a time on the
entire data set. Thousands of individual threads are automatically launched, with each thread
executing the same kernel on a portion of the agent-state array. The threads are then barrier-
synchronized at the end of the computation. Depending on the agent model, several kernels
may be invoked for the computation of one time step. Kernels are implemented as shaders,
and data is stored as textures.
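The per-kernel, barrier-synchronized update scheme of [23] can be sketched with a double-buffered state array: each "kernel" reads only the old buffer and writes a fresh one, and swapping buffers plays the role of the barrier (an illustration under that assumption; the original work implements kernels as shaders operating on textures, and the diffusion rule here is invented):

```python
def diffuse_kernel(states, i):
    # Each thread would run this on one slot of the agent-state array.
    # Reading only the old buffer makes thread ordering irrelevant.
    left = states[i - 1] if i > 0 else 0.0
    right = states[i + 1] if i < len(states) - 1 else 0.0
    return (left + states[i] + right) / 3.0

def run_step(states, kernel):
    # Launch one 'thread' per array slot; returning a new buffer and
    # replacing the old one corresponds to the barrier synchronization.
    return [kernel(states, i) for i in range(len(states))]

states = [0.0, 3.0, 0.0]
states = run_step(states, diffuse_kernel)
```

A second update function would simply be a second kernel, invoked on the whole array after the first completes.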
In [25], the authors scale the number of agents to one billion. They provide a latency-hiding
scheme to avoid communication delays which may occur due to messaging between agents
residing on physically different systems. Multiple instances of the same agent are run on the
systems which have agents communicating with it. In their work, they assume agents to be
organized in a grid, and agents can interact only with agents within some specific
distance in their neighborhood region. For the distributed environment, the global
grid gets partitioned into multiple grids. Hence, agents lying near a border may still have
to communicate with neighbouring agents on different systems. Such agents have multiple
instances running on the candidate systems. This results in lower communication delays.
However, since their scheme works well only for agents having static communication links, it would
be difficult to utilize it for a broader class of simulations which have fairly dynamic
communication links between agents (for an example, see [26]).
[30] is a recent work on developing an environment that enables agent-based models to be
run on high-performance computers (HPCs) and GPUs. The simulation code is generated by
processing a model definition of the agents using a template engine, given a set of predefined
code templates and flags indicating the target architecture. The generated code (including
common routines for input/output) can then be compiled and executed on the appropriate
architecture. When simulating on HPCs, the framework exploits task parallelism; when
simulating on GPUs, data parallelism is utilized. However, the syntax for model and behaviour
scripting remains the same, allowing models to be compiled for either architecture. The
experiments performed involved millions of agents simulating the European economy.
[Figure: (a) the AB-dependency graph over properties A.x, A.y, B.x, B.y; (b) the time delays:
at t = t1, A.x is updated in main memory; at t = t2, the updated A.x is written to the disk;
at t = t3, the updated A.x is visible to B; λ1 = t2 − t1 and λ2 = t3 − t2.]
Figure 2.1 AB-Example
2.1 AB-Example
Let us consider a scenario with two agents, A and B, each having two properties, 'x' and
'y'. Figure 2.1(a) shows the dependency graph for the two agents. Agent B considers the value
of A.x before updating B.x, and agent A considers the value of B.y before it updates A.y. Our
motive is to provide agent B with the latest value of A.x before it updates B.x, and similarly
for agent A.
Assume A and B have a shared disk-resident database (or one across the network) to store
their properties. There is some delay, λ1, involved in writing the updated A.x value to the
database. Further, there is some delay, λ2, in reading A.x from the database by agent B.
Therefore, either B has to delay the update of B.x by (λ1 + λ2) in order to get the latest value
of A.x, or else it can proceed with a relatively older value of A.x, thus avoiding the delay. In
our agent-execution model, B always delays its decision-making to get the most recent A.x,
and hence our aim is to reduce this delay as much as possible.
Multi-agent simulation frameworks like NetLogo [12] and RePast [18] execute agents serially
on one machine following the cycle-based simulation model [21]: one execution of all the
agents corresponds to one cycle of the simulation; in this way the simulation proceeds in
cycles of agent execution. Following the AB-dependency graph to ensure data currency, it
would take two simulation cycles to complete the execution: (i) in the first cycle, A updates
A.x and B updates B.y; (ii) in the second cycle, agent A reads B.y to update A.y and B reads
A.x to update B.x. Since all changes occur in main memory, (λ1 + λ2) is very low. However,
when the number of agents is large (tens of millions), serial execution can get relatively slow
compared to an equivalent parallel execution, making the total time taken to run the
simulation large.
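The two-cycle execution just described can be sketched in a few lines of plain Java. This is an illustrative toy of our own, not code from any of the frameworks; the property values and update rules are made up.

```java
// Toy model of the cycle-based execution of the AB-example. All state lives
// in main memory, so the read/write delay (lambda1 + lambda2) is negligible.
public class ABCycles {
    static double ax, ay, bx, by;

    /** Runs the two cycles described in the text and returns B.x. */
    static double run() {
        ax = 0; ay = 0; bx = 0; by = 0;
        // Cycle 1: A updates A.x and B updates B.y (no cross-reads needed).
        ax = 1.0;
        by = 2.0;
        // Cycle 2: A reads B.y to update A.y; B reads A.x to update B.x.
        // Both reads see the values written in cycle 1, ensuring data currency.
        ay = by + 1.0;
        bx = ax + 1.0;
        return bx;
    }
}
```

Serializing the cycles in this way guarantees that every cross-agent read sees the value written in the previous cycle, which is exactly the data-currency property the AB-example demands.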
Frameworks like DMASF [14], MASON [16], or ZASE [13] distribute computation over
a set of available processors. Agent execution is done by switching agents among a limited
number of parallel processes, which may even be running on different machines. In such
scenarios, agents either need to flush data to the disk or communicate through messages over
the network. In either case, (λ1 + λ2) is still high, even though the simulation time may go
down due to parallel agent execution.
DMASF [14] uses the cycle-based simulation model [21], and agents are executed in a limited
number of parallel threads. The simulation enters the next cycle after all agents have updated
their states for the current cycle. DMASF maintains agent states in a database. In a particular
cycle, agents make their perceptions using this database, update their states, and the updated
agent state gets flushed into the database. Considering the AB-example, agent B needs to make
sure that agent A has updated the value of A.x in the database before B can read it. Therefore,
in order to solve the data currency problem, the second cycle of the simulation should begin
only after all the database writes corresponding to the first cycle have completed. Hence the
agents need to wait till all the writes have been performed. Further, accessing the database to
read and write values incurs a lot of overhead in the execution time of each cycle. MASON [16]
suffers from a similar overhead: agents residing on different systems communicate through
messages and exchange data, so network I/O adversely impacts the execution time.
On the other hand, GPUs have hundreds of cores to do computations at a fast pace. CUDA
[9] enables creation and management of a large number (up to several millions) of GPU
threads. CUDA APIs allow access to a shared global memory [11] and fast access to the
mapped main memory [11] for sharing agent data. Thus, the delay (λ1 + λ2) becomes
negligible, along with the simulation time. So in our work on GPUs, we provide an agent
execution model to achieve fast data currency.
Chapter 3
A Multi-agent Simulation Framework on Small Hadoop
Clusters
In this chapter, we present the design of an agent-based simulation framework implemented
on a Hadoop cloud. Being developed on top of Hadoop, it inherits Hadoop's aforementioned
advantages.
Our proposed framework developed on Hadoop provides three major advances to the current
state of the art: (i) dynamic addition of new computing nodes while the simulation is running;
(ii) handling node failures without affecting the ongoing simulation, by redistributing the
failed tasks on working systems; (iii) allowing simulations to run on machines running
different operating systems. Further, the framework incorporates several optimization
techniques: (i) clustering of frequently communicating agents (for reducing inter-processor
communication); (ii) caching of query results run on the Hadoop cloud (for improving
performance); (iii) indexing agent data and agent messages using Lucene [31] for faster
retrieval during the simulation. With the help of Lucene indexing, the execution time of the
simulation is comparatively less than that of current-day simulation frameworks.
3.1 Hadoop Architecture and Map-Reduce Model
Hadoop is an Apache project that develops open-source software for reliable and scalable
distributed computing. It maintains a distributed file system, the Hadoop Distributed File
System (HDFS), for data storage and processing. Hadoop uses the classic Map-Reduce
programming paradigm to process data. This paradigm easily fits a large number of problems
[8]. Hadoop consists of a single master system (known as the namenode) along with several
slave systems (known as datanodes). For failure resilience purposes, it has a secondary
namenode which replicates the data of the namenode at regular intervals.
3.1.1 Hadoop Distributed File System (HDFS)
HDFS is a block-structured file system: individual files are broken into blocks of a fixed
size (default 64 MB), which are distributed across a cluster of one or more machines
(datanodes); thus, all the blocks of a single file may not be stored on the same machine.
Access to a file may therefore require access to multiple machines, in which case the file could
be rendered unavailable by the loss of any one of those machines. HDFS solves this problem
by replicating each block across a number of machines (three, by default). The metadata,
consisting of the division of the files into blocks and the distribution of these blocks on
different datanodes, is stored on the namenode.
3.1.2 Map-Reduce Paradigm
The MapReduce paradigm transforms a list of (key, value) pairs into a list of values. The
transformation is done using two functions: Map and Reduce. The Map function takes an input
(key1, value1) pair and produces a set of intermediate (key2, value2) pairs. The Map output
can have multiple entries with the same key2. The MapReduce framework sorts the Map
output according to the intermediate key2 and groups together all intermediate value2s
associated with the same intermediate key2. The Reduce function accepts an intermediate key2
and the set of corresponding value2s for that key2, and produces one or more output value3s.
(i) map(key1,value1) -> list<(key2,value2)>
(ii) reduce(key2, list<value2>) -> list<value3>
The intermediate values are supplied to the Reduce function via an iterator. This allows
handling lists of values that are too large to fit in memory. The MapReduce framework calls
the Reduce function once for each unique key, in sorted order. Due to this, the final output list
generated by the framework is sorted according to the key of the Reduce function.
For example, consider the standard problem of counting the number of occurrences of each
word in a large collection of documents [2]. This problem can be solved using the Map
and Reduce functions. The Map function emits each word along with an associated count of
occurrences (just '1' in this simple example). The Reduce function sums together all counts
emitted for a particular word.
    map(String key, String value):
        // key: document name
        // value: document contents
        for each word w in value:
            EmitIntermediate(w, "1");

    reduce(String key, Iterator values):
        // key: a word
        // values: a list of counts
        int result = 0;
        for each v in values:
            result += ParseInt(v);
        Emit(AsString(result));

Table 3.1 Map-reduce solution for word-count as it occurs in [2]
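As a concrete illustration, the word-count job can be mimicked in plain Java, with an in-memory shuffle standing in for Hadoop's sort-and-group step. The class and method names here are ours, not Hadoop's API.

```java
import java.util.*;

public class WordCount {
    /** Map phase: emit a (word, "1") pair for every word in the document. */
    static List<Map.Entry<String, String>> map(String doc) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String w : doc.split("\\s+"))
            if (!w.isEmpty()) out.add(new AbstractMap.SimpleEntry<>(w, "1"));
        return out;
    }

    /** Reduce phase: sum the emitted counts for one word. */
    static int reduce(String word, Iterator<String> counts) {
        int result = 0;
        while (counts.hasNext()) result += Integer.parseInt(counts.next());
        return result;
    }

    /** The framework's job: shuffle map output by key, then reduce each group.
        A TreeMap stands in for the framework's sort-by-key behaviour. */
    static SortedMap<String, Integer> run(List<String> docs) {
        SortedMap<String, List<String>> groups = new TreeMap<>();
        for (String d : docs)
            for (Map.Entry<String, String> kv : map(d))
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                      .add(kv.getValue());
        SortedMap<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, List<String>> g : groups.entrySet())
            out.put(g.getKey(), reduce(g.getKey(), g.getValue().iterator()));
        return out;
    }
}
```

Note how the grouping structure mirrors the text: all value2s for the same key2 are collected before the single reduce call per key, and the final output comes out sorted by key.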
3.1.3 MapReduce Job Execution by Hadoop
In Hadoop terminology, a 'job' means an execution of a Mapper and Reducer across a data
set, and a 'task' means an execution of a Mapper or a Reducer on a slice of data. Hadoop
distributes the Map invocations across multiple machines by automatically partitioning the
input data into a set of M independent splits. These input splits are processed in parallel
by M Mapper tasks on different machines. Hadoop invokes a RecordReader method on the
input split to read one record per line until the entire input split has been consumed. Each
invocation of the RecordReader leads to another call to the Map function of the Mapper.
Reduce invocations are distributed by partitioning the intermediate key space into R pieces
using a partitioning function. Note that some problems require a series of MapReduce steps
to accomplish their goals:
Map1 -> Reduce1 -> Map2 -> Reduce2 -> Map3... and so on
Hadoop supports this by allowing us to chain MapReduce jobs by writing multiple driver
methods, one for each job, using ChainMapper classes (see Section 3.3.2.2).
The namenode is responsible for scheduling the various MapReduce tasks on different
datanodes. First the Map tasks are scheduled and then the Reduce tasks. The namenode gets
periodic updates from the datanodes about the work-load on each. It computes the average
time taken by MapReduce tasks on each datanode and then distributes tasks such that the
faster datanodes get more tasks to execute and the slower ones get fewer.
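The scheduling rule above (faster datanodes receive proportionally more tasks) can be sketched as follows. The method name and the inverse-average-time weighting are illustrative assumptions of ours, not Hadoop's actual scheduler code.

```java
public class TaskShares {
    /**
     * Splits totalTasks among datanodes so that a node's share is inversely
     * proportional to its average task time (faster node => more tasks).
     * avgTaskTimes[i] is the average MapReduce task time on datanode i.
     */
    static int[] distribute(int totalTasks, double[] avgTaskTimes) {
        int n = avgTaskTimes.length;
        double[] speed = new double[n];
        double total = 0;
        for (int i = 0; i < n; i++) {
            speed[i] = 1.0 / avgTaskTimes[i];  // speed = inverse of avg time
            total += speed[i];
        }
        int[] share = new int[n];
        int assigned = 0;
        for (int i = 0; i < n; i++) {
            share[i] = (int) Math.floor(totalTasks * speed[i] / total);
            assigned += share[i];
        }
        // Hand leftover tasks (lost to rounding down) to the least-loaded
        // fast nodes first.
        for (int left = totalTasks - assigned; left > 0; left--) {
            int best = 0;
            for (int i = 1; i < n; i++)
                if (speed[i] / (share[i] + 1) > speed[best] / (share[best] + 1))
                    best = i;
            share[best]++;
        }
        return share;
    }
}
```

For instance, with two datanodes averaging 1 s and 2 s per task, 30 tasks split 20/10 under this weighting.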
3.2 Multi-agent Simulation Framework on Hadoop
Each agent has three essential properties: a unique identifier, a time-stamp associated with
the state of that agent, and the type of the agent. The identifiers and time-stamps (current
time) are generated by the framework itself, whereas the agent type is provided by the user.
The user can also specify additional properties. The state of an agent at a particular timestamp
refers to the set of its property values at that timestamp. Likewise, an update in the state of an
agent refers to changes in these property values. The user needs to provide an
update-agent-state function for each type of agent, which is a programmatic model for
incorporating agent strategy.
Hadoop requires problems to be expressed in the MapReduce model. This constraint is
required in order to make the scheduling of MapReduce tasks (done by Hadoop) independent
of the problem being solved. Further, by default Hadoop invokes one Map task for each input
file. Therefore, we model a multi-agent simulation as a series of MapReduce jobs, with each
job representing one iteration of the simulation, and we model each agent as a separate input
file to these jobs. This leads a particular job to invoke multiple Map tasks, one for each agent,
executing in parallel. Therefore the function update-agent-state, which is responsible
for updating the state of an agent in each iteration, is written as a Map task. Reduce tasks are
responsible for writing the data back into the file associated with each agent.
Each agent state is modelled as a separate flat file on HDFS, with the file name being the
agent identifier. This agent file contains the t most recent states of the agent, where t is a
user-specified parameter (specified through getIterationParameters() in the CreateEnviron
class described later); each state is distinguished by a timestamp. This agent file is the input
for the map task initiated to update the state of the agent (one map task for each input
file/agent). The current state of an agent is the one having the most recent timestamp. A
separate message file is associated with each agent, storing recent messages received from
other agents. One MapReduce task is invoked for each agent.
Each iteration of the simulation corresponds to one MapReduce job, invoked with one
MapReduce task corresponding to every single agent. The framework implements two classes,
Agent and AgentAPI. Class Agent contains two classes, Map and Reduce, corresponding to
the MapReduce task. In the map method of class Map, data is read from the associated input
file of the agent into a Java map<object, object>, say agent_data, mapping agent properties
to their values. The agent state is updated by executing the user-supplied Update method (in
the AgentUserCode class described later). The timestamp is also updated to the current time.
All the properties are concatenated as a string, which is passed as the value in the (key, value)
tuple to the Reducer, the key being the agent identifier. In the reduce method of the Reduce
class, the input properties-concatenated string is written to the corresponding agent flat file.
The main() function creates the simulation world with the help of the class CreateEnviron,
for which the code is
    public class Agent {
        public static class Map extends MapReduceBase implements Mapper {
            public void map(key, value, OutputCollector<key, value>);
        }
        public static class Reduce extends MapReduceBase implements Reducer {
            public void reduce(key, Iterator<value>, OutputCollector<key, value>);
        }
        public static void main(String[] args) {
            // Create the simulation world.
            CreateEnviron ce = new CreateEnviron();
            iterparams = ce.getIterationParameters();
            ce.createWorld();
            // Configure map-reduce jobs.
            // Invoke map-reduce task.
        }
    }

    public class AgentAPI {
        public void createAgent(map<object, object> agent_data);
        map<String, String> getAgents();
        void sendMessage(String from_agent_identifier,
                         String to_agent_identifier, String message);
        map<String, String> readMessages(String agent_identifier);
    }

Table 3.2 Framework Supplied Code Classes
supplied by the user. The method getIterationParameters() gets user-specified inputs such as
the number of iterations the simulation is intended to run and the number of most recent
timestamps for which the agent data is retained in the agent-specific flat file. The method
createWorld() is also supplied by the user; it creates agents and initializes the world. Further,
the framework supplies code for the AgentAPI class. The method
createAgent(map<object, object>) creates a flat file in HDFS named with the agent identifier
and writes the initial state of the agent. The method getAgents() returns the data of all the
agents present in the simulation. The sendMessage() function sends a message to another
agent. It writes the current timestamp and the identifiers of the two
    public class CreateEnviron {
        public map<object, object> getIterationParameters();
        void createWorld() {
            // map<object, object> agent_data initialized.
            AgentAPI crobj = new AgentAPI();
            crobj.createAgent(agent_data);
            // And so on for any number of agents.
        }
    }

    public class AgentUserCode {
        map<object, object> Update(map<object, object> agent_data) {
            switch (agent_data["type"]):
                case "type1": // User code for update.
                case "type2": ...
        }
        void Shape(String agent_identifier) {
            switch (agent_identifier):
                case "type1": // User code for rendering shape.
                case "type2": ...
        }
    }

Table 3.3 Classes for which code is supplied by user
agents involved in the communication in the message files associated with them. The
readMessages() function reads all the messages sent to the agent in the last iteration.
Finally, the user needs to supply the code for the strategy of each type of agent.
3.2.1 Handling Failures
System failures are common when the number of systems involved is large. Information
about the namenode and secondary namenode is present on all the datanodes. The
namenode sends a heart-beat message to the datanodes at regular intervals (by default 600
seconds; configurable by the user). Each datanode sends an acknowledgement message along
with information regarding the status of the various tasks running on it. This information
includes the number of completed MapReduce tasks (for the current MapReduce job) since the
last heart-beat message and the total time taken to complete them. It also includes the
number of active MapReduce tasks and the number of MapReduce tasks in the queue, waiting
to be scheduled.
When a particular Map (or Reduce) task fails (and in cases when a slower Map task is
becoming a bottleneck for the rest of the processes), Hadoop spawns a new process to carry
out its job, and may also use idle processes (ones which have completed their Map/Reduce
task) to do its work. When one of the several processes spawned to complete the failed task
finishes, the rest of them are aborted (speculative execution). Thus, the simulation enters the
next iteration only when all Map tasks in the current iteration are completed. Hence Map
tasks are used for updating an agent's state in a particular iteration.
3.2.1.1 Namenode Failure
If the datanodes do not receive a heartbeat message from the namenode for more than two
time intervals (1200 seconds), the namenode is considered to have failed. The namenode
data has already been replicated on the secondary namenode at regular intervals; therefore,
failure of the namenode does not cause any loss of data. The datanodes (on detecting
namenode failure) at once declare the secondary namenode as the new namenode. All the
responsibilities of the namenode, like job scheduling, are now taken up by this new namenode.
Further, the datanode which is physically nearest to the new namenode (for faster namenode
data replication) is selected as the new secondary namenode. A special case occurs when the
secondary namenode has also failed at the instant when the namenode is detected as failed. In
this case, the simulation is aborted.
3.2.1.2 Secondary Namenode Failure
The namenode detects a secondary namenode failure if it does not receive an
acknowledgement for the heart-beat message. Since the secondary namenode only held a
replica of the namenode data, its failure is handled simply by electing a new secondary
namenode from the current datanodes (the datanode physically nearest to the namenode is
selected).
3.2.1.3 Datanode Failure
The namenode detects a datanode failure if it does not receive an acknowledgement of the
heart-beat message from the datanode. The data of each node is replicated on three other
nodes in the distributed system. As such, when a datanode fails, its data can be recovered
easily using its replicas. However, it might have been running several MapReduce tasks when
it failed. These MapReduce tasks need to be rescheduled on some other datanode. Two cases
arise for a failed MapReduce task: (i) the failure occurred while running the Map task; (ii) the
failure occurred while running the Reduce task. When the datanode fails while running its
Map task, the entire MapReduce task needs to be rescheduled on some other datanode and
the complete task needs to be redone. If a datanode failure occurs while running a Reduce
task, then optimally the Map task should not be redone. To achieve this, the output of a Map
task is replicated (as soon as it finishes) on those datanodes which contain the data replicas
for the failed datanode. Thus, if a datanode fails when it is running a Reduce task, only the
Reduce task is rescheduled and redone on some other node.
Thus, even if some of the machines in the Hadoop cluster fail, the simulation does not
stop.
3.2.2 Dynamic Addition of New Nodes
The namenode maintains a file containing the IP addresses of the different machines
(datanodes). It sends heart-beat messages to the systems mentioned in this file. If a new
system needs to be added to the Hadoop cluster, information about it simply needs to be
added to this file. When the namenode finds a new entry in this file, it immediately grants the
new datanode access to HDFS and invokes MapReduce tasks on it, rebalancing the total
work-load. Since heart-beat messages are sent every 600 seconds, the newly added datanode
can be idle for at most this period. Further, the simulation need not be stopped to achieve
this addition.
3.3 Implementation Issues for MAS framework on Hadoop
Our model faces run-time challenges which need to be addressed. As the number of
agents becomes large (of the order of 10^7 agents on 100 machines), overhead increases
significantly due to the generation of a large number of MapReduce tasks. Further, the way in
which the agents are distributed on different datanodes may increase the execution time. We
present below the challenges faced with the above model and solutions for the same.
3.3.1 Small-Files-Problem
Every file, directory, and block in HDFS is represented as an object in the namenode's
memory, each of which occupies about 150 bytes. So if we have 10 million agents running,
we need about 3 GB of memory for the namenode (assuming each file has one block).
Furthermore, HDFS is not geared up to efficiently access small files: it is primarily designed
for streaming access of large files. Reading through small files normally causes lots of seeks
and lots of hopping from datanode to datanode to retrieve each small file, all of which is
inefficient. Moreover, Map tasks usually process a block of input at a time. If the files are very
small and there are lots of them, then each map task processes very little input, and there are
many more Map tasks running, each of which imposes extra book-keeping overhead.
3.3.1.1 Solution to the Small-Files-Problem
The overheads posed by the small-files problem can be reduced if we group several agents
together into a single file. The problem now is to decide the number of agents to be put in
a single file.
Since our major concern is to reduce the overhead due to the generation of a large number
of Map tasks, we first need to find the number of Map tasks that should be run on the
Hadoop cluster to give the best expected performance. Too few Map tasks would not fully
exploit the available resources, while too many Map tasks would require extra book-keeping
and cause swapping of processes (JVMs) between main memory and disk. Hence, the number
of Map tasks that gives good average performance depends on the amount of main memory
available and the computational complexity of the tasks. The memory available to each task
(JVM) can be controlled by setting the mapred.child.java.opts property appropriately; the
default is 200 MB. As an example, let us assume the amount of main memory available on
each node (datanodes and namenode) to be 4 GB. The maximum number of Map tasks that
can be run on a datanode without any swapping would then be 4 GB / 200 MB = 20. With 10
datanodes in the cluster, the number of Map tasks that can be run would be 20 × 10 = 200.
Hence, if we need to simulate 10 million agents, we divide them into 200 groups with 50,000
agents in each group, and a single group is written to one file. Further, the metadata required
would now be around 200 × 150 bytes × 2 = 60 kB, as opposed to the 3 GB computed earlier.
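The sizing arithmetic above can be collected into a small helper. The class and method names are ours; the 150-bytes-per-object constant and the one-block-per-file assumption are the figures quoted earlier in this section.

```java
public class FileSizing {
    /** Max Map tasks that fit in cluster memory without swapping. */
    static long maxMapTasks(long memPerNodeBytes, long memPerTaskBytes, int numNodes) {
        return (memPerNodeBytes / memPerTaskBytes) * numNodes;
    }

    /** Agents per file when the population is split into one file per task. */
    static long agentsPerFile(long numAgents, long numFiles) {
        return (numAgents + numFiles - 1) / numFiles;  // ceiling division
    }

    /** Namenode metadata: ~150 bytes per object, one file object plus one
        block object per file (each file assumed to fit in a single block). */
    static long metadataBytes(long numFiles) {
        return numFiles * 150L * 2L;
    }
}
```

With 4 GB nodes, 200 MB per JVM, and 10 datanodes this reproduces the 200-task, 50,000-agents-per-file, 60 kB-of-metadata figures from the worked example.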
3.3.2 Agent Communication
Agent simulation requires communication between different agents. The agent
communication occurs by fetching the states of other agents from their corresponding agent
files, which may reside on different datanodes. This can be a time-consuming factor in such
simulations. Hence, the number of accesses to files residing on remote datanodes needs to be
reduced.
Continuing the solution presented in Section 3.3.1.1, the reduction of inter-node accesses
is achieved by placing the agents which communicate with each other frequently in the same
group, and hence in the same file. Furthermore, the block-size, which is 64 MB by default, is
set to numAgents × szAgentData, where numAgents is the number of agents to be put in a
single file and szAgentData is the maximum size of an agent's data. The block-size is fixed in
this way to avoid placing blocks belonging to the same file on different nodes.
We use clustering to group the agents. Further, we need to cluster while the simulation is
running; hence, we need an algorithm that does not impose too much overhead on the
execution time of the simulation. The following greedy algorithm achieves these requirements,
and it can itself be distributed via Map/Reduce.
3.3.2.1 Agent Clustering algorithm based on agent-communication
Given K sites, agents a1, a2, ..., aN, and communication statistics between these agents,
the problem is to form groups of agents in such a manner that communication between agents
in the same group is maximized and that between agents in different groups is minimized. By
achieving low inter-group messaging, we reduce the inter-node communication. The
complexity of the solution needs to be of the order of the number of communication links
among the agents.
Rm(ak) denotes the map value for agent ak in file m. The algorithm begins by distributing
the agents evenly into K files on the given sites. Their communication links with each other
during the first iteration are noted, and based on these links they are grouped together.
Step 2 brings together the agents who communicated with each other into the same group.
To avoid redundancy, if two agents communicated with each other, it is considered that the
agent having the lower agent identifier communicated with the other agent, and not the other
way round (Step 2(ii)); we refer to this agent having the lower identifier as the representative
agent. Step 3 combines the groups formed in different files in Step 2 and merges them using
the criteria given in the algorithm. This step is required because the same agent ai may
communicate with agents in several different files and hence have different values of Rj(ai).
Next, consider a case where agent ai occurs in two files m and n. In file m, ai is a
representative agent, with Rm(ai) = ai. In file n, it is not a representative agent and has
Rn(ai) lower than ai. In such a case, R(ai) = Rn(ai). All the elements which initially mapped
to ai in file m have to be re-mapped to Rn(ai). This justifies Step 4 of the algorithm.
    1. Distribute the N agents randomly into K files,
       and carry out one iteration of the simulation.
    2. For every agent ai, in each file j:
       (i)  Compute the list of agents with which ai communicated,
            using the message file associated with ai.
            Call ai the representative agent.
       (ii) Map each agent ak in ai's list to ai only if
            id-value(ai) <= id-value(ak); and map agent ai to itself.
            Denote this map table as Rj.
    3. Combine the map-tables Rj from different files into a single
       map table R using the following update rule:
            if (Rm(ak) < Rn(ak)) then R(ak) = Rm(ak)
            else R(ak) = Rn(ak)
    4. Let R(ak) = al. Then do R(ak) = R(al). Do this update
       for all the entries in map-table R.
    5. All agents ai having the same value for R(ai) form the
       same group.

Table 3.4 ALGORITHM: Greedy Agent-Redistribution
As an example, let the distribution of agents and communication links between them be as
shown in Figure 3.1.
[Figure: seven agents a1–a7 distributed across Sites 1, 2, and 3, with communication links
among them.]
Figure 3.1 Agent distribution and communication
Step 2 (corresponding to the algorithm): R1(a1) = a1; R1(a3) = a3; R1(a7) = a1; R2(a2)
= a1; R2(a6) = a3; R3(a6) = a5; R3(a4) = a3; R3(a5) = a4
Step 3: R(a1) = a1; R(a3) = a3; R(a7) = a1; R(a2) = a1; R(a6) = a3; R(a4) = a3; R(a5)
= a4
Step 4: R(a1) = a1; R(a3) = a3; R(a7) = a1; R(a2) = a1; R(a6) = a3; R(a4) = a3; R(a5)
= a3
Step 5:
Group 1: a1, a2, a7
Group 2: a3, a4, a5, a6
So finally the agents communicating with each other are grouped together. In the presented
example this is the best solution. The algorithm may not always give the best solution, but for
the example scenarios tested it gave reasonable results with O(M/K) complexity, where M is
the number of unique communication links between different agents and K is the number of
sites.
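The worked example above can be reproduced with a compact sequential sketch of the algorithm in plain Java (our own method names; this is a single-machine stand-in, not the thesis's distributed MapReduce implementation). Like the original, a single Step-4 pass may leave long representative chains only partially collapsed.

```java
import java.util.*;

public class GreedyRedistribution {
    /**
     * links.get(a) holds the identifiers of the agents that agent a
     * communicated with during the last iteration. Returns R, mapping
     * every agent to its representative; agents with equal R-values
     * form one group.
     */
    static Map<Integer, Integer> cluster(Map<Integer, Set<Integer>> links) {
        Map<Integer, Integer> r = new HashMap<>();
        // Steps 2-3: map each agent to the lowest identifier among the
        // agents it shares a communication link with (and itself).
        for (Map.Entry<Integer, Set<Integer>> e : links.entrySet()) {
            int ai = e.getKey();
            r.merge(ai, ai, Math::min);
            for (int ak : e.getValue()) {
                int lo = Math.min(ai, ak), hi = Math.max(ai, ak);
                r.merge(lo, lo, Math::min);  // representative maps to itself
                r.merge(hi, lo, Math::min);  // higher id maps to lower id
            }
        }
        // Step 4: one re-mapping pass, R(ak) = R(R(ak)), over every entry.
        for (Integer ak : new ArrayList<>(r.keySet()))
            r.put(ak, r.get(r.get(ak)));
        return r;
    }
}
```

Feeding in the links of Figure 3.1 (a1–a7, a1–a2, a3–a6, a5–a6, a3–a4, a4–a5) yields the two groups {a1, a2, a7} and {a3, a4, a5, a6} computed by hand above.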
3.3.2.2 Implementing Greedy Agent-Redistribution Algorithm
Since we already have a Hadoop cloud setup, we use this cloud to run the above algorithm
and compute the clusters. So, we reframed the algorithm into a chained MapReduce model
and executed it on Hadoop. Execution is carried out in two chained MapReduce jobs; the
output of the first MapReduce job becomes the input to the Map phase of the second
MapReduce job. Finally, the clusters of agents are obtained as the output of the second
MapReduce job. Step 1 of the above algorithm corresponds to MAP-1 and Step 2 to
REDUCE-1. Further, Step 3 is executed as MAP-2 and Step 4 as REDUCE-2.
3.3.2.2.1 In MAP-1 phase:
Input - An agent and the list of agents it communicated with. This is represented as a single
line of numbers x1, x2, ..., xk, where x1 is the identifier of the agent under consideration,
and the following numbers are the identifiers of the agents who communicated with it. The
input consists of several such lines, one for each agent in the simulation. The input split and
load balancing are done by Hadoop itself.
Output - (Key, Value) pairs (xi, x1) if x1 <= xi.
3.3.2.2.2 In REDUCE-1 phase:
Input - The output from MAP-1.
Output - The different values corresponding to the same key are brought together in a list
by Hadoop. Let Valmin denote the minimum of these values. The list of values is reduced to
the single value Valmin. Therefore, the output of this phase is the reduced set of (Key, Value)
pairs. Further, if a pair with the same value for Key and Value occurs (e.g. (2, 2)), then
(2, Valmin) is written to a separate file (referred to as Representative Maps). This is used in
phase MAP-2.
3.3.2.2.3 In MAP-2 phase:
Input - The (Key, Value) pairs from the REDUCE-1 output and the file Representative Maps.
Output - For each Key, the corresponding Value is mapped to R(Value) using the
Representative Maps file, which contains (Value, R(Value)) pairs. Finally, the pair is reversed.
Therefore, the output of the phase is (R(Value), Key).
3.3.2.2.4 In REDUCE-2 phase:
Input - The (Key, Value) pairs from MAP-2.
Output - The values corresponding to the same key are grouped together in a list by Hadoop
itself. Therefore the final output of the phase is (Key, List of Values). For example, if (k1, v1),
(k1, v2), and (k1, v3) were present in the output of MAP-2, then the corresponding output
pair will be (k1, [v1, v2, v3]).
These clusters/groups contain agents who communicate with each other.
3.3.2.3 Placing the agent-groups into Files
The total number of files to be formed is determined using the concepts mentioned in
Section 3.3.1.1. In our experiments, we fixed it at 10 × K, where K is the number of computing
sites. The clusters obtained are re-arranged into 10 × K files such that, as far as possible, all
agents belonging to the same cluster occur together in the same file. We follow the procedure
stated in Table 3.5.
    file = 1
    for each cluster c:
        if (size(c) + allocation(file) < capacity(file)) then
            Place all agents in cluster c into this file.
        else
            Place into this file the maximum possible number
            of agents in cluster c.
            file = file + 1
            Reduce cluster c to its unallocated agents.
            Repeat the loop for this c.
        Update allocation(file) appropriately.

Table 3.5 ALGORITHM: Agent-Allocation
where,size(c)denote the size of clusterc, allocation(file) gives the current number of
agents placed into thisfile, andcapacity(file)gives the maximum number of agents which
this file can hold.
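The procedure of Table 3.5 can be sketched as below; this is a simplified version in which a cluster that does not fit is split greedily across consecutive files (agent identifiers and the fixed per-file capacity are illustrative, and the table's strict inequality is relaxed to "fits"):

```java
import java.util.*;

public class AgentAllocation {
    // Greedy allocation of clusters into files of fixed capacity, following
    // Table 3.5. Returns, per file, the list of agent identifiers placed in it.
    static List<List<Integer>> allocate(List<List<Integer>> clusters, int capacity) {
        List<List<Integer>> files = new ArrayList<>();
        files.add(new ArrayList<>());
        for (List<Integer> cluster : clusters) {
            List<Integer> remaining = new ArrayList<>(cluster);
            while (!remaining.isEmpty()) {
                List<Integer> file = files.get(files.size() - 1);
                int room = capacity - file.size();
                if (remaining.size() <= room) {
                    file.addAll(remaining);                  // whole cluster fits
                    remaining.clear();
                } else {
                    file.addAll(remaining.subList(0, room)); // fill file, open next
                    remaining = new ArrayList<>(remaining.subList(room, remaining.size()));
                    files.add(new ArrayList<>());
                }
            }
        }
        return files;
    }
}
```

With capacity 4 and clusters {1,2,3}, {4,5}, {6}, the first file receives agents 1-4 and the second receives 5 and 6; agents 1-3 of the first cluster stay together, as intended.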
3.3.3 Queries in Agent-State Updates
public class DistributedCacheUtils {
    public void createCache(String cacheFileName,
                            List<String> records, int iterationNumber);
    List<String> readFromCache(String cacheFileName, int iterationNumber);
    List<String> deleteFromCache(String cacheFileName, int iterationNumber);
}

Table 3.6 API provided to handle cached results
An agent, to decide its next state, needs to know the state of other agents. For
example, in the simulation of a soccer game, an agent may want to know the locations
of all the players at a given instant; this information for that particular instant would be the same for
all agents. Hence, the percept once computed can be broadcast to all; in other words, once
the query for the locations of all agents is computed, its result can be used by other agents as well.
Therefore, caching results reduces execution time by avoiding redundant execution of the same
queries.
Queries need to access a large number of agent files, many of which may reside on dif-
ferent datanodes. To reduce access time, we cache the results of recent queries in HDFS.
The cached result files are physically replicated on each datanode with the help of the Distribut-
edCache class of Hadoop. This provides rapid access to the cached results for the datanodes.
The framework provides an implementation (refer to Table 3.6) which allows the simulation de-
veloper to perform various operations on cache files. Method createCache() creates a cache
file with the specified file name, writes the records provided and copies the file into the Dis-
tributedCache of the HDFS. Further, the iteration number of the simulation in which the file
was created is stored along with it. Method readFromCache() reads the entire cache file asso-
ciated with the iteration number passed as a parameter. Lastly, deleteFromCache() allows the
developer to delete unused cache files and clear HDFS space from time to time.
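As an illustration of how the Table 3.6 API shape might behave, here is a minimal in-memory stand-in keyed by (cacheFileName, iterationNumber); the real implementation writes to HDFS and replicates files via Hadoop's DistributedCache, which this sketch does not attempt, and the class name is invented:

```java
import java.util.*;

// In-memory stand-in for the DistributedCacheUtils API of Table 3.6.
public class LocalCacheUtils {
    private final Map<String, List<String>> store = new HashMap<>();

    private static String key(String name, int iteration) {
        return name + "#" + iteration;   // one entry per (file, iteration) pair
    }

    public void createCache(String cacheFileName, List<String> records, int iterationNumber) {
        store.put(key(cacheFileName, iterationNumber), new ArrayList<>(records));
    }

    public List<String> readFromCache(String cacheFileName, int iterationNumber) {
        return store.getOrDefault(key(cacheFileName, iterationNumber), Collections.emptyList());
    }

    public List<String> deleteFromCache(String cacheFileName, int iterationNumber) {
        List<String> removed = store.remove(key(cacheFileName, iterationNumber));
        return removed == null ? Collections.emptyList() : removed;
    }
}
```

A simulation would, for example, create a "locations" cache in iteration 5, let every agent read it during that iteration, and delete it once the iteration's updates are complete.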
3.4 Agent Execution using Lucene/Solr Indexing
A major improvement in run-time (as much as 100 times; see the results section) is
achieved by replacing HDFS-based agent data storage with Lucene-based indexed storage.
In this section, we briefly introduce the technologies Lucene [31] and Solr [32] and the way
agent execution is implemented using them.
3.4.1 Introduction to Lucene and Solr
Lucene [31] is an open source, highly scalable text search-engine library developed by
the Apache Software Foundation. Lucene was developed in Java and has since been ported
to many other programming languages, including Perl, Python, C++, and .NET. Lucene's
APIs focus mainly on text indexing and searching.
Supporting full-text search using Lucene requires two steps: (1) creating a Lucene index
on the documents and/or database objects and (2) parsing the user query and looking up
the prebuilt index to answer the query. Lucene uses powerful, accurate, and efficient search
algorithms to look into the index created by it and retrieve the results.
The major concepts in Lucene include documents, fields and queries. The unit of search
and indexing is a document. A document is basically an object that needs to be stored in the
index. A document consists of one or more fields. A field is simply a name-value pair. An
index, therefore, consists of one or more documents, each document containing some pre-
defined fields. Indexing involves adding documents to an IndexWriter, and searching involves
retrieving documents from an index via an IndexSearcher. Lucene provides its own query
language for searching, allowing users to specify which fields to search on, which fields to
give more weight to (boosting), the ability to perform boolean queries (AND, OR, NOT), and
other functionality.
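The document/field/index relationship can be illustrated with a toy inverted index in plain Java; this sketch only conveys the concept and is not Lucene's actual API or data structures (the class and method names are invented):

```java
import java.util.*;

// Toy inverted index: each document is a set of (field, value) pairs, and the
// index maps a field:value term to the set of document identifiers containing it.
public class ToyIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // "Indexing involves adding documents": record each field:value term.
    public void addDocument(int docId, Map<String, String> fields) {
        for (Map.Entry<String, String> f : fields.entrySet())
            postings.computeIfAbsent(f.getKey() + ":" + f.getValue(),
                                     k -> new TreeSet<>()).add(docId);
    }

    // "Searching involves retrieving documents": look up a single field term.
    public Set<Integer> search(String field, String value) {
        return postings.getOrDefault(field + ":" + value, Collections.emptySet());
    }
}
```

Lucene's real index adds tokenization, scoring, boosting and on-disk segment files on top of this basic postings-list idea.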
Lucene uses a file-based locking mechanism to prevent concurrent index modifications.
Moreover, Lucene allows simultaneous searching and indexing which further enhances its
performance.
Lucene provides libraries for building and searching an index on a local machine. If instead
we want to query a Lucene index residing on some remote machine, we need a search server
which can handle all querying requests and answer them efficiently. Solr [32] does this job.
Solr runs as a standalone full-text search server within a servlet container such as Tomcat.
Solr uses the Lucene Java search library at its core for full-text indexing and search. Like
Lucene, it allows the users to specify the schema of the documents that need to be stored in the
underlying Lucene index. Further, it provides highly scalable distributed search with a shared
index across multiple hosts, which increases the search speed and allows for a larger number
of simultaneous queries.
3.4.2 Implementation Strategy
We use Lucene to store the agent data. Each agent is modelled as a document in the Lucene
index, with the different agent attributes stored as fields in its document. Agents are indexed
based on their agent identifiers. The agent identifier is configured in the index as a unique and
required field of the agent documents. A Solr-based search server is set up which listens for
incoming requests for adding agent documents and querying agent data, and executes them on
the underlying Lucene index.
The implementation of the classes CreateEnviron and AgentUserCode remains the same
as earlier. Even the APIs provided by the classes AgentAPI and Agent have the same decla-
rations for their methods; only the way they are implemented changes.
Each iteration in the simulation corresponds to one MapReduce job, invoked with each
MapReduce task executing a set of agents; the number of agents executed in a sin-
gle task is decided using the concepts described in Section 3.3.1.1. Class Agent contains two
classes, Map and Reduce, corresponding to the MapReduce task. In the map method of class
Map, agent data is fetched from the Lucene index through the APIs provided by Solr. A single
query is formed by concatenating the identifiers with an OR clause for the set of agents being
executed by that particular MapReduce task. The execution time for a single query fetching
multiple agent documents is much less than that of multiple queries fetching one agent document
each. Agents are then updated using the Update method provided by the user in the Agen-
tUserCode class. The updated values for the agent documents are rewritten to the Lucene index
once the Update method has been called for all agents. Again, a bulk-update API is provided
by Solr which updates several entries in the Lucene index in one request, rather than updat-
ing single entries in multiple requests. The agent identifier and the corresponding serialized
agent properties are emitted as the (key, value) pairs on completion of the Map task. In the reduce
method of class Reduce, the received (key, value) pairs are simply passed to the OutputCollec-
tor. In this manner, we have a log of the states of all agents after completion of every iteration.
This log can be used later for offline visualization and analysis.
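The single OR-query built per task can be sketched as follows; the field name agent_id and the exact query syntax shown are assumptions for illustration, not the framework's actual schema:

```java
import java.util.*;

// Sketch of the bulk-fetch idea: one query ORing together the identifiers of
// all agents handled by a MapReduce task, instead of one query per agent.
public class BulkQuery {
    static String orQuery(String idField, List<Integer> agentIds) {
        StringJoiner sj = new StringJoiner(" OR ");
        for (int id : agentIds)
            sj.add(idField + ":" + id);
        return sj.toString();
    }
}
```

A task responsible for agents 1, 2 and 3 would thus issue the single query `agent_id:1 OR agent_id:2 OR agent_id:3` rather than three round-trips to the search server.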
Method AgentAPI.createAgent creates agents by adding agent documents to the Lucene
index. Method AgentAPI.getAgents issues a select-all-like query on the Lucene index. Agen-
tAPI.sendMessage creates an entry in the Lucene index with the message identifier being a
function of the iteration number and the agent identifiers of the two agents involved. Messages
indexed by message identifiers built in this manner provide fast retrieval of the messages
by the AgentAPI.readMessage method.
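One possible message-identifier scheme matching this description is sketched below; the exact encoding used by the framework is not specified in the text, so the format here is purely illustrative:

```java
// Deterministic message identifier built from the iteration number and the two
// agent identifiers, so an entry written by sendMessage can be looked up
// directly by readMessage without scanning the index.
public class MessageId {
    static String of(int iteration, int fromAgent, int toAgent) {
        return iteration + "-" + fromAgent + "-" + toAgent;
    }
}
```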
Replacing HDFS-based agent data storage and messaging with fast index-based oper-
ations for updates and messaging provides a major reduction in the execution time of the
agent-based simulation. The speed-up achieved is up to 100 times for some simulations.
3.5 Experimental Results
In our experiments, we took multi-agent simulation problems of a diverse nature, so as to
test the overhead of executing the two optimization algorithms - clustering of agents and
caching of intermediate results - running on top of the Hadoop framework. Further, we stud-
ied the speed-up provided by using a Lucene index along with Hadoop. We also compared
the execution time for several simulations with one of the existing simulation frameworks,
DMASF [14], for varying numbers of agents. We set up a Hadoop cloud with 10 Unix ma-
chines, each having a 1.67 GHz processor with 2 GB RAM. Our experiments involved 200,000
agents distributed on these 10 machines and interacting with each other for 100 iterations.
Some of the important results obtained are presented here.
3.5.1 Circle Simulation
In this problem, agents are scattered randomly on a 2-D plane. Their goal is to position
themselves in a circle, and then move around it. The strategy involved computation of the arith-
metic mean of the locations of all the agents. A frequent query is executed to compute this mean
from the locations of all agents. Accessing the agent files on different systems every time
an agent requested it (once in each agent-update function) was avoided by caching the locations of
all the agents once and then using this cached value for future reference. This cache was up-
dated in every iteration. The average execution time for one iteration reduced from 124 seconds
to 58 seconds (see Figure 3.2).
Figure 3.2 Comparing iteration-time for caching On and Off for Circle Simulation on Hadoop
Figure 3.3 Comparing number of inter-node messages for clustering On and Off for Standing Ovation Simulation
Figure 3.4 Comparing iteration-time for clustering On and Off for Standing Ovation Simulation
Figure 3.5 Comparing iteration-time for clustering On and Off for Sand-pile simulation
Figure 3.6 Iteration times obtained for KP-Simulation
Figure 3.7 Iteration times obtained when removing and adding machines dynamically
Figure 3.8 Scalability test with 200,000 agents
Figure 3.9 Scalability Test with varying number of datanodes
3.5.2 Standing Ovation Problem (SOP)
The basic SOP can be stated as: a brilliant economics lecture ends and the audience begins
to applaud. The applause builds and, tentatively, a few audience members may or may not
decide to stand. Does a standing ovation ensue, or does the enthusiasm fizzle? The authors
present a simulation model for the above problem in [28]. We modelled the auditorium as
a 500×400 grid, and agents were randomly assigned a seat. Agents in this simulation com-
municated with an almost constant set of neighbouring agents. Hence the clustering algorithm
proposed earlier showed a marked reduction in the number of inter-site messages and ma-
jor improvements in execution time. The time for each iteration was almost halved (see Figures
3.3 and 3.4). The slight increase in iteration-time during the third iteration is due to the overhead
of the clustering algorithm. It took approximately 35 seconds for the clustering algorithm to run
for 200,000 agents.
3.5.3 Sand-pile Simulation
In this problem, grain particles are dropped from a height into a beaker through a funnel,
and they finally settle down in the beaker after colliding with each other and with the walls
of the beaker and funnel. A detailed description and solution of the problem is given in [26];
for a visualization of the problem refer to the link [27].
The queries generated in this problem were generic enough to give good results on caching
intermediate results; these results were similar to the Circle Simulation and are hence omitted. However, the
set of agents with which a particular agent interacted changed too frequently. Hence, the
number of inter-node messages varied from iteration to iteration. The results show a 100-
iteration run with regular agent-clustering done after every 30th iteration (see Figure 3.5). It
is observed that agent clustering reduces the execution time for the following iterations, but due to
frequent changes in the communication links of agents, the effect of clustering reduces rapidly.
The sharp rises at the 33rd, 63rd and 93rd iterations are due to the overhead of the clustering
algorithm.
3.5.4 KP Simulation
This simulation is done to test the messaging efficiency of the framework. K denotes the
number of agents in a particular run of the simulation. P is the number of messages sent
by each agent. Results obtained for different values of K and P are shown in Figure 3.6.
Results indicate that inter-node communication incurs a major cost in execution time. As the
number of agents increases, involving a larger number of data-nodes, the cost
of messaging increases. This is indicated by the two pairs of values (K=1000, P=5000),
taking 11 seconds, and (K=5000, P=1000), taking 36 seconds, for the same total number
of messages flowing in the system (5000×1000).
3.5.5 Dynamic Nodes Addition
We tested the ability of Hadoop to redistribute agents when new datanodes are dynami-
cally added to the Hadoop cluster and when some hardware failures occur. We ran the sand-pile
simulation with a Hadoop cloud consisting of ten datanodes. At the 30th iteration, we failed two
datanodes. Finally, we added one datanode at the 60th iteration. Results show an inverse relation
between execution time and the number of active data-nodes, shown in Figure 3.7. The aver-
age execution time for one iteration increased from 99 seconds to 121 seconds on reduction
of nodes from 10 to 8, and then decreased again to 106 seconds when 9 datanodes became
active.
3.5.6 Scalability Tests
For testing the scalability of the framework, we conducted two experiments. In the first ex-
periment, we ran the circle simulation with 200,000 agents, varied the number of active
datanodes from 1 to 10, and noted the average time taken for one iteration in each case. Re-
sults show the run-time being inversely proportional to the number of machines (Figure 3.8).
The time taken for one iteration reduced from 489 seconds (on one machine) to 58 seconds when
executed on 10 systems. In the second experiment, we ran the circle simulation for 60 sec-
onds and noted the number of agents updated with one machine in the Hadoop cloud. Then,
we varied the number of machines from 1 to 10 and in each case noted the number of agents
updated in 60 seconds. An almost linear increase in the number of updated agents was
observed (Figure 3.9). For one system, 26,081 agents were updated in 60 seconds,
which increased to 200,012 agents when 10 systems were put to use in the same amount of time.

Figure 3.10 Comparing iteration-times between Hadoop with HDFS, Hadoop with Lucene, and DMASF for Circle Simulation
3.5.7 Experiments and comparison with Lucene
We compared the execution time of Hadoop using a Lucene index as the backend for agent data
and messaging with Hadoop using HDFS as the backend. The execution time obtained using
Lucene indexing is comparable to and even faster than DMASF [14]. We used the three simu-
lations - circle, standing ovation and sand-pile - mentioned earlier for the experiments. Results
obtained are presented in Figures 3.10, 3.11, 3.12. The results show that as we increase the
number of agents, Hadoop with a Lucene index outperforms the current state-of-the-art simulation
framework, DMASF.
Figure 3.11 Comparing iteration-times between Hadoop with HDFS, Hadoop with Lucene, and DMASF for Standing Ovation Simulation
Figure 3.12 Comparing iteration-times between Hadoop with HDFS, Hadoop with Lucene, and DMASF for Sand-pile Simulation
Hadoop provides dynamic load balancing, failure recovery and dynamic addition of new
nodes, which definitely incur processing overheads in terms of heart-beat messages and data
replication. It is a generic framework for solving diverse problems and is not specifically in-
tended for multi-agent systems. Hence, the run-times for different simulations obtained with
Hadoop are definitely not the best when compared with other simulation frameworks.
Chapter 4
Efficient Multi-Agent Simulation using Four State Agent
Execution Model on GPUs
In this chapter, we present an agent execution framework on GPUs to achieve fast data
currency. The main contributions of this chapter are: (i) developing an agent execution model
with four agent states - update, perceive, decide, rest - to ensure that every agent gets the
latest perceptions of the agent environment (Section 4.2); and (ii) presenting a multi-agent
simulation framework using FSAM (Four-State Agent-execution Model), utilizing GPUs as a
platform to efficiently support multi-agent applications with millions of agents (Section 4.3).
We also present some optimizations which the framework utilizes to speed up the simula-
tion on GPUs: (i) optimal distribution of agents on a cluster of GPUs, (ii) a fast messaging
model, and (iii) improvements in managing warps ([11]). These optimizations are specific
to the CUDA architecture for GPUs. On running 10 million agents, the speedup achieved on
GPUs for some simulations was as high as 10,000 times when compared against 2 CPUs with a
similar configuration (see Section 4.4.1).
4.1 Outline of nVidia Compute Unified Device Architecture
The CUDA architecture consists of a scalable array of multi-threaded Streaming Multiproces-
sors (SMs). Each multiprocessor in turn consists of multiple Scalar Processor (SP) cores. A
multiprocessor creates, manages, and executes concurrent and light-weight GPU threads in
hardware with extremely low scheduling overhead. It implements a fast barrier synchroniza-
tion with a single instruction. These features enable fine-grained parallelism by assigning
one thread to each data element (in the present case, assigning a thread to each agent) of the
problem under consideration.
The CUDA architecture supports creation of a large number of GPU threads, theoretically
of the order of 2^40. These threads are organized into two- or three-dimensional thread blocks.
Threads within a block synchronize and share data through a shared memory. The threads of
a thread block execute concurrently on one multiprocessor. As thread blocks terminate, new
blocks are launched on the vacated multiprocessors. Thread blocks are further organized into
one-dimensional or two-dimensional grids of thread blocks.
CUDA uses a Single-Instruction Multiple-Thread (SIMT) architecture, i.e., the same set of in-
structions is carried out concurrently in different threads, with each thread processing dif-
ferent data elements. Each thread has its own instruction address and register state. The set
of instructions which is executed N times in parallel by N different CUDA threads is referred
to as the kernel function. Each of the threads that execute a kernel is given a thread identifier.
Similarly, each block in the grid is given a unique identifier. CUDA executes thread blocks
independently, in any order, across any number of cores, enabling programmers to write code
that scales with the number of cores.
CUDA threads may access data from multiple memory spaces during their execution. Each
thread has a private local memory. Each thread block has a shared memory (16 kB in size)
visible to all threads of the block and with the same lifetime as the block. Finally, all threads
have access to the same global memory. There are also two additional read-only memory
spaces accessible by all threads: the constant and texture memory spaces. The global, con-
stant, and texture memory spaces are optimized for different memory usages. The above
memory spaces altogether are referred to as the device memory spaces (as they reside on the
GPU device). Kernels operate out of device memory. They cannot directly
Figure 4.1 Agent state change diagram
operate on the memory space of the host CPU on which the GPU devices are mounted. However,
CUDA also enables accelerated access to page-locked host memory by the GPU device.
The number of blocks a multiprocessor can process at a time depends on the number of regis-
ters required per thread and the shared memory required per block for a given kernel. If either of
them is not enough to process at least one block, the kernel will fail to launch. The number
of registers required per thread (for a given kernel code), the shared memory required per block
and their effect on performance can be determined using the CUDA Occupancy Calculator [10]
provided by nVidia.
4.2 Agent Execution Model
Each agent is modelled as a separate GPU thread. We define four agent states - update,
perceive, decide and rest (refer to the state transition diagram in Figure 4.1). The application de-
veloper needs to define what each agent does in each of these states as separate functions;
we refer to these sub-routines as agent-state codes. As soon as the agent threads are created,
agents enter the update state and execute their update code, on completion of
which they enter the perceive state. In the perceive state, every agent monitors the events
occurring in the agent environment and obtains information about the states of other agents.
During the perceive state, an agent executes its perceive code multiple times in order to get the
latest perception of the agent environment and determine the best possible decisions. We ensure
that an agent is scheduled at least twice for perceive code execution (see Sections 4.2.1 and
4.2.2 for details).
The decisions framed during the perceive state get executed once the agent enters the decide
state. After executing the decide code, an agent immediately enters the rest state. An
agent remains in the rest state till every agent in the simulation has executed its decide code.
As soon as every agent reaches the rest state (on completion of the decide state), agents enter the
update state. This marks the beginning of the next cycle of the simulation. Thus, the simulation
is said to complete one cycle when every agent completes one set of the state transitions,
starting from the update state to the rest state.
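The cycle described above can be sketched as a simple state machine; in the framework each agent is a GPU thread and the transitions are synchronized across all agents, which this single-agent sketch omits:

```java
import java.util.*;

// Minimal sketch of the four-state cycle: update -> perceive -> decide -> rest.
public class FourStateAgent {
    enum State { UPDATE, PERCEIVE, DECIDE, REST }

    State state = State.UPDATE;
    final List<State> trace = new ArrayList<>();

    // Advance to the next state; REST wraps back to UPDATE, which in the
    // framework marks the beginning of the next cycle.
    void step() {
        trace.add(state);
        switch (state) {
            case UPDATE:   state = State.PERCEIVE; break;
            case PERCEIVE: state = State.DECIDE;   break; // after decision-lag elapses
            case DECIDE:   state = State.REST;     break;
            case REST:     state = State.UPDATE;   break; // all agents rested: next cycle
        }
    }
}
```

Four steps thus visit each state exactly once, completing one cycle and returning the agent to the update state.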
We have not yet defined the transition from the perceive state to the decide state. For this we
introduce two parameters - the decision-lag ǫ, and the idle-time. An agent in the simulation stays in
the perceive state till every agent has spent at least ǫ time (the decision-lag) in its perceive state. In
our experiments, because of the parallel threading provided by GPUs, the value of ǫ obtained is
very small. Idle-time is the duration for which an agent thread waits to get scheduled for
execution on the processor.
4.2.1 Evaluating ‘decision-lag’
The value of the decision-lag, ǫ, needs to be chosen carefully. Consider two groups of agents,
G1 and G2, with agents in the same group being scheduled concurrently. Let the agent threads
corresponding to group G1 be scheduled earlier than G2. G2 agent-threads will then complete their
update code execution later than G1. Now, we need to ensure that G1 agents execute their
perceive code at least once after the completion of the G2 agents' update code execution, in order
to get the latest states of the G2 agents. Further, G2 agents in their perceive state may send a
message to G1 agents, so G1 agents should execute their perceive code at least once after G2
agents have executed their perceive code. If we ensure that G2 agent-threads get scheduled
at least twice for perceive code execution, then we can be sure that G1 agent-threads get
scheduled at least once for perceive code execution after completion of the G2 agents' perceive.
Thus, we need to make sure that G2 agent-threads stay in the perceive state for a time ǫ greater
than the sum of the idle-time and the time taken for perceive code execution.
Further, ǫ will vary for different simulations depending on the time taken by the agent-
state codes. The idle-time increases with the number of agents, hence affecting the value of ǫ.
A good estimate of ǫ is obtained by executing one cycle of the simulation keeping ǫ equal to 0;
i.e., as soon as every agent finishes the update state, agents execute their perceive code once
and enter the decide state. The total time taken for one cycle is noted. In this time, every
agent has been scheduled at least thrice for execution, once corresponding to each of the update,
perceive, and decide states. Hence, the total time is a good choice for the decision-lag ǫ. So, we
assign this value to ǫ and restart the simulation from the beginning. During the course of the
simulation it may happen that agents change their execution path; i.e., an agent may face different
scenarios and hence execute a different strategy which may take longer to execute.
Accordingly, the value of ǫ needs
to be adjusted. Therefore, the decision-lag for the (i+1)th cycle, ǫ_{i+1}, is calculated as

ǫ_{i+1} = maximum(ǫ', ǫ_avg), where
ǫ_avg = (Σ_{k=1}^{i} ǫ_k) / i,
ǫ' = T_i(u+p+d),
T_i(u+p+d) = time taken for update, perceive and decide code execution in the ith cycle.
We found experimentally that the value of ǫ converges after a few iterations. In our ex-
periments with 10^6 agents, ǫ converged to 31.439 ms for the Circle simulation and to 14.125 ms
for the Hand-shake simulation after the second cycle itself (refer to Section 4.4.1). For the Sand-pile sim-
ulation, ǫ started off with a high value of 105.253 ms due to numerous calculations in the first
cycle, but gradually converged to 80.380 ms after the tenth cycle.
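The update rule for ǫ can be sketched as follows; times are in milliseconds, and the history array stands for the ǫ values used in cycles 1..i:

```java
// Sketch of the decision-lag update: epsilon for the next cycle is the maximum
// of this cycle's measured state-code time T_i(u+p+d) and the running average
// of the epsilons used so far.
public class DecisionLag {
    static double next(double[] epsilonHistory, double measuredTime) {
        double sum = 0;
        for (double e : epsilonHistory) sum += e;
        double avg = epsilonHistory.length == 0 ? 0 : sum / epsilonHistory.length;
        return Math.max(measuredTime, avg);
    }
}
```

Taking the maximum makes ǫ react quickly to cycles that suddenly take longer, while the running average keeps it from collapsing after one unusually fast cycle.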
4.2.2 Utility of ‘perceive’ state
With our agent-execution model, the simulation developer needs to know which agent-activities
are dependent on other agents and which are independent. Independent activities
form the code corresponding to the update state. Dependent activities get broken into two
parts: first, analyzing the agents (or the environment) on which the activity is dependent,
the perceive code; second, taking decisions on the basis of the perceptions and messages re-
ceived, the decide code. In the AB-example, agent A will update the value of A.x, and B will
update the value of B.y in their update code. In the perceive state, A will read B.y, and B will
read A.x. Finally, in the decide state, A will update the value of A.y and B will update B.x.
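The AB-example can be written out as plain sequential code to show the split; the attribute names follow the text, while the update rules themselves are invented for illustration:

```java
// The AB-example: the four-state split guarantees each agent reads the other's
// freshly updated attribute before deciding.
public class ABExample {
    static class Agent { int x, y; }

    static void runCycle(Agent a, Agent b) {
        // update state: independent activities
        a.x = a.x + 1;
        b.y = b.y + 1;
        // perceive state: each agent reads the other's latest value
        int aSeesBy = b.y;
        int bSeesAx = a.x;
        // decide state: decisions based on the perceptions
        a.y = aSeesBy;
        b.x = bSeesAx;
    }
}
```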
The presence of the perceive state is important. Looking at the G1G2-example, let agent A
be a part of G1 and B a part of G2. B is scheduled after A for execution of the update
code. Now, A may execute its perceive state in parallel with B's update state execution. In order
to ensure that A perceives the latest value of B.y, A should execute its perceive code at least
once after B's update code (as enabled by the decision-lag) for its decision making, whereas the
absence of the perceive state might have led A to read an older value of B.y. Therefore, in our
model A is always made to execute the perceive state.
4.3 FSAM-framework Architecture
We implemented the FSAM-framework for testing our execution model. The framework con-
sists of GPUs mounted on the same host CPU. There are three core components: a master-
controller, agent-controllers and user-interaction (Figure 4.2). Each agent is modelled as a
separate CUDA thread on the GPU.
Master-controller: This component is responsible for managing the overall execution
of the simulation. It initiates the simulation and distributes the agents on the available agent-
controllers. It keeps track of whether the agent-controllers are working or have failed, and redis-
tributes the agents in case of system failures.
Figure 4.2 FSAM-framework Architecture
Agent-controller: This component is responsible for running the agents allotted to it.
Every agent allotted to an agent-controller runs in a separate CUDA thread.
User-interaction: This component gives a visual representation of the running simula-
tion. It runs on the same physical system as the master-controller. The visualizer accesses
the agent-environment data on the master-controller and renders the visualization using it.
The simulation begins by invoking a query to know the number of GPU devices (agent-
controllers) that are hosted on the master-controller (CPU). Then, the master-controller does the
initial distribution of the agents on the agent-controllers, and the simulation starts with all
agents entering their update state.
4.3.1 Distribution of agents
Given N agents and G GPUs, the agent distribution algorithm is stated in Table 4.1. The
compute_concurrency() function determines the number of concurrent threads that can run
on each agent-controller using the CUDA Occupancy Calculator [10]. The agent-controllers are
sorted in descending order of their number of concurrent threads. Next, the agent-controller
that can run the maximum number of concurrent threads is allotted 'n' agents, where
1. for i = 1 to G:
       C[i].concurrency = compute_concurrency();
       C[i].id = i;
2. Sort C in descending order according to C[i].concurrency value.
3. s = 0; i = 0; Num[1...G] = 0;
4. while (s <= N && i < G):
       Num[C[i].id] = C[i].concurrency;
       s += C[i].concurrency; i += 1;
5. if (s < N):
       for i = 1 to G:
           Num[C[i].id] += (N-s) * C[i].concurrency / s;
6. Num[i] is the number of agents to be allotted to the ith GPU.

Table 4.1 ALGORITHM: Agent-distribution on GPUs
'n' is the number of concurrent threads that can run on this system. Then, the agent-controller
having the second maximum number gets a similar allotment. This process continues till
all the agents are allotted. The total number of agents can be more than the total number of
concurrent threads. In such a case, we first allot agents to each of the agent-controllers as
mentioned above, and then distribute the remaining unallotted agents in proportion to the
number of concurrent threads running on those systems.
In our experiments, we found that with this distribution no participating GPU is over-
loaded and none of them is idle for a large number of agents.
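The distribution of Table 4.1 can be sketched as below; the concurrency values are illustrative, and the proportional remainder is truncated by integer division, as in the table:

```java
import java.util.*;

public class AgentDistribution {
    // Sketch of Table 4.1: GPUs sorted by concurrency each receive one full wave
    // of concurrent threads, best GPU first; remaining agents are then spread in
    // proportion to concurrency. num[i] = agents allotted to the i-th GPU.
    static int[] distribute(int n, int[] concurrency) {
        int g = concurrency.length;
        Integer[] order = new Integer[g];
        for (int i = 0; i < g; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> concurrency[b] - concurrency[a]);

        int[] num = new int[g];
        int allotted = 0, s = 0;
        for (int idx : order) {                    // one wave per GPU
            if (allotted >= n) break;
            int take = Math.min(concurrency[idx], n - allotted);
            num[idx] = take;
            allotted += take;
            s += concurrency[idx];
        }
        if (allotted < n && s > 0) {               // leftovers: proportional share
            int leftover = n - allotted;
            for (int i = 0; i < g; i++)
                num[i] += leftover * concurrency[i] / s;
        }
        return num;
    }
}
```

For example, with two GPUs of concurrency 20 and 10, distributing 45 agents first fills one wave on each (20 + 10), then splits the remaining 15 agents 10/5 by the same 2:1 ratio.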
4.3.2 Event-driven approach of agent-state updates
The application developer needs to provide the implementation code for the agent-environment
initialization, and the update, perceive and decide codes. The agent state codes are executed
as kernel ([11]) functions on the agent-controllers. CUDA provides event generation
APIs which allow us to raise events when a particular kernel finishes its execution. Using
these APIs, events are generated on complete execution of kernel functions, which denote the
change in state of the agents, and the master-controller gets notified about these events. The
perceive state for an agent gets initiated as soon as its update state is complete. Switching from
the perceive state to the decide state for an agent requires all agents to complete their perceive state.
Therefore, the master-controller waits for notifications from all the agent-controllers about the
completion of the perceive state, after which it initiates the execution of the decide state. An agent
enters the rest state as soon as its decide state is completed. The transition from the rest state to the
update state occurs in a way similar to the perceive-decide state transition. State changes occur
according to the protocols described in Section 4.2.
An agent on an agent-controller is considered to have failed if the agent-controller does
not receive a notification for its state transition within 10*εi seconds, where εi is the decision-lag
computed for the ith cycle. In such a case, the kernel executing that particular agent is re-launched
for execution.
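A minimal sketch of this failure rule (the function and parameter names are ours; the notification plumbing itself is framework-internal):

```c
#include <stdbool.h>

/* Decide whether an agent's kernel must be re-launched: true when no
   state-transition notification has arrived within 10 * eps_i seconds,
   where eps_i is the decision-lag computed for the current cycle i. */
bool needs_relaunch(double now, double last_notify, double eps_i) {
    return (now - last_notify) > 10.0 * eps_i;
}
```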
4.3.3 Messaging
In order to ensure the security of agent-data, no agent can handle other agents' data directly. If
an agent wants to send a message to another agent, or wants to see or change the data of other
agents, then it needs to send a request to the master-controller, which decides whether to accept
or deny the request. In order to speed up the data transfers, agents running on GPU-based
agent-controllers share memory with the master-controller using the mapped memory APIs
provided by CUDA. The application developer needs to implement a function which
takes two agent identifiers as parameters, agent_from_id and agent_to_id, and returns true if
agent_from_id is allowed to access agent_to_id's data; otherwise it returns false. With this
setting, whenever a request is initiated by an agent, the master-controller simply invokes this
function and accordingly returns the pointer to the data if the return value was true, else returns
null.
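A sketch of this request path on the master-controller side; the predicate type, the sample policy and the flat data array are our assumptions (the Allow functions in Appendix B play the predicate's role):

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_AGENTS 4
#define AGENT_DATA_BYTES 10

/* Flat per-agent data store (10 bytes per agent, as in Section 4.4.2.3). */
static char agent_data[NUM_AGENTS][AGENT_DATA_BYTES];

/* Developer-supplied access predicate. */
typedef bool (*allow_fn)(int agent_from_id, int agent_to_id);

/* Master-controller request handler: consult the predicate and hand
   back a pointer to the target agent's data, or NULL on denial. */
void *request_data(int from, int to, allow_fn allow) {
    return allow(from, to) ? (void *)agent_data[to] : NULL;
}

/* Example policy (ours): lower ids may inspect higher ids. */
bool sample_allow(int from, int to) { return from < to; }
```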
The master-controller becomes a bottleneck for such requests when the number of agents
becomes large. To lessen the burden, we have a local-message controller on each GPU. Each
GPU maintains a local copy of the data of the agents running on it, in addition to the CPU-shared
memory. If the two agents involved in messaging are on the same agent-controller, then the
local message controller handles the request using the local agent-data; otherwise the request
goes to the master-controller, which then handles the request using mapped memory.
With this messaging model, the framework is able to manage up to 10^14 messages in approximately
100 milliseconds.
4.3.4 Warp management
Warps are groups of 32 threads which are initiated, scheduled and terminated together.
Performance is best when all threads in a warp follow the same execution path. Threads may
diverge due to conditional statements in the code. For example, let there be two threads, ta
and tb; ta satisfies the condition cond whereas tb satisfies the else part of cond. In CUDA,
thread tb gets blocked while ta is executing the if part; likewise, thread ta stays idle when tb
is executing the else part. Hence, there should be as little divergence as possible in the execution
paths of threads in the same warp. In order to reduce the divergence in the execution paths, we
allot threads in the same block to the same type of agents, as their behavior would be similar to
each other and they would be less likely to diverge. Experiments showed as much as an 8-fold
improvement in execution time (Section 4.4.2.2).
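The allotment step reduces to sorting agents by type before mapping them, in order, onto blocks of threads; a host-side sketch (record and function names are ours):

```c
#include <stdlib.h>

/* Agent record with a type tag (e.g. sand-particle vs. beaker). */
typedef struct { int id; int type; } AgentRec;

static int by_type(const void *a, const void *b) {
    return ((const AgentRec *)a)->type - ((const AgentRec *)b)->type;
}

/* Sort agents by type; assigning the sorted array, in order, to
   512-thread blocks then gives each block (apart from at most one
   boundary block per type) agents of a single type, so warps are far
   less likely to diverge on type-dependent conditionals. */
void group_agents_by_type(AgentRec *agents, int n) {
    qsort(agents, n, sizeof(AgentRec), by_type);
}
```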
4.4 Experimental Results
We conducted several experiments for testing the utility of our agent-execution model and
the performance of the FSAM-framework. For the experiments we used an nVidia Tesla T10 GPU with
933 GFLOPS of processing performance, a 1.30 GHz clock-rate and 4 GB of GDDR3 memory
at 102 GB/s bandwidth. It has 30 multi-processors with 240 cores, a constant memory of 64
KB, a shared memory of 16 KB per block and 16K registers per block. For all the experiments,
we kept the block size at 512 threads. The grid size was computed as ⌈N/512⌉, where N is the
number of agents in the simulation. We could do only limited comparisons against other GPU-based
solutions because the code for the systems presented in [22], [25], [23] was not available
publicly. Code for DMASF is available [15].
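The grid sizing above is the usual ceiling division; as a minimal C sketch:

```c
/* Grid size for a fixed block size of 512 threads: ceil(N / 512). */
int grid_size(int n_agents) {
    return (n_agents + 511) / 512;
}
```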
Figure 4.3 Decision-lag obtained for different scenarios on GPU-based FSAM (decision-lag in seconds, log scale, vs. K, where the number of agents = 10^K; curves: Circle, Hand-shake and Sand-pile simulations)
4.4.1 Experiments on performance
We analyzed the performance and expressiveness of the proposed agent execution framework
following the 4-agent-state model. The time taken by each cycle (we call it cycle-time),
the idle-time and the decision-lag have been measured for different scenarios. These times can be
estimated as a function of the number of messages being sent to and received from different
agents and the number of computations in the agent code. We took three sample scenarios for our
experiments and compared the time taken by the FSAM-framework with CPU-based DMASF. An
Intel i3 processor with 2.93 GHz and 4 GB RAM was used for the CPU. Two CPUs were used for
DMASF. The cycle-time, idle-time and decision-lag are measured in seconds and are averaged
over 100 cycles for both the frameworks. The low cycle-times and idle-times obtained
for different scenarios showed the effectiveness of using the FSAM-framework for a wide range
of simulations.
Figure 4.4 Cycle-time comparison between GPU-based FSAM and CPU-based DMASF: Circle simulation (cycle-time in seconds vs. K, number of agents = 10^K)
Figure 4.5 Idle-time comparison between GPU-based FSAM and CPU-based DMASF: Circle simulation (idle-time in seconds vs. K, number of agents = 10^K)
Figure 4.6 Cycle-time comparison between GPU-based FSAM and CPU-based DMASF: Hand-shake simulation (cycle-time in seconds vs. K, number of agents = 10^K)
Figure 4.7 Idle-time comparison between GPU-based FSAM and CPU-based DMASF: Hand-shake simulation (idle-time per iteration in seconds vs. K, number of agents = 10^K)
Figure 4.8 Cycle-time comparison between GPU-based FSAM and CPU-based DMASF: Sand-pile simulation (cycle-time in seconds vs. K, number of agents = 10^K)
Figure 4.9 Idle-time comparison between GPU-based FSAM and CPU-based DMASF: Sand-pile simulation (idle-time per iteration in seconds vs. K, number of agents = 10^K)
4.4.1.1 Computationally intense with no messaging (Circle simulation).
We randomly distributed agents on a 2D plane. The goal of the agents is to position themselves
on a circle and then move around it. The strategy involved computation of the arithmetic mean
(which requires N additions corresponding to both the dimensions, where N is the number
of agents) for each agent in every cycle. Figures 4.4 and 4.5 show the comparison of execution
times between GPU-based FSAM and CPU-based DMASF. The speed-up achieved using FSAM
is nearly 200 times for cycle-time. The idle-time for CPUs increases linearly with the number
of agents. For GPUs the idle-time depends on the number of cores available for
concurrent computation. It remains the same while the number of agents is less than 30,720 (as all
the agents are scheduled concurrently) and gradually increases after that.
4.4.1.2 Communication intensive (Hand-shake simulation).
In this simulation, all the agents were packed in a single room. Their aim was to shake
hands with every other agent 100 times. A hand-shake between two agents X and Y is modelled
as a message from X to Y followed by an acknowledgement message sent by Y to X. Every
agent ended up sending N messages (one corresponding to each agent) in every cycle, where
N is the number of agents. Figures 4.6 and 4.7 show the comparison results. The GPU-based
FSAM has a memory-based messaging model and hence has a remarkably low cycle-time as
compared to CPU-based DMASF, which has a disk-based messaging model. The speed-up achieved
in cycle-time for 10^7 agents was more than 10,000 times.
4.4.1.3 Messaging and computationally balanced (Sand-pile simulation).
In this problem, grain particles (each particle modelled as an agent) are dropped from a
height into a beaker through a funnel, and they finally settle down in the beaker after colliding
with each other and with the walls of the beaker and funnel. A detailed description and solution
of the problem is given in [26]; for a visualization of the problem refer to [27]. For each
agent, a single cycle involved exchange of positions and velocities with a subset of agents,
then computing the physical forces acting on it and sending its impact to the appropriate agents.
Comparison results are shown in Figures 4.8 and 4.9. A lot of mathematical calculations are required in
this simulation, and GPUs being faster than CPUs leads to a speed-up of as much as 2316
times for 10^7 agents.
4.4.2 Evaluating FSAM-framework Architecture
We tested our design decisions for the FSAM-framework architecture presented in Section 4.3.
4.4.2.1 Agent distribution algorithm
Using the CUDA Occupancy Calculator ([10]), we computed that 30,720 concurrent threads
can run on a single GPU for the Sand-pile simulation ([26]) code. We ran the simulation with
30,000 agents. We first divided them equally on four (identical) GPUs and found the cycle-time
to be 10.837 ms. Next, we ran all the agents on a single GPU and found the cycle-time
equal to 10.841 ms, the same as before. Thus, we can use spare GPUs to run another instance of
FSAM and perform different simulations in parallel.
Then, we increased the number of agents to 60,000. We ran them on a single GPU and
found the cycle-time to be 13.643 ms. Using our distribution algorithm, we get the distribution
on two GPUs as 30,720 and 29,280. With this distribution the cycle-time achieved was 10.838 ms.
4.4.2.2 Warp management
We divided 10,000,000 agents into two agent-types, sand-particle agent and beaker agent,
corresponding to the sand-pile simulation, in the ratio 100:1. We compared a random distribution
of agents on CUDA blocks against the optimization stated in Section 4.3.4 (the same
block having the same type of agents). While for the random distribution the average cycle-time
was 680.413 ms, the corresponding time for the optimization in Section 4.3.4 was 80.491 ms,
showing more than an 8-fold improvement. This is because agents of type beaker interact with
significantly more particles in their update than the sand-particle agents. Random distribution
of agents on the blocks hence caused the particle agents to remain idle for
a longer time.
4.4.2.3 Stress Testing
By architecture, in CUDA a block can have a maximum of 1024 threads and a grid can
have a maximum of 65,535 × 65,535 (approximately 2^32) blocks, giving a total of 2^42 possible
threads. We carried out experiments to find out the capacity of the FSAM-framework with 4
GPU-based agent-controllers. Each agent was allotted 10 bytes of data. Mapped memory on
the CPU had 4 GB of space. Hence, the maximum number of possible threads (or agents) was
limited to 3.676×10^8. The idle-time obtained was 0.00153 milliseconds. Next, we used all the
space in the GPU global memory (refer [11]) and the CPU host memory for the agents' data. The
number of agents was increased to 7.250×10^8. The idle-time obtained in this case was almost the
same, 0.00159 milliseconds. In either case, on further increasing the number of agents, the CUDA
kernel (refer [11]) failed to launch. Further, we ran the above number of agents for 1,000,000
cycles and neither agents nor any agent-controller failed, testifying to the remarkable capability
of CUDA to support a large number of threads. Thus, the FSAM-framework built on top of
CUDA is quite reliable.
Chapter 5
Conclusions
Cloud computing and multi-core based concurrent programming are recent advancements
in the field of solving larger problems. Multi-agent simulation, when scaled up to several millions
of agents, is one such problem posing several challenges: (i) scalability of the framework
running such massive simulations; a lot of computation power is needed to execute agents,
along with primary and secondary storage to store simulation data; (ii) fast agent execution
and message delivery, along with latest agent-environment state perception. In our work, we
addressed these challenges and provided separate solutions for them. The challenge with scalability
is handled with the Hadoop-cum-Lucene solution, and the agent-state perception challenge is
tackled by utilizing CUDA over GPUs.
Hadoop provides a novel framework for running applications involving thousands of nodes
and petabytes of data. It allows a developer to focus on the agent model and its optimization
without getting involved in fault-tolerance issues. Extensibility of the hardware on which the
framework is running is made easy by Hadoop, by allowing dynamic addition of new nodes and
by allowing heterogeneity between the operating systems which the different nodes are running.
Therefore, it provides a strong backbone for implementing a large-scale agent-based simulation
framework. Using cached results is a major optimization in the framework. A faster lookup
for agents is achieved by indexing agent-data using Lucene. Agent-messages in an iteration
are also indexed using Lucene to achieve fast agent-messaging. Further, the simulation data
for each iteration is stored in HDFS, which can be used for off-line/on-line visualization of
the simulation.
GPUs provide a massively concurrent architecture for program execution. CUDA allows
us to utilize the multi-core GPUs through creation and execution of several millions
of threads. Agent-based simulation frameworks developed on CUDA can have each agent
owning its own GPU-thread of execution, thus enabling agents to be active almost all the
time during the simulation and to get the latest perceptions from the environment and other agents. We
developed an agent-based simulation framework, the FSAM-framework, which followed a four-state
agent execution model and sped up the agent-execution and agent-messaging using the
computational power of GPUs. The presented four-state agent execution model, along with
GPUs, tackled the problem of obtaining the latest environment perceptions, with the decision-lag
of the perceive state making the difference from currently followed agent execution models.
Selecting an appropriate alternative out of the presented frameworks depends on the application
at hand. If the application has a large memory requirement, on the order of several
gigabytes, then the framework developed on Hadoop is the better alternative to be used. GPUs
have a limited memory and as such would fail to launch the agent-execution if the data does
not fit into the memory. On the other hand, if the rate of perceptions being received is more
than the rate at which they are processed by an agent, then the four-state agent execution
model should be used. In general, the four-state agent execution model fits best on the parallel
GPUs, giving very low values for decision-lag, provided the simulated agents do not
require a lot of memory during execution. However, the four-state model can be executed on
other hardware architectures as well; only the delay values for decision-lag get higher.
5.1 Future Work
The implemented systems can be further optimized and more functionalities can be added
to them. Developing better heuristics for caching results and for determining appropriate cache
sites for faster access of the results are some of the challenging tasks in the Hadoop framework.
For online analysis, a scene visualization module can be provided which can be triggered at the
end of a single map-reduce job (which is equivalent to a single iteration in the multi-agent
simulation), and the updated scene can then be rendered using Java APIs.
The 4-state agent execution model in its presented form is fit for only 1-level dependencies
among agent variables (a dependency is said to be of level n if there exists a path of
length n in the directed dependency graph). The presented execution model can be extended
to solve n-level dependencies by allowing agents to make updates of dependent variables
in their perceive state and multiplying the decision-lag by a factor of n.
The frameworks developed are available for download at [34], [33]. Sample agent codes
can be found in the Appendix.
Related Publications
1. A Multi-agent Simulation Framework on Small Hadoop Cluster - Prashant Sethia and
Kamalakar Karlapalem - Engineering Applications of Artificial Intelligence Journal.
DOI: 10.1016/j.engappai.2011.06.009
2. A Multi-agent Simulation Framework on Small Hadoop Clouds - Kamalakar Karlapalem
and Prashant Sethia - ITMAS workshop at Ninth International Conference On
Autonomous Agents And Multi-agent Systems, Toronto, Canada - 2010.
3. Efficient Multi-Agent Simulation using Four State Agent Execution Model on GPUs
- Prashant Sethia and Kamalakar Karlapalem. Under review in Engineering Applications
of Artificial Intelligence Journal.
Bibliography
[1] Jacques Ferber - Multi-Agent Systems: An Introduction to Distributed Artificial Intelligence.
Addison Wesley Longman, Harlow, UK, 1999.
[2] Jeffrey Dean and Sanjay Ghemawat - MapReduce: Simplified Data Processing on Large
Clusters. Proceedings of the 6th conference on Symposium on OSDI - Volume 6, 2004.
[3] Steven F. Railsback, Steven L. Lytinen, Stephen K. Jackson - Agent-based Simulation
Platforms: Review and Development Recommendations. Society for Computer Simulation
International, 2006.
[4] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach,
Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber - Bigtable: A
Distributed Storage System for Structured Data. Seventh Symposium on OSDI, Seattle,
WA, 2006.
[5] Hadoop wiki page : [http://wiki.apache.org/hadoop]
[6] Nuannuan Zong, Feng Gui, Malek Adjouadi - A New Clustering Algorithm of Large
Datasets with O(N) Computational Complexity. Proceedings of the 5th International
Conference on ISDA, 2005.
[7] Cloud-computing Wikipedia page: [http://en.wikipedia.org/wiki/Cloud_computing]
[8] http://code.google.com/edu/parallel/mapreduce-tutorial.html
[9] nVidia CUDA home-page :
http://www.nvidia.com/object/cudahome.html
[10] CUDA Occupancy Calculator :
http://developer.download.nvidia.com/compute/cuda/CUDA Occupancycalculator.xls
[11] nVidia CUDA Programming Guide, Version 3.0
[12] S Tisue, U Wilensky. NetLogo: Design and Implementation of a Multi-Agent Modeling
Environment. Proceedings of the Agent Conference, 2004. pp161-184.
[13] G Yamamoto, H Tai, H Mizuta. A Platform for Massive Agent-Based Simulation and
Its Evaluation. MMAS, 2008. pp1-12.
[14] IVA Rao, M Jain, K Karlapalem. Towards Simulating Billions of Agents in Thousands
of Seconds. AAMAS, 2007. Article No. 143.
[15] DMASF code: http://sourceforge.net/projects/dmasf/files/
[16] S Luke, C Cioffi-Revilla, L Panait, K Sullivan, G Balan. MASON: a multiagent simula-
tion environment. Simulation, 2005. pp517-527.
[17] N Minar, R Burkhart, C Langton, M Askenazi. The Swarm Simulation System: A
Toolkit for Building Multi-agent Simulations. Santa Fe Institute Working Paper, 1996.
[18] N Collier. Repast: An extensible framework for agent simulation. Natural Resources
and Environmental Issues. Vol. 8, Article 4.
[19] M Sonnessa. JAS: Java Agent-based Simulation Library.An Open Framework for
Algorithm-Intensive Simulation. Industry And Labor Dynamics: The Agent-Based
Computational Economics Approach, 2003. pp43-56.
[20] A Fedoruk, R Deters. Improving fault-tolerance by replicating agents. AAMAS, 2002.
pp737-744.
[21] R Sarika, S Harith, K Karlapalem. Database Driven RoboCup Rescue Server. RoboCup
2008. pp602-613.
[22] P Richmond, D Romano. Agent Based GPU, a Real-time 3D Simulation and Interactive
Visualisation Framework for Massive Agent Based Modelling on the GPU. IWSV, 2008.
[23] M Lysenko, RM DSouza. A Framework for Megascale Agent Based Model Simulations
on Graphics Processing Units. Journal of Artificial Societies and Social Simulation,
2008. Vol. 11, pp10.
[24] P Varakantham, S Gangwani, K Karlapalem. On Handling Component and Transaction
Failures in Multi Agent Systems. SIGecom Exchanges, 2001. Vol. 3, pp32-43.
[25] BG Aaby, KS Perumalla, SK Seal. Efficient Simulation of Agent-Based Models on
Multi-GPU and Multi-Core Clusters. International Conference on Simulation Tools and
Techniques for Communications, Networks and Systems and Workshops, 2010. Article
29.
[26] L Breton, JD Zucker, E Clement. A multi-agent based simulation of sand piles in a static
equilibrium. MABS, 2000. pp108-118.
[27] Visualization of sand piles problem: [http://grmat.imi.pcz.pl]
[28] John H. Miller, Scott E. Page - The Standing Ovation Problem : Computational model-
ing in the social sciences, 2004.
[29] L Welch, S Ekwaro-Osire. Fairness in Agent Based Simulation Frameworks. Journal of
Computing and Information Science in Engineering, March 2010. Volume 10, Issue 1,
011002.
[30] M Kiran, P Richmond, M Holcombe, L S Chin, D Worth, C Greenough. FLAME: sim-
ulating large populations of agents on parallel hardware architectures. AAMAS, 2010.
pp1633-1636.
[31] Lucene: http://lucene.apache.org/
[32] Solr: http://lucene.apache.org/solr/
[33] Multi-agent simulation framework using CUDA: http://sourceforge.net/projects/fsam/
[34] Multi-agent simulation framework on Hadoop: http://sourceforge.net/projects/hmasf/
Appendix A
Example codes for agent simulation on Hadoop simulation
framework
A.1 Circle Simulation
public class CreateEnviron {
public Map<Object, Object> getIterationParameters() {
Map <Object, Object> iter_data =
new HashMap <Object, Object>();
//Name of the directory containing
//intermediate simulation data.
String keytemp="AGENT_DIRECTORY";
String valtemp="CIRCLE_AGENTS";
iter_data.put((Object)keytemp, (Object)valtemp);
//Number of iterations for the simulation to run.
keytemp="NOI";
valtemp="10000";
iter_data.put((Object)keytemp, (Object)valtemp);
return iter_data;
}
void createWorld() {
AgentAPI crobj=new AgentAPI();
Random randomGenerator = new Random();
for(int i=1;i<=NUM_AGENTS;i++)
{
//Agent data initialized.
Map <Object, Object> agent_data =
new HashMap <Object, Object>();
agent_data.put((Object)"ID",(Object)i);
agent_data.put((Object)"TYPE",(Object)"circle");
//Circles are distributed randomly between
//co-ordinates (0, 0) and (500, 500).
float x=randomGenerator.nextFloat()*500;
float y=randomGenerator.nextFloat()*500;
agent_data.put((Object)"X",(Object)x);
agent_data.put((Object)"Y",(Object)y);
crobj.createAgent(agent_data);
}
}
}
public class AgentUserCode {
Map<Object, Object> Update(Map<Object, Object> agent_data) {
switch ((String) agent_data.get("TYPE")) {
case "circle":
float cntx=0, cnty=0;
AgentAPI aapi=new AgentAPI();
for(int i=1;i<=NUM_AGENTS;i++)
{
Map<Object, Object> agent= aapi.getAgent(i);
float tx=(float)agent.get("X");
float ty=(float)agent.get("Y");
cntx+=tx;
cnty+=ty;
}
cntx=cntx/NUM_AGENTS;
cnty=cnty/NUM_AGENTS;
float rad=1.5f;
float dy=cnty-(float)agent_data.get("Y");
float dx=cntx-(float)agent_data.get("X");
float theta = (float) Math.atan2(dy, dx);
float targetx = rad * (float) Math.cos(theta) + cntx;
float targety = rad * (float) Math.sin(theta) + cnty;
float x = (targetx - (float)agent_data.get("X"));
float y = (targety - (float)agent_data.get("Y"));
agent_data.put((Object)"X",(Object)x);
agent_data.put((Object)"Y",(Object)y);
default:
break;
}
return agent_data;
}
void Shape(String agent_type) {
switch (agent_type) {
case "circle":
//User code for rendering shape.
default:
break;
}
}
}
A.2 Standing Ovation Simulation
public class CreateEnviron {
public Map<Object, Object> getIterationParameters() {
Map <Object, Object> iter_data =
new HashMap <Object, Object>();
//Name of the directory containing
//intermediate simulation data.
String keytemp="AGENT_DIRECTORY";
String valtemp="AGENTS_STANDING_OVATION";
iter_data.put((Object)keytemp, (Object)valtemp);
//Number of iterations for the simulation to run.
keytemp="NOI";
valtemp=1000;
iter_data.put((Object)keytemp, (Object)valtemp);
return iter_data;
}
void createWorld() {
AgentAPI crobj=new AgentAPI();
Random randomGenerator = new Random();
for(int i=1;i<=NUM_AGENTS_X;i++)
{
for(int j=1;j<=NUM_AGENTS_Y;j++)
{
//Agent data initialized.
Map <Object, Object> agent_data =
new HashMap <Object, Object>();
String id=Integer.toString(i)+":"+Integer.toString(j);
agent_data.put((Object)"ID",(Object)(id));
agent_data.put((Object)"TYPE",(Object)"audience");
//Agents are seated in a rectangular grid.
agent_data.put((Object)"X",(Object)i);
agent_data.put((Object)"Y",(Object)j);
//Flag indicating whether the agent is standing.
int stand=randomGenerator.nextInt(2);
agent_data.put((Object)"STAND",(Object)stand);
crobj.createAgent(agent_data);
}
}
}
}
public class AgentUserCode {
Map<Object, Object> Update(Map<Object, Object> agent_data) {
switch ((String) agent_data.get("TYPE")) {
case "audience":
AgentAPI aapi=new AgentAPI();
int total=0, standing=0;
int x=(int)agent_data.get("X");
int y=(int)agent_data.get("Y");
List<String> Ids = new ArrayList<String>();
Ids.add(Integer.toString(x-1)+":"+Integer.toString(y-1));
Ids.add(Integer.toString(x-1)+":"+Integer.toString(y));
Ids.add(Integer.toString(x-1)+":"+Integer.toString(y+1));
Ids.add(Integer.toString(x)+":"+Integer.toString(y-1));
Ids.add(Integer.toString(x)+":"+Integer.toString(y+1));
Ids.add(Integer.toString(x+1)+":"+Integer.toString(y-1));
Ids.add(Integer.toString(x+1)+":"+Integer.toString(y));
Ids.add(Integer.toString(x+1)+":"+Integer.toString(y+1));
for(int i=0;i<Ids.size();i++)
{
Map<Object, Object> agent= aapi.getAgent(Ids.get(i));
if(agent!=null)
{
int st=(int)agent.get("STAND");
standing+=st;
total+=1;
}
}
if(standing > total/2)
agent_data.put((Object)"STAND",1);
else if(standing < total/4)
agent_data.put((Object)"STAND",0);
default:
break;
}
return agent_data;
}
void Shape(String agent_type) {
switch (agent_type) {
case "audience":
//User code for rendering shape.
default:
break;
}
}
}
A.3 Sand-pile Simulation
public class CreateEnviron {
public Map<Object, Object> getIterationParameters() {
Map <Object, Object> iter_data =
new HashMap <Object, Object>();
//Name of the directory containing
//intermediate simulation data.
String keytemp="AGENT_DIRECTORY";
String valtemp="SANDPILE_AGENTS";
iter_data.put((Object)keytemp, (Object)valtemp);
//Number of iterations for the simulation to run.
keytemp="NOI";
valtemp=10000;
iter_data.put((Object)keytemp, (Object)valtemp);
return iter_data;
}
void createWorld() {
AgentAPI crobj=new AgentAPI();
Random randomGenerator = new Random();
for(int i=1;i<=NUM_AGENTS;i++)
{
//AgentStruct data initialized.
Map <Object, Object> agent_data =
new HashMap <Object, Object>();
agent_data.put((Object)"ID",(Object)i);
agent_data.put((Object)"TYPE",(Object)"sand");
//Sand-particles are distributed randomly
//between co-ordinates (0, 0) and (1000, 1000).
float x=randomGenerator.nextFloat()*1000;
float y=randomGenerator.nextFloat()*1000;
agent_data.put((Object)"X",(Object)x);
agent_data.put((Object)"Y",(Object)y);
//Initial velocities are zero.
float vx=0, vy=0;
agent_data.put((Object)"VX",(Object)vx);
agent_data.put((Object)"VY",(Object)vy);
crobj.createAgent(agent_data);
}
}
}
public class AgentStruct{
int id;
float x,y,vx,vy;
}
public class AgentUserCode {
Map<Object, Object> Update(Map<Object, Object> agent_data) {
switch ((String) agent_data.get("TYPE")) {
case "sand":
float cntx=0, cnty=0;
AgentAPI aapi=new AgentAPI();
float e=0.6f;
int i;
float x,y,vx,vy,t,s,tn, circ, th, an ,cs, sn;
x=(float)agent_data.get("X");
y=(float)agent_data.get("Y");
vx=(float)agent_data.get("VX");
vy=(float)agent_data.get("VY");
for (i=1;i<=NUM_AGENTS;i++)
{
Map<Object, Object> agi=aapi.getAgent(i);
AgentStruct ai= new AgentStruct();
ai.id=(int)agi.get("ID");
ai.x=(float)agi.get("X");
ai.y=(float)agi.get("Y");
ai.vx=(float)agi.get("VX");
ai.vy=(float)agi.get("VY");
if(ai.id!=(int)agent_data.get("ID"))
{
if(Math.pow(ai.x-x,2)+
Math.pow(ai.y-y,2)< 6400)
{
AgentStruct k= new AgentStruct();
k.x=(ai.x-x); k.y=(ai.y-y);
AgentStruct v = new AgentStruct();
v.vx=(ai.vx); v.vy=(ai.vy);
float radm=(1.0f*((k.x*k.x)+(k.y*k.y)));
if( radm == 0)
radm=1;
float kx=k.x/radm, ky=k.y/radm;
ai.x=x+kx*80; ai.y=y+ky*80;
AgentStruct n= new AgentStruct();
float velm=(1.0f*((v.vx*v.vx)+(v.vy*v.vy)));
if (velm == 0)
velm=1;
n.vx=v.vx/velm; n.vy=v.vy/velm;
float tx,k1x,k1y;
tx=(kx*n.vx+ky*n.vy)*velm;
kx=tx*kx; ky=tx*ky;
AgentStruct k1=new AgentStruct();
AgentStruct v1=new AgentStruct();
k1.x=-k.x; k1.y=-k.y;
v1.vx=vx; v1.vy=vy;
if (radm == 0)
radm=1;
k1x=k1.x/radm; k1y=k1.y/radm;
velm=(1.0f*((v1.vx*v1.vx)+(v1.vy*v1.vy)));
if (velm == 0)
velm=1;
n.vx=v1.vx/velm; n.vy=v1.vy/velm;
if( velm == 0)
velm=1;
tx=(k1x*n.vx+k1y*n.vy)*velm;
k1x=tx*k1x; k1y=tx*k1y;
AgentStruct nochng=new AgentStruct();
AgentStruct nochng1=new AgentStruct();
nochng.x=v.vx-kx; nochng.y=v.vy-ky;
nochng1.x=v1.vx-k1x; nochng1.y=v1.vy-k1y;
vx=(1-e)*k1x/2+(1+e)*kx/2+nochng1.x;
ai.vx=(1-e)*kx/2+(1+e)*k1x/2+nochng.x;
vy=(1-e)*k1y/2+(1+e)*ky/2+nochng1.y;
ai.vy=(1-e)*ky/2+(1+e)*k1y/2+nochng.y;
}
}
}
float g=9.8;
t=440*(y)/300.0f;
s=100+t;
th=(440.0f/300.0f);
circ=x*x+y*y-(s-80)*(s-80);
if(circ>=0 && x!=0)
{
tn=y/x;
if(tn<0)
tn=tn*(-1);
an=(float)Math.atan(tn); sn=(float)Math.sin(an); cs=(float)Math.cos(an);
if(x>=0 && y<0)
cs=cs*(-1);
if(y>=0 && x>=0)
{ sn=sn*(-1);
cs=cs*(-1);
}
if(x<0 && y>=0)
sn=sn*(-1);
vx+=(g*(th)*cs*(th));
vy-=(g*(th)*(th));
x+=vx*0.1+0.005*(g*(th)*cs*(th));
y+=vy*0.1-0.005*g*(th)*(th);
}
else
{
vx=0; vy-=(g/10);
x+=vx*0.1; y+=vy*0.1-0.005*g;
}
agent_data.put((Object)"X", (Object)x);
agent_data.put((Object)"Y", (Object)y);
agent_data.put((Object)"VX", (Object)vx);
agent_data.put((Object)"VY", (Object)vy);
default:
break;
}
return agent_data;
}
void Shape(String agent_type) {
switch (agent_type) {
case "sand":
//User code for rendering shape.
default:
break;
}
}
}
Appendix B
Example codes for agent simulation on FSAM
B.1 Circle Simulation
int NUM_ITERATIONS=10000;
struct Agent{
int Id;
float x,y,cntx,cnty;
};
//The pointer is passed by reference so that the
//allocated array is visible to the caller.
void InitializeWorld(Agent ** gpuAgents)
{
//Initialize agents.
*gpuAgents=(Agent *)malloc(sizeof(Agent)*NUM_GPU_AGENTS);
Agent *agents=*gpuAgents;
for(int i=0;i<NUM_GPU_AGENTS;i++)
{
agents[i].Id=i+1;
//Distribute circles randomly between
//co-ordinates (0,0) and (500, 500).
agents[i].x=rand()%500;
agents[i].y=rand()%500;
}
}
//To decide whether one agent can send a message to another
__device__ bool Allow(Agent *agnt1, Agent *agnt2)
{
//In this example we make it true always.
return true;
}
__device__ void Update(Agent *agnt, Agent *agents)
{ //Updation in this case is based on perception
//hence nothing is done.
;
}
__device__ void Perceive(Agent *agnt, Agent *agents)
{
//Perceive locations of other agents
//and compute centroid.
float cntx=0, cnty=0;
for(int i=0;i<NUM_GPU_AGENTS;i++)
{
cntx+=agents[i].x;
cnty+=agents[i].y;
}
agnt->cntx=cntx/NUM_GPU_AGENTS;
agnt->cnty=cnty/NUM_GPU_AGENTS;
}
__device__ void Decide(Agent *agnt, Agent *agents)
{
//Update accordingly based on the centroid perceived.
float rad=1.5;
float cntx=agnt->cntx, cnty=agnt->cnty;
float theta =atan2(cnty-agnt->y,cntx-agnt->x);
float targetx = rad * cos(theta) + cntx;
float targety = rad * sin(theta) + cnty;
agnt->x = (targetx - agnt->x);
agnt->y = (targety - agnt->y);
}
B.2 Hand-shake Simulation
int NUM_ITERATIONS=10;
struct Agent{
int Id;
int countHandShakes, tempCount;
};
//The pointer is passed by reference so that the
//allocated array is visible to the caller.
void InitializeWorld(Agent ** gpuAgents)
{
//Initialize agents.
*gpuAgents=(Agent *)malloc(sizeof(Agent)*NUM_GPU_AGENTS);
Agent *agents=*gpuAgents;
for(int i=0;i<NUM_GPU_AGENTS;i++)
{
agents[i].Id=i+1;
agents[i].countHandShakes=0;
}
}
//To decide whether one agent can send a message to another
__device__ bool Allow(Agent *agnt1, Agent *agnt2)
{
//This is just a sample strategy.
if(agnt1->Id > agnt2->Id)
return true;
else
return false;
}
__device__ void Update(Agent *agnt, Agent *agents)
{
//Send dummy messages to other agents.
for(int i=0;i<NUM_GPU_AGENTS;i++)
{
SendMessage(agnt->Id, agents[i].Id, "HAND-SHAKE!");
}
}
__device__ void Perceive(Agent *agnt, Agent *agents)
{
char **msgs;
int count = GetMessages(msgs);
agnt->tempCount=0;
for(int i=0;i<count;i++)
{
//Need to implement your own string comparison.
if(strcmpCUDA(msgs[i],"HAND-SHAKE!")==0)
agnt->tempCount++;
}
}
__device__ void Decide(Agent *agnt, Agent *agents)
{
//Update based on number of hand-shakes received.
agnt->countHandShakes+=agnt->tempCount;
}
B.3 Sand-pile Simulation
int NUM_ITERATIONS=10000;
struct Agent{
int Id;
float x, y, vx, vy;
};
void InitializeWorld(Agent * gpuAgents)
{ //Initialize agents.
gpuAgents=(Agent *)malloc(sizeof(Agent)*NUM_GPU_AGENTS);
for(int i=0;i<NUM_GPU_AGENTS;i++)
{ //Sand-particles are distributed randomly
//between co-ordinates (0, 0) and (1000, 1000).
gpuAgents[i].Id=i+1;
gpuAgents[i].x=rand()%1000;
gpuAgents[i].y=rand()%1000;
gpuAgents[i].vx=0;
gpuAgents[i].vy=0;
}
}
//To decide whether one agent can send a message to another
__device__ bool Allow(Agent *agnt1, Agent *agnt2)
{
return true;
}
__device__ void Update(Agent *agnt, Agent *agents)
{
//Update of the position depends on
//relative positions of sand-particles.
//Hence, computed after perceptions.
;
}
__device__ void Perceive(Agent *agent, Agent *agents)
{
//Compute collisions.
float e=0.6;
int i;
for (i=0;i<NUM_GPU_AGENTS;i++)
{
Agent &ai = agents[i]; //reference, so collision updates persist
if(ai.Id!=agent->Id)
{
if(pow((ai.x-agent->x),2)+
pow((ai.y-agent->y),2)< 6400)
{
Agent k;
k.x=(ai.x-agent->x); k.y=(ai.y-agent->y);
Agent v;
v.vx=(ai.vx); v.vy=(ai.vy);
float radm=(1.0*((k.x*k.x)+(k.y*k.y)));
if( radm == 0)
radm=1;
float kx=k.x/radm, ky=k.y/radm;
ai.x=agent->x+kx*80; ai.y=agent->y+ky*80;
Agent n;
float velm=(1.0*((v.vx*v.vx)+(v.vy*v.vy)));
if (velm == 0)
velm=1;
n.vx=v.vx/velm; n.vy=v.vy/velm;
float tx,k1x,k1y;
tx=(kx*n.vx+ky*n.vy)*velm;
kx=tx*kx; ky=tx*ky;
Agent k1, v1;
k1.x=-k.x; k1.y=-k.y;
v1.vx=agent->vx; v1.vy=agent->vy;
if (radm == 0)
radm=1;
k1x=k1.x/radm; k1y=k1.y/radm;
velm=(1.0*((v1.vx*v1.vx)+(v1.vy*v1.vy)));
if (velm == 0)
velm=1;
n.vx=v1.vx/velm; n.vy=v1.vy/velm;
if( velm == 0)
velm=1;
tx=(k1x*n.vx+k1y*n.vy)*velm;
k1x=tx*k1x; k1y=tx*k1y;
Agent nochng,nochng1;
nochng.x=v.vx-kx; nochng.y=v.vy-ky;
nochng1.x=v1.vx-k1x; nochng1.y=v1.vy-k1y;
agent->vx=(1-e)*k1x/2+(1+e)*kx/2+nochng1.x;
ai.vx=(1-e)*kx/2+(1+e)*k1x/2+nochng.x;
agent->vy=(1-e)*k1y/2+(1+e)*ky/2+nochng1.y;
ai.vy=(1-e)*ky/2+(1+e)*k1y/2+nochng.y;
}
}
}
float g=9.8;
float x,y,t,s,tn, circ, th, an ,cs, sn;
x=agent->x-640; y=agent->y-650;
t=440*(y)/300.0; s=100+t; th=(440/300.0);
circ=x*x+y*y-(s-80)*(s-80);
if(circ>=0 && x!=0)
{
tn=y/x;
if(tn<0)
tn=tn*(-1);
an=atan(tn); sn=sin(an); cs=cos(an);
if(x>=0 && y<0)
cs=cs*(-1);
if(y>=0 && x>=0)
{ sn=sn*(-1); cs=cs*(-1);
}
if(x<0 && y>=0)
sn=sn*(-1);
agent->vx+=(g*(th)*cs*(th));
agent->vy-=(g*(th)*(th));
}
else
{
agent->vx=0; agent->vy-=(g/10);
}
}
__device__ void Decide(Agent *agent, Agent *agents)
{
//Position updated after perceptions for all
//agents are completed.
float g=9.8;
agent->x=agent->vx*0.1+agent->x;
agent->y=agent->vy*0.1-0.049*g+agent->y;
}