Technion, Israel Institute of Technology.
Software Laboratory, Electrical Engineering Department
Technion, Haifa, Israel
Bercovici Sivan
Instructor: Frishman Yaniv
Spring 2004
Distributed Electronic Mailing System
PREFACE
1 INTRODUCTION
1.1 HISTORY
1.2 FARGO
1.3 PROJECT OBJECTIVE
1.3.1 CURRENT STATE
1.3.2 PROBLEM DEFINITION
1.3.3 SOLUTION OVERVIEW
1.4 COMPARISON
2 DESIGN AND IMPLEMENTATION DETAILS
2.1 TECHNOLOGY REVIEW
2.1.1 JAVA
2.1.2 ECLIPSE
2.1.3 SWING
2.1.4 FARGO
2.1.5 LOG4J
2.1.6 ANT
2.1.7 JAVADOC
2.1.8 JYTHON
2.2 DESIGN OVERVIEW
2.2.1 FAULT TOLERANCE
2.2.2 REDUCING INFORMATION REDUNDANCY
2.2.3 LOAD BALANCING
2.2.4 FIRST CONNECTION PROBLEM
2.3 COMPONENTS DESCRIPTION
2.3.1 MAIL
2.3.2 ADDRESSBOOK
2.3.3 MAILBOX
2.3.4 MAILBOXGUI
2.3.5 MAILBOXPOOL
2.3.6 DISPATCHUNIT
2.3.7 DISPATCHUNITGUI
2.3.8 DISPATCHUNITSYNCTHREAD
2.4 APPLICATION LAYOUT - COMPLETE DESIGN
2.5 INTER-COMPONENT COMMUNICATION
2.5.1 SENDING A MAIL
2.5.2 CASTING MAILBOXES
2.5.3 SYNCHRONIZING DISPATCH UNITS
3 TESTING ENVIRONMENT
3.1 OVERVIEW
3.2 COMPONENT DESCRIPTION
3.2.1 TESTER
3.2.2 SPY
4 FUTURE DIRECTIONS
APPENDIX A: USER MANUAL
START LOGGING SERVICES
DISPATCH UNIT
MAILBOX CLIENT
LOGIN
MAIL REVIEW AND MANIPULATION
COMPOSING A MAIL
TESTING ENVIRONMENT
APPENDIX B: APPLICATION REQUIREMENTS
References
Preface
It is interesting to examine the evolution of software architecture concepts in
comparison to the evolution of mankind.
At the dawn of mankind, man relied mostly on his own work. The way man lived was
based mostly on his abilities as an individual. This resulted in low productivity and
low survival rate. One way to increase productivity was the crafting of designated
working tools for the different purposes. Nevertheless, the improvement gained from
using these tools was limited. This was due to the fact that work was still performed by each
man separately.
Experience with coordinated group work proved to be essential. It had a major impact
on both the productivity of the group as well as on the overall group survival rate. A
good example can be found in hunting, where joint effort led to better results: a
greater catch. Dividing the work between group members according to each
member’s unique abilities was at the essence of man’s way of life. The male, being physically
stronger, dealt with hunting while the female nurtured the children.
Leaping thousands of years forward, knowledge became one of man’s most important
tools. To that end, sharing knowledge globally allowed faster progress in all
fields of research. Curious mankind thrives on its new discoveries, whose rate
constantly increases.
At first glance, examining mankind as a fault-tolerant system might seem strange.
Still, one can easily mark key characteristics of a fault-tolerant system as they reveal
themselves in this biological system. Ranging from the survival of man to the survival
of man’s knowledge, “fault-tolerance by replication” [21, 22] plays an important role
in the confrontation with fatal disasters. The system’s inherent self-regeneration
characteristic allows it to converge to a steady-state after a dramatic “fault”.
Reviewing software’s shorter history reveals similar milestones. The first applications
(as with many applications today) performed most of the work on their own, utilizing
only the resources available on the computers on which they ran. The use of
highly optimized libraries for common services increased the performance of
individual applications. The new applications incorporated the use of these libraries in
their own code.
In parallel to the development of better single-computer applications, a new
generation of distributed applications arose. The introduction of distributed systems
allowed an application to divide the work between numerous computers that
constructed a working network. In many cases, this coordinated work allowed a major
increase in performance. A network of computers used to perform such a distributed
task can be built up from various computers having different capabilities. The
productivity increase gained by optimizing the network to utilize each computer's
unique capabilities is significant, as demonstrated in some distributed working
environments, such as Condor [1]. Harnessing these computers in an optimal way
according to their capabilities proved to be rewarding.
One of the most widely used techniques against a fatal system fault (a computer crash)
is the “fault-tolerance by replication” technique. What is inherently available in man
(and mankind in general) is somewhat imitated in the distributed software:
Replications of the software’s components are used as backup. When a certain
software component fails, it is replaced by its replica – an identical component that is
still active, allowing the entire system to continue its normal work despite
local faults.
The increased popularity of distributed applications pushed researchers to examine the
field of fault-tolerance in distributed systems, exploiting the new opportunities as they
arose. An example of such a technique is the self-regenerating system. If parts of the
distributed system fail, the software components that ran on the faulty computers are
revived on working computers, allowing the distributed system to overcome the fault.
Decentralization is another characteristic that can be viewed in some human work
groups, as well as in several computer applications. Many companies that used to
employ a strict organizational hierarchy found that eliminating it promotes direct
communication between various groups in the company, leading to increased
efficiency. Peer-To-Peer technology does the same for computer applications: by
using direct communication between working nodes (computers on the network), it
reduces the load on the central server (the most common bottleneck in such systems),
thus leading to better performance.
We can summarize this analogy with the conclusion that there are similar
characteristics in the evolution of man and the evolution of software, and that we can
expect even greater common characteristics between these two in the future.
Redesigning popular applications and computer services to a more "distributed" form
is rather challenging and has occupied the minds of many researchers and computer
engineers. Decomposing and recomposing an application to allow such
work is difficult, and how to do so is often not readily apparent.
In this project we will examine such a popular service – the e-mail system. Although
inherently distributed, making such a system decentralized is not trivial. The focus of
the work will be facing hard decentralization issues as well as fault-tolerance issues.
In the following chapters we will review a short history of network applications, the
state of e-mail systems today and a suggested novel distributed e-mail system. We
will examine the needed technology, review the new system’s design and examine
core concepts dealing with the fault-tolerance aspects of such systems.
1 Introduction
1.1 History
Since the beginning of the use of the Internet (and of other networks in general),
there has been one dominant distributed software architecture, commonly known as
the Client/Server Software Architecture [2]. According to this paradigm, the
distributed software is separated into two communicating components. The role of
each component is strictly defined: the client component requests a
service while the server component provides the service.
Peer-to-peer technology [3,4,5] (abbreviated P2P) suggested a different application
network layout. In this type of network each component has equivalent capabilities
and responsibilities. There exists no set of computers dedicated to serving the others,
but rather each component has the ability to both offer and request a service. A P2P
application will use a proprietary protocol in order to enable communication between
the network’s different components. The design and implementation of a P2P
application take into account the fact that each component of the P2P
network is complete regardless of the other components (i.e. each client is
self-sufficient in the sense of application integrity). A P2P application is inherently much
more flexible, scalable and fault-tolerant than a classic network application, mostly
because decentralization is at the core of its design.
Adding a user to a P2P system does not harm performance the way it does in the more
centralized approach.
File-sharing applications based on P2P technology [6,7] offer some degree of
fault tolerance in the sense of the availability of the shared files. As long as a certain
number of users possess a given file, other users are able to retrieve it (replication of
the resource).
The introduction of mobile-components distributed applications [8,9] allowed an even
more flexible and dynamic layout of the application’s components. In this scheme, all
of the components are part of a single application. These components can relocate to
computers that are connected to the network and continue their execution on the
remote computers. Communication between the application’s different components is
straightforward, as if the components were all on a single computer. Relocation of
components on the live network enables the exploitation of performance optimization
opportunities as they arise at run-time. Fault tolerance can be achieved by
replicating the application’s components, enabling the application to recover to a legal
state using copies that reside on distant computers.
1.2 FarGo
FarGo [10, 11] is an example of a development environment for
mobile-component-based distributed applications. FarGo allows the implementation of a
dynamic and adaptive application capable of working on large networks, over links
with varying capacities and computer abilities. Through the supplied middleware, one
is able to design an efficient and reliable distributed application capable of adapting to
the constantly changing network environment. At the core of FarGo lies the concept
of “dynamic application layout”, permitting the modification of a component’s
location at runtime.
FarGo is Java-based, which eases the development of cross-platform applications.
Its unique capability to control relationships between moving objects allows a more
precise and rich application design.
Using FarGo does not require major code changes; only a small set of
modifications is needed to make the application’s components mobile.
Implementing object transfer management logic using the FarGo infrastructure is
simple and direct.
1.3 Project objective
The objective of the project is to provide a novel electronic mailing system
that is decentralized and fault-tolerant. The project will attempt to provide a cross-
platform, lightweight, flexible, scalable and adaptive solution.
Similar work in the field of time-shared storage, as defined in the Chord
paper [4], can be found in projects such as Freenet [27], Bayou [16, 17] and
OceanStore [18].
Research during this project will focus on optimizing communication (based on a
decentralized scheme) and providing numerous fault recovery mechanisms.
Optimizations in the direction of data redundancy will also be examined.
In this research we examine the use of redundancy in a mobile-component distributed
application. We will also examine different optimization opportunities that are
available on a mobile-object design, such as use-rate based adaptive object movement
and communication bottleneck removal using direct communication between the
components.
As mentioned earlier, redundancy is widely used in order to provide fault tolerance by
replication. A faulty component is backed up by one of its replicas. This suggests
that an increase in replication yields a more fault-tolerant application. To that end,
we will examine the use of replication as a fault-tolerance mechanism. We will also take
a step toward the examination of a self-regenerating system. When a fault occurs in
such a system, the application tries to regenerate dead components, reviving them
on new computers.
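The regeneration idea described above can be illustrated with a minimal sketch in plain Java, without FarGo. The class name ReplicaManager, its methods and the string node names are assumptions made for this illustration only; they are not taken from the DEM implementation. The sketch keeps a desired number of replicas of one component alive and, when a node fails, revives the lost replica on another live node.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: tracks which nodes host a replica of one component.
class ReplicaManager {
    private final int degree;                              // desired number of replicas
    private final Set<String> liveNodes = new HashSet<>();
    private final Set<String> replicaHosts = new HashSet<>();

    ReplicaManager(int degree, List<String> nodes) {
        this.degree = degree;
        liveNodes.addAll(nodes);
        regenerate();
    }

    // Called when a node is detected as crashed: its replica (if any) is lost.
    void nodeFailed(String node) {
        liveNodes.remove(node);
        replicaHosts.remove(node);
        regenerate();
    }

    // Revive missing replicas on live nodes until the desired degree is restored
    // or no further live node is available.
    private void regenerate() {
        for (String node : new ArrayList<>(liveNodes)) {
            if (replicaHosts.size() >= degree) {
                break;
            }
            replicaHosts.add(node);   // no effect if the node already hosts a replica
        }
    }

    int replicaCount() { return replicaHosts.size(); }

    Set<String> hosts() { return new HashSet<>(replicaHosts); }

    public static void main(String[] args) {
        ReplicaManager manager = new ReplicaManager(2, List.of("a", "b", "c", "d"));
        String victim = manager.hosts().iterator().next();
        manager.nodeFailed(victim);
        System.out.println("replicas after failure: " + manager.replicaCount());
    }
}
```

As long as enough live nodes remain, the replication degree is preserved after each failure; once live nodes run out, the count drops, which is the point at which such a scheme can no longer mask faults.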
There are cases in which the redundancy is not required but rather is inherent in an
application design. We will analyze an application example in an attempt to find a
redundancy optimization opportunity and try to solve it using a mobile component
approach.
We will supply a centralized testing environment that will allow run-time examination
of the system, as it is divided among different computers. A Jython [12] based
scripting interface will allow a simple and flexible testing environment for the system.
1.3.1 Current state
The e-mail system actually pre-dates the Internet. As a matter of fact, e-mail
systems were a crucial tool in creating the Internet. Back in 1965, e-mail started as a
way for multiple users of time-shared mainframe computers to communicate. What
was once a simple e-mail system evolved into a network e-mail system, in which users could
pass messages between different computers. The ARPANET computer network made
the e-mail application significantly more popular, as it became one of the network’s “killer apps”.
Due to the lack of direct inter-network connections between computers, an address-passing
list (“route”) between the computer of the sender and the computer of the receiver had
to be supplied. E-mail thus gained the ability to pass between a number of
networks (such as ARPANET, BITNET and NSFNET).
In the modern Internet e-mail system, e-mail is delivered directly to
Internet-connected hosts. In most cases this is achieved using Domain Name System (DNS)
services and the Simple Mail Transfer Protocol (abbreviated SMTP).
The format of the modern Internet e-mail message, as defined in RFC 2822, consists of
two components: the header component, containing address information (sender and
receiver) and other information regarding the e-mail (such as subject and date), and
the body component, containing the message itself.
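The header/body split can be sketched in a few lines of Java. This is a simplified illustration, not a complete RFC 2822 parser: the class name SimpleMessage is hypothetical, and the sketch ignores folded (multi-line) header fields, repeated fields and the case-insensitivity of field names. It relies only on the rule that the header section ends at the first empty line.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: splits a raw message into header fields and a body.
class SimpleMessage {
    final Map<String, String> headers = new LinkedHashMap<>();
    final String body;

    SimpleMessage(String raw) {
        // The header section is separated from the body by an empty line (CRLF CRLF).
        int split = raw.indexOf("\r\n\r\n");
        String headerPart = (split < 0) ? raw : raw.substring(0, split);
        body = (split < 0) ? "" : raw.substring(split + 4);

        // Each header line has the form "Name: value".
        for (String line : headerPart.split("\r\n")) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                headers.put(line.substring(0, colon).trim(),
                            line.substring(colon + 1).trim());
            }
        }
    }

    public static void main(String[] args) {
        SimpleMessage m = new SimpleMessage(
            "From: alice@example.org\r\nTo: bob@example.org\r\nSubject: Hello\r\n\r\nHi Bob");
        System.out.println(m.headers.get("Subject") + " / " + m.body);
    }
}
```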
The messages are exchanged between hosts using SMTP with mailing software (such
as Pine, Sendmail, etc.). Users download their personal messages from servers using
either the POP or IMAP protocols.
E-mail has been extended by the Multipurpose Internet Mail Extensions (MIME)
standard to allow the encoding of binary attachments to e-mails. Users were then able
to attach files (images, documents, etc.) to the e-mail they sent.
1.3.2 Problem definition
As described in the previous section, most modern e-mail systems are based
on a centralized approach in which users communicate with a central server in order
to retrieve their personal mail.
This sort of approach inherently suffers from key problems of scalability and
fault tolerance, and imposes various constraints on the system’s users.
In such a solution a bottleneck is evident at the server’s side. An increase in the
system’s number of users will increase the amount of communication that the mailing
server handles. This may cause fatal deterioration of service availability, to the point at
which the server is unable to provide the service at all.
Another inherent problem with the centralized solution is that of a failure at the server
(a crash). Any system that has such a central core in its design will suffer from this sort of
problem. There may also be scenarios in which, due to a partial network disconnection or
delay, online clients cannot reach the online server in order to retrieve and/or send
their mail.
In many popular mailing services (such as Hotmail) the user of the e-mail service is
limited to a certain space quota. The business model suggested that in order to get a
larger mailbox one should either pay or be forced to view commercial advertisements.
This model seems to impose unneeded boundaries on the system’s users.
One of the major causes of both individual quota problems and server space problems
is the use of attachments in e-mail. Some e-mail systems create a duplicate of the
attachment for each mail recipient. In the classical e-mail system structure, replication
of the attachment when it is sent to a distant server cannot be avoided.
1.3.3 Solution overview
Nowadays, most e-mail services are provided using one or more mail
servers that serve as both the address of and the storage place for a user’s mailbox. This
centralized design might suffer from scalability (more users, larger mailboxes),
throughput and latency issues, because both the sender and the receiver of an e-mail perform
their work through a fixed set of mail servers. Fault-tolerance issues arise from the
fact that all e-mails are stored in this fixed set of servers (which are in most cases at the
same geographical site).
A decentralized e-mail system (abbreviated DEM) will be implemented in order to
supply users with a simple, scalable, fault-tolerant mailing system.
Using FarGo as the development environment, we will design a decentralized mailing
system, providing answers to both performance and fault-tolerance issues. Using
personal traveling mailboxes which reside on on-line clients, most communication
will be done between mailboxes, thus removing much of the bottleneck that might be
caused by the mail server. FarGo will allow this using its reference tracing
mechanism. Fault tolerance of mailboxes will be handled by traveling backup mailboxes
that will scatter among the currently connected clients. Again, communication
between the mailboxes and their backup mailboxes (for synchronization) can be easily
implemented using the FarGo infrastructure. The servers’ role in this scheme will be
to act as a reference gate for all the mailboxes. As these servers may experience
crashes, special reference resolving is done on backup servers to enable
fault tolerance at this point as well. The system’s components will try to detect local faults
and regenerate dead components on on-line clients. This adaptive approach, commonly
known as a self-regenerating system, allows the system to converge to a
fixed legal state in which the degree of fault tolerance is preserved.
Figure 1: DEM preliminary design overview
An email attachment is an example of unwanted redundancy. In classic mailing
systems, replications of the attachment are made for each recipient of the mail (this
includes both recipients of a long mailing list and recipients due to forwarding of a
mail). Defining the attachment as a mobile object enables a mail to point to the
attachment rather than hold a copy of it. FarGo’s reference transparency allows an
easy implementation of such a feature.
The suggested mail scheme has a few more advantages, which are gained for free.
One example is a degree of protection against spam
mail. In DEM, a client that wishes to send a mail to another client performs a certain
number of operations per mail. This means that in order to send a large number of
e-mails to many clients, the malicious sender has to perform an amount of work
proportional to the number of receivers. This can slow the spam process down and even make
it infeasible. Current research in the field of distributed data mining can be used in
the DEM scheme to provide more active protection against both spam mail and
other malicious phenomena.
[Figure 1 labels: Dispatch Units; Client (x4); Off-line; Mailbox (x4); (a) Get recipient mailbox pointer; (b) Send mail]
1.4 Comparison
Studying core characteristics of both currently available electronic mailing
systems and the suggested DEM system revealed differences that concentrate in three
fields: performance, scalability and fault-tolerance.
Performance can be measured in both time and space. Examining the communication
performance aspect, an inherent bottleneck problem is evident in the centralized
design of current email systems, as opposed to the design of the DEM system. In a classical
email system, when a massive number of users try to send mails through the system, a
single server has to respond to all requests, spending most of its bandwidth to upload
and download mail content (including attachments).
On the other hand, in the DEM system a large portion of communication is based on
decentralization and peer-to-peer design. In this design, mail content is passed in a
peer-to-peer fashion, from a mailbox directly to a distant mailbox. Communication
with the dispatch unit is set to a minimum.
Many email systems do not provide any solution to the unneeded replications of a
mail’s attachment. Either when sent to a long mailing list or forwarded to new
recipients, an attachment gets replicated, wasting both storage space and
communication time. Some more advanced mailing systems offer a local
optimization for that problem. This is achieved by using a database to store a single
copy of an attachment. Any user inside that local system who receives the mail
with this attachment is actually given a reference to the single attachment copy
that is available in the database. In the DEM system, the design and implementation
of a floating attachment is straightforward. Attachments are considered mobile
objects. A single attachment object is created when a user wishes to add it to a mail.
Any subsequent mail that passes that specific attachment actually passes a reference
(and not a copy). With this solution we get a distributed optimization that
works across the entire DEM system.
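The reference-instead-of-copy idea can be sketched in plain Java. The classes Attachment and Mail below, and the forward method, are hypothetical stand-ins written for this illustration; in DEM the reference would point to a FarGo mobile object that may live on another machine, whereas here an ordinary Java object reference plays that role.

```java
import java.util.List;

// Hypothetical sketch: the attachment content exists once and is shared by reference.
class Attachment {
    final String name;
    final byte[] content;

    Attachment(String name, byte[] content) {
        this.name = name;
        this.content = content;
    }
}

class Mail {
    final String sender;
    final List<String> recipients;
    final String text;
    final Attachment attachment;   // a reference; the bytes are never duplicated

    Mail(String sender, List<String> recipients, String text, Attachment attachment) {
        this.sender = sender;
        this.recipients = recipients;
        this.text = text;
        this.attachment = attachment;
    }

    // Forwarding builds a new mail that points at the same attachment object.
    Mail forward(String newSender, List<String> newRecipients) {
        return new Mail(newSender, newRecipients, "Fwd: " + text, attachment);
    }

    public static void main(String[] args) {
        Attachment report = new Attachment("report.pdf", new byte[1024]);
        Mail original = new Mail("alice", List.of("bob", "carol"), "See attached", report);
        Mail forwarded = original.forward("bob", List.of("dave"));
        System.out.println("same attachment object: "
                + (original.attachment == forwarded.attachment));
    }
}
```

However many recipients or forwarding steps are involved, only one Attachment object holds the content in this sketch.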
Many centralized system designs suffer from inherent scalability problems. When the
number of service requests grows, the system deteriorates to a point of service
breakdown. Most solutions to this problem enhance the centerpiece,
either by using a stronger server or by adding a few more computers to help with
the growing number of requests. In DEM, such a dramatic scalability problem is not
evident. This is mostly because a growing number of clients means not only
more requests to serve but also more shared resources, of both computation and
storage.
Classic electronic mailing systems are sensitive to server faults. When the server
suffers from either a local fault or a network problem, none of the clients is able to
receive new mail. The availability of the service in such a system is thus easily
affected by very local problems. Modern solutions include several backup servers in
different locations to allow continuous service in case of a local fault. DEM, being
mostly decentralized, bypasses local faults by providing several dispatch units capable
of performing the needed tasks. We get a solution similar to the one available in
modern mailing systems but at a much lower cost. Dispatch units can be
automatically created on any node in the DEM network, providing not only another
fail-safe point but also a performance improvement. The performance improvement is
due to the fact that requests are handled by a larger number of dispatch units, reducing
the number of requests handled by a single dispatch unit.
A fault in a server is not always due to some accidental problem. Nowadays, the
increasing number of electronic attacks threatens any service provided through the
Internet, and the mailing service is no different in that respect. Malicious users of a
service know the exact address of the computer providing the service. This starting point
is crucial to most service-oriented attacks. As the DEM system lacks a single central
component, an attacker will find it hard to start an attack. Even if the attacker starts
with the backbone dispatch units, the self-regenerating and adaptive characteristics of
the DEM system would allow the other users to continue with their work,
uninterrupted by the fact that part of the system is under attack.
Another major problem in current mailing systems is the spam phenomenon.
Spam is junk mail, sent mostly to a long mailing list or newsgroup. Different solutions
exist to handle this problem, starting from personal mail filters to on-server solutions
that study the content of mails to block unwanted repeating mails. From the point of view of the
sending party, the advantage of this form of distribution is the low price in processing
time paid by the sending computer. DEM is inherently better protected against spam
because the sender pays computation time that is almost linear in the number of
mail recipients. This processing time includes the negotiation with a dispatch unit, and
the direct communication with each one of the recipient mailboxes.
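The two-step send flow, (a) resolve the recipient's mailbox through a dispatch unit and (b) deliver directly to that mailbox, and the resulting per-recipient cost can be sketched as follows. All class and method names here are hypothetical, and the sketch assumes one dispatch-unit lookup per recipient, which the text does not specify; it only illustrates why the sender's work grows with the number of recipients.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a recipient's mailbox.
class Mailbox {
    final String owner;
    final List<String> inbox = new ArrayList<>();

    Mailbox(String owner) { this.owner = owner; }

    void deliver(String message) { inbox.add(message); }
}

// Hypothetical sketch of a dispatch unit: maps an address to a mailbox reference.
class DispatchUnit {
    private final Map<String, Mailbox> directory = new HashMap<>();

    void register(Mailbox box) { directory.put(box.owner, box); }

    // Step (a): get the recipient mailbox pointer; null if the address is unknown.
    Mailbox resolve(String address) { return directory.get(address); }
}

class Sender {
    int operations = 0;   // counts dispatch lookups plus direct deliveries

    void send(DispatchUnit unit, List<String> recipients, String message) {
        for (String address : recipients) {
            Mailbox target = unit.resolve(address);   // step (a)
            operations++;
            if (target != null) {
                target.deliver(message);              // step (b): direct, peer-to-peer
                operations++;
            }
        }
    }

    public static void main(String[] args) {
        DispatchUnit unit = new DispatchUnit();
        unit.register(new Mailbox("bob"));
        unit.register(new Mailbox("carol"));
        Sender sender = new Sender();
        sender.send(unit, List.of("bob", "carol"), "hello");
        System.out.println("operations performed by sender: " + sender.operations);
    }
}
```

Under these assumptions the sender performs two operations per reachable recipient, so a mailing to n recipients costs the sender work proportional to n.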
2 Design and Implementation Details
2.1 Technology review
In this section we will review the technology used in the implementation of
the DEM system.
2.1.1 Java
Java [13] is a simple, object-oriented, architecture neutral, portable,
multithreaded programming language. When one wishes to create a portable
application, available on numerous platforms over a network, Java is one of the most
obvious choices available today.
The simplicity of Java allows fast development of software, omitting complex C++
features while adding important features such as the garbage collector.
Another important advantage is the availability of libraries in a wide range of areas,
from multimedia libraries to network and file system manipulation facilities.
The small footprint of the Java libraries and of the resulting code fits our goal of
making a lightweight application, which encourages the use of the application.
2.1.2 Eclipse
The Eclipse [23] platform offers an integrated development environment
(IDE) for Java. We used this environment in the early stages of development.
Eclipse was designed as a platform for building IDEs that can be used to create
applications ranging from web sites to C++ programs. In this project we use the Java
IDE developed under that platform.
Its extensive set of features, such as advanced debugging facilities, code
refactoring abilities and incremental compilation, has allowed this product to take its
place as one of the leading development tools for the Java language.
Its informative error and warning messages, quick fixes and
automatic code completion and generation allowed even faster development.
As we finished the first phase of implementation and went on toward an extensive use
of FarGo we had to leave Eclipse behind. The reason was that the special tagging
required by the FarGo pre-compiler confused Eclipse’s auto-complete and automatic
error checking. At the more advanced stages of the project, we moved to developing
under XEmacs.
2.1.3 Swing
Swing [14] is a Java library that contains a set of extensible GUI
components, enabling more rapid development of powerful Java front
ends.
The library is implemented entirely in Java, promoting cross-platform consistency and
easier maintenance. It provides the ability to easily modify the look-and-feel of the
GUI.
The Swing architecture follows the model-view-controller (MVC) design. According
to the MVC architecture, the application is broken down into three separate parts: The
model that contains the data of the application, the view which visualizes this data and
the controller that intercepts user’s input, translating the actions into operations on the
model.
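The three MVC roles can be sketched with Swing's own DefaultListModel as the model. The view and controller classes below (SubjectListView, SubjectListController) are hypothetical and written only for this illustration; in a real Swing front end the view role would be played by a component such as JList, while here the view renders to a string so that the sketch runs without opening a window.

```java
import javax.swing.DefaultListModel;
import javax.swing.event.ListDataEvent;
import javax.swing.event.ListDataListener;

// View: re-renders its text whenever the model reports a change.
class SubjectListView implements ListDataListener {
    private final DefaultListModel<String> model;
    String rendered = "";

    SubjectListView(DefaultListModel<String> model) {
        this.model = model;
        model.addListDataListener(this);
    }

    private void render() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < model.getSize(); i++) {
            sb.append(i + 1).append(". ").append(model.getElementAt(i)).append('\n');
        }
        rendered = sb.toString();
    }

    @Override public void intervalAdded(ListDataEvent e) { render(); }
    @Override public void intervalRemoved(ListDataEvent e) { render(); }
    @Override public void contentsChanged(ListDataEvent e) { render(); }
}

// Controller: translates user actions into operations on the model.
class SubjectListController {
    private final DefaultListModel<String> model;

    SubjectListController(DefaultListModel<String> model) { this.model = model; }

    void userAddedMail(String subject) { model.addElement(subject); }

    void userDeletedMail(int index) {
        if (index >= 0 && index < model.getSize()) {
            model.remove(index);
        }
    }
}

class MvcSketch {
    public static void main(String[] args) {
        DefaultListModel<String> model = new DefaultListModel<>();   // the model holds the data
        SubjectListView view = new SubjectListView(model);
        SubjectListController controller = new SubjectListController(model);
        controller.userAddedMail("Hello");
        controller.userAddedMail("Report");
        System.out.print(view.rendered);
    }
}
```

The controller never touches the view directly; the view updates only because the model notifies its listeners.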
Swing provides compatibility with AWT APIs in overlapping areas.
For these reasons, and because of Swing’s ease of use, we chose to implement our GUI
using Swing as much as possible. Small sections of code are implemented using the
AWT library, and only in cases where no suitable answer could be found in Swing.
2.1.4 FarGo
As presented in the introduction section, FarGo is a Java-based programming
environment that is used in the development of mobile-component distributed
applications.
A review of FarGo’s features revealed that it fits neatly into our suggested solution.
The ability to dynamically adjust the location of objects while preserving certain invariants
was essential to the design and is available in FarGo.
The transparency of its working mechanism allowed us to concentrate more on the
development of the algorithmic side of the application rather than dealing with the
mobility technicalities. Also, the ease of converting existing classes into
mobile objects enabled us to start with a non-distributed solution that is much easier
to debug, and to move to FarGo only in more advanced development stages, slightly
modifying our code.
Moreover, FarGo offers monitoring facilities for the mobile objects. This unique
monitoring feature allowed us to create an extensive testing and monitoring integrated
environment.
Binding and lookup features are also offered by FarGo. Objects can bind themselves
to a descriptive string, allowing other objects that search for a specific service to be able
to find it by name.
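The bind-and-lookup pattern itself can be sketched in plain Java. The NameRegistry class and its bind, lookup and unbind methods are hypothetical names chosen for this illustration and are not FarGo's API; the sketch is a local, single-process stand-in for what FarGo offers across machines.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: services register under a descriptive name and are found by it.
class NameRegistry {
    private final Map<String, Object> bindings = new ConcurrentHashMap<>();

    // Bind a service object to a descriptive string.
    void bind(String name, Object service) { bindings.put(name, service); }

    // Find a service by name; empty if nothing is bound under that name.
    Optional<Object> lookup(String name) { return Optional.ofNullable(bindings.get(name)); }

    void unbind(String name) { bindings.remove(name); }

    public static void main(String[] args) {
        NameRegistry registry = new NameRegistry();
        registry.bind("dispatch-unit", new Object());
        System.out.println("dispatch-unit bound: " + registry.lookup("dispatch-unit").isPresent());
    }
}
```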
As FarGo is to be used with Java, it is one of the more obvious choices for the
project.
2.1.5 Log4j
Much of the development time is spent during the application debugging
phase. A common debugging technique is to use on-screen printout. The developer
embeds print commands in certain methods to enable monitoring of the
application’s state. Printing the exact context under which the print occurs is a time-
consuming operation (from the developer’s view-point).
When debugging a distributed application, using this method naively would not be
productive. There are multiple concurrent printouts from different sources. It is hard
to follow all these printouts, and it is even harder to try to synchronize the output,
interleaving the different sources.
Log4j [15] is an open-source logging tool developed under the Apache Jakarta
project. It is a package designed to allow the creation of such logs for debugging
purposes. It offers a hierarchical way to insert logging statements within the Java
code. Multiple output formats and multiple levels of logging information are
available. It can also gather the print results from numerous sources, as is the case
with distributed applications.
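The notion of hierarchical loggers with per-component levels can be illustrated with a short sketch. To keep it runnable without extra libraries, the sketch uses the JDK's java.util.logging as a stand-in rather than Log4j itself; both organize loggers by dotted names. The logger names "dem", "dem.mailbox" and "dem.dispatch" are assumptions made for this illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: one parent logger for the application, one child per component.
class LoggingSketch {
    // Static fields keep strong references so the loggers and their settings stay alive.
    static final Logger APP = Logger.getLogger("dem");
    static final Logger MAILBOX = Logger.getLogger("dem.mailbox");
    static final Logger DISPATCH = Logger.getLogger("dem.dispatch");

    static void configure() {
        APP.setLevel(Level.WARNING);    // default for every logger below "dem"
        MAILBOX.setLevel(Level.FINE);   // more detail for one component only
    }

    public static void main(String[] args) {
        configure();
        // DISPATCH has no level of its own, so it inherits WARNING from "dem".
        System.out.println("dispatch logs INFO: " + DISPATCH.isLoggable(Level.INFO));
        System.out.println("mailbox logs FINE: " + MAILBOX.isLoggable(Level.FINE));
        DISPATCH.warning("dispatch unit unreachable");
    }
}
```

Raising or lowering the level of a single component in this way is what makes print-style debugging of many concurrent components manageable.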
From the distributed application development point of view, using the Log4j package
made it possible for us to debug our distributed system using the printing technique.
Such ability was essential during the development of the project.
2.1.6 Ant
The Ant [24] tool is a Java-based build tool. It has many characteristics that are
similar to those of the popular Make tool while offering a more flexible and rich
environment.
Ant can be extended using Java classes. The configuration files are XML based rather
than shell-command based.
2.1.7 JavaDoc
Good API documentation is vital for the development of a long project. JavaDoc [25] is a
tool that is used to automatically generate a convenient HTML view of the code, based
on tags that are added in the form of comments by the developer to the source code.
We use the JavaDoc tool to provide the final API documentation.
2.1.8 Jython
During the testing phase of the project, different complicated scenarios had to
be tested. The naïve choice for this kind of testing is to provide a special purpose
class, embodying each new scenario that should be tested. This technique makes for an
inconvenient testing environment, as each test must be compiled and the
entire system restarted in order to perform the actual test.
An alternative would be to use an interactive, on-the-fly scripting interface, allowing
the developer to communicate with the application components at runtime.
Jython is a programming hybrid. It is an implementation of the Python scripting
language written in Java. This interpreter is able to run under any compliant Java
virtual machine.
This scripting interface is used as a solution to our testing environment problem. One
is able to write complex tests, using the richness and simplicity of the Python
language on one hand, and the application’s Java components themselves on the other.
This integrated environment allows a user to create a scenario on-the-fly, adapting the
test according to the dynamic behavior and state of the application.
Such a scripting interface can be used as a powerful monitoring and management tool
for an application. There could be cases in which one would like to modify the
application state in a way the designers did not think of without the need of actual
recompilation and application restart.
The Jython scripting interface that was integrated in the testing environment allowed a
flexible and productive test phase of the DEM system.
2.2 Design overview
In this section we will provide a design review of the distributed e-mail
(DEM) system. We start by reviewing the application goal and its base components. A
discussion regarding the system’s fault-tolerance and reduction of unneeded
information redundancy follows. We will continue by exploring each of the
components with a detailed description. A component description will include a
description of the contained data as well as a description of services provided by that
component.
The goal of the DEM system is to provide an e-mail system in which there is no
central location that stores the user’s mailbox. Preserving only a lightweight server for
mailbox address resolution and mailbox dispatching allows an increase in
performance. In the suggested scheme, mail will travel directly between mailboxes,
which are located only on on-line clients, thus removing the bottleneck caused by
centralization in the old scheme.
Scalability is achieved by moving most of the system logic to the client’s side.
An increase in the number of clients automatically brings an increase in DEM
resources, thus performance will not be affected dramatically.
Reviewing DEM’s requirements and features suggests that the system has two main
actors: the mailboxes and the mailbox dispatch units.
From the mailing system client point of view, each client has a personal mailbox
(Mailbox). The personal mailbox is managed using a single per-online-client GUI
(MailboxGUI). As with most e-mail applications, each client has a personal address
book (AddressBook) that is used to store other clients’ logical e-mail addresses.
A mail item (Mail) is the basic content unit that is sent from one mailbox to another.
This object contains data members similar to those suggested by RFC 2822.
The goal of the mailbox dispatch unit (DispatchUnit) is to keep track of the mailbox
location and provide clients with the ability to locate other clients (mailboxes).
In order to provide the DEM system with the ability of space-sharing, mailbox pools
(MailboxPool) are available on all the on-line clients that are connected to the DEM
network. These pools function as containers for mailboxes of off-line users, keeping
track of local mailboxes. When a user becomes off-line, the corresponding pool is
emptied to other on-line pools, using the dispatch units to coordinate this task. The
dispatch units also keep track of the availability of pools and their location.
Figure 2: DEM overview
(Diagram labels: Client, Mailbox, Off-line Mailbox, Mailbox GUI, Mailbox Pool,
Dispatch Unit, Dispatch Unit GUI)
Using the FarGo middleware, DEM is able to move mailboxes
from clients that want to go off-line to clients that are still on-line. FarGo’s
location transparency makes the implementation of this feature easy and
straightforward.
FarGo also allows DEM to regenerate dead parts of the application on live parts of
the network using simple interface operations. The evolution mechanism is also
relatively easy to implement thanks to FarGo.
2.2.1 Fault tolerance
In order to handle fault tolerance issues, DEM uses multiple backup
components of both the lightweight server side and the mailboxes (a method
known as fault tolerance by replication). When one component tries to reach another
component and finds it to be non-communicative (due to either a network delay or a
fault), the live component redirects its communication to a replica of the destination
component.
An example of this feature is apparent when a user tries to send a mail to a distant
mailbox. To retrieve a reference to the mail’s destination mailbox, the sending party
consults with a dispatch unit. Numerous inter-synchronized dispatch units can be
online. When the sending party cannot communicate with one dispatch unit, it will
try to communicate with another dispatch unit. This sort of system recovery technique
increases the availability of the service, routing requests on any possible path in order
to try and provide the service.
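The redirection described above can be illustrated with a minimal Java sketch. It is not the DEM implementation: the DispatchUnitRef interface and the FailoverLookup class are hypothetical names introduced here, and an unreachable unit is assumed to surface as a RuntimeException (in DEM the exception would come from FarGo).

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for a remote reference to a dispatch unit.
interface DispatchUnitRef {
    // Resolves a logical address; assumed to throw a RuntimeException if the unit is unreachable.
    String lookupMailbox(String address);
}

class FailoverLookup {
    // Asks each known dispatch unit in turn and returns the first answer received.
    static Optional<String> lookup(List<DispatchUnitRef> units, String address) {
        for (DispatchUnitRef unit : units) {
            try {
                return Optional.of(unit.lookupMailbox(address));
            } catch (RuntimeException unreachable) {
                // This replica did not answer; try the next one.
            }
        }
        return Optional.empty(); // no replica could be reached
    }
}
```

The caller only sees a failure when every known dispatch unit is unreachable, which is the availability gain that replication is meant to provide.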
When some component becomes off-line due to some crash, a replica of that
component should detect the fault and regenerate on parts of the DEM system that are
still alive. This technique is commonly referred to as a self-regenerating system.
As an example, let us examine the mailbox pool as a fault point. To provide some
degree of mailbox fault tolerance, we suggest making a replica of a mailbox and
casting it to distant mailbox pools. In this context, regeneration means that a dead
mailbox will be revived on live clients (live mailbox pools). A thread that is working
in the mailbox pool wakes up at a fixed time interval, iterates over the locally available
mailboxes and invokes the “check replications” method. In case some mailbox is
missing, it is the job of the mailbox that discovered the fault to create a new mailbox,
initialize it and notify the other mailboxes of the event. That is the essence of
self-regenerating systems.
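The periodic check can be sketched in Java as follows. This is an illustration only, since the replication features are not implemented in the first version of DEM: ReplicaHandle and ReplicationCheck are hypothetical names, liveness is modelled by a simple flag rather than a remote call, and creating a new object stands in for regenerating a mailbox on a live pool.

```java
import java.util.List;

// Hypothetical replica handle; in DEM, liveness would be probed through a remote isAlive call.
class ReplicaHandle {
    private boolean alive = true;
    boolean isAlive() { return alive; }
    void crash() { alive = false; }
}

class ReplicationCheck {
    // Drops dead replicas, then creates new ones until 'target' replicas are alive.
    // Returns the number of replicas that had to be regenerated.
    static int checkReplications(List<ReplicaHandle> replicas, int target) {
        replicas.removeIf(r -> !r.isAlive());
        int regenerated = 0;
        while (replicas.size() < target) {
            replicas.add(new ReplicaHandle()); // stands in for regeneration on a live pool
            regenerated++;
        }
        return regenerated;
    }
}
```

A pool thread would call checkReplications for every local mailbox each time it wakes up.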
In this scheme, it is easy to observe that numerous mailboxes might detect the same
fault and each react by creating a new mailbox. Such a scenario is not resolved in
the current design. Nevertheless, a possible solution would be the termination of a
mailbox. A mailbox that sees that more than a fixed number of mailbox
replicas are available communicates with another mailbox, informing it
that it is planning to terminate. After sending the notification, the
mailbox terminates, and the notified mailbox is left to inform the other mailboxes of the
change. This process continues until the pre-defined number of mailbox
replicas is reached. This sort of iterative process has a high probability of
converging to the desired state.
A possible feature of the DEM system is evolution. In this context, evolution refers to
the ability of part of the network components to become another component or
dynamically add responsibilities. In the DEM case, clients might evolve from merely
mailbox clients to dispatch units. The evolution process was suggested as a solution
for numerous problems ranging from performance problems that might arise from
distant clients to fault tolerance issues (a server crash).
2.2.2 Reducing information redundancy
In the previous section we examined a good use of redundancy in an
application. The redundancy of the application’s components allowed a dynamic
reaction to fault, redirecting requests to a working replica.
In some cases, though, redundancy of either information or functionality is the result
of poor design or merely of some other technical problem.
In the case of the mailing systems, such unwanted redundancy takes the form of a
mail attachment. Many mail servers tend to replicate the attachments, once for each
mail recipient. Although the information stored in a single attachment does not
change between recipients, the naïve mailing solution does not try to perform any
optimization.
More advanced mailing systems address this problem by storing a single copy of the
attachment in a local database. These servers replace the attachments in the original
mail with a reference to the item, which now resides in the database. By doing so, any
mail that was addressed to that server does not duplicate the attachment, saving a
considerable amount of space.
In the DEM system the solution is simpler and more straightforward. Each attachment
can be considered as a mobile object. A mail contains a reference to the
attachment rather than the attachment itself. When a mail is duplicated for numerous
recipients, the duplicated mails contain a reference to the same attachment object
and not a replica of the attachment. By doing so, we can achieve the same order of
space saving as with the database solution.
The attachments are always stored on on-line computers, much as mailboxes are. On a
computer shutdown, the local attachments are cast to other on-line computers,
preserving their availability.
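The reference-sharing idea can be shown with a small Java sketch. The class names SharedAttachment and AttachedMail are hypothetical and are not taken from the DEM sources; in DEM the attachment would be a mobile FarGo object rather than a plain Java object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical attachment object; in DEM it would be a mobile object.
class SharedAttachment {
    final String name;
    final byte[] data;
    SharedAttachment(String name, byte[] data) {
        this.name = name;
        this.data = data;
    }
}

class AttachedMail {
    final String receiver;
    final SharedAttachment attachment; // a reference, not a copy of the data

    AttachedMail(String receiver, SharedAttachment attachment) {
        this.receiver = receiver;
        this.attachment = attachment;
    }

    // Creates one mail per recipient; every mail points at the same attachment object.
    static List<AttachedMail> forRecipients(List<String> recipients, SharedAttachment attachment) {
        List<AttachedMail> mails = new ArrayList<>();
        for (String recipient : recipients) {
            mails.add(new AttachedMail(recipient, attachment));
        }
        return mails;
    }
}
```

However many recipients there are, the attachment bytes exist once; only the small mail objects are duplicated.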
2.2.3 Load balancing
A uniform distribution of mailboxes over the different on-line clients can
be used to achieve basic load balancing.
For a more adaptive solution, implementing a monitoring mechanism on the
dispatcher side might yield better results. In the current design, a dispatcher has most
of the information needed for the load balancing of the mailboxes.
Another possible solution would be to place an active monitoring unit on the mailbox
pool side. Locally, a mailbox pool can determine that a certain size boundary has been
crossed, causing the mailbox pool to cast some mailboxes to other clients (through the
server).
Casting a mailbox means moving it to another computer that is running a local
instantiation of a mailbox pool. Before the actual mailbox movement takes place, the
mailbox un-registers itself from its current containing mailbox pool. Upon the
movement of the mailbox to the new computer, it registers itself onto the new local
mailbox pool.
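The un-register, move, register sequence can be sketched in Java as below. PoolSketch and MovableMailbox are hypothetical names used only for illustration; the actual relocation between computers is performed by FarGo and is reduced here to a comment.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical, local-only model of a mailbox pool.
class PoolSketch {
    private final Set<MovableMailbox> mailboxes = new LinkedHashSet<>();
    void addMailbox(MovableMailbox m) { mailboxes.add(m); }
    void removeMailbox(MovableMailbox m) { mailboxes.remove(m); }
    boolean contains(MovableMailbox m) { return mailboxes.contains(m); }
    int getNumMailboxes() { return mailboxes.size(); }
}

class MovableMailbox {
    private PoolSketch pool;

    MovableMailbox(PoolSketch initialPool) {
        pool = initialPool;
        initialPool.addMailbox(this);
    }

    PoolSketch getMailboxPool() { return pool; }

    // Casts this mailbox to another pool.
    void castTo(PoolSketch target) {
        pool.removeMailbox(this);  // 1. un-register from the current pool
        // 2. in DEM, the middleware would move the object to the target computer here
        pool = target;
        target.addMailbox(this);   // 3. register with the pool found on arrival
    }
}
```

At every point in the sequence the mailbox is registered with at most one pool, which matches the rule that a mailbox has a single containing pool.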
In this version we did not implement any adaptive load balancing; nevertheless, the design
and implementation are currently oriented towards the first solution discussed above.
Location and relocation of servers can have a major effect on the system’s overall
performance. Moving the server according to communication statistics is a possibility.
The current design does not attempt to support this feature; however, using the interactive
scripting and testing environment, one is able to monitor objects, move them
and perform manual load balancing.
Preliminary examination of load-balancing techniques can be achieved through the
Jython scripting interface as well.
2.2.4 First Connection Problem
As with many distributed applications that support intermittent node
connectivity, each application component that wishes to connect to the live
application network is faced with the first connection problem.
The first connection problem is that in order to connect to a network, a connecting
client must have an entry point. On the other hand, a purely decentralized system design
tends not to rely on any constantly connected nodes.
Some solutions for this problem rely on either a massive network search for other
connected nodes or on a node address cache. In the node address cache solution,
previously visited nodes (from previous sessions) are checked for aliveness, and if
available, they are used as network entry points. Each node’s local address cache is
refined during the session so that it is updated with current node addresses.
These new addresses are more likely to be available than old addresses.
Another, naïve solution for the first connection problem is to leave a back-bone of
network components. These components are used only for resolving this initial
connection problem.
In the DEM system we chose to use this last solution, publishing a list of back-bone
dispatch units that are to be used by both new dispatch units and new mailbox clients.
2.3 Components description
In this section we will provide a description of DEM system components.
With each component we will go into a more detailed description of both the
component’s goal and its provided content and services.
2.3.1 Mail
Mail is the basic message unit that is transmitted from one Mailbox to another.
As with standard mail, a Mail object contains a sender address, a receiver address, a
time stamp, a subject, and of course, content. This suggests the following
methods:
setReceiver Sets the mail’s receiver which is some logical string address
setSender Sets the mail’s sender. Malicious users of DEM can easily
exploit this, yet as stated before, security issues will not be
handled.
setDate Sets the mail’s sent time stamp that is resolved according to
sender’s time. Error in time accuracy due to the lack of time
synchronization between clients will not be regarded. Again,
malicious users might exploit this service to forge a sent date,
and again, this issue will not be handled.
setSubject Sets the e-mail’s subject, which is a string
setContent Sets the e-mail’s content, which is a string
getReceiver Retrieve the mail’s receiver which is some logical string
address
getSender Retrieves the mail’s sender.
getDate Retrieve the mail’s sent time stamp that was set according to
sender’s time.
getSubject Retrieve the e-mail’s subject, which is a string
getContent Retrieve the e-mail’s content, which is a string
isEquals A special predicate that indicates whether or not the given mail
is the same as this mail.
toString Method that is used to specially format the mail into a single
string.
One can either construct an empty Mail object and set its fields using the
methods described above, or use a fully or partially detailed constructor and later
modify the fields using the mutators.
The constructed mail is delivered to a specific Mailbox using the mailbox’s services.
The Mail can then be viewed using the MailboxGUI.
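A minimal Java sketch of such a Mail class is given below. It follows the method names listed above, but the field types, the constructor signature and the toString format are assumptions made for illustration and may differ from the actual DEM sources.

```java
import java.util.Date;
import java.util.Objects;

class Mail {
    private String sender;
    private String receiver;
    private String subject;
    private String content;
    private Date date;

    // Empty mail; the fields are filled in later through the mutators.
    Mail() { }

    // Fully detailed constructor (assumed signature).
    Mail(String sender, String receiver, String subject, String content, Date date) {
        this.sender = sender;
        this.receiver = receiver;
        this.subject = subject;
        this.content = content;
        this.date = date;
    }

    void setSender(String sender) { this.sender = sender; }
    void setReceiver(String receiver) { this.receiver = receiver; }
    void setSubject(String subject) { this.subject = subject; }
    void setContent(String content) { this.content = content; }
    void setDate(Date date) { this.date = date; }

    String getSender() { return sender; }
    String getReceiver() { return receiver; }
    String getSubject() { return subject; }
    String getContent() { return content; }
    Date getDate() { return date; }

    // True when every field of the given mail matches this mail.
    boolean isEquals(Mail other) {
        return other != null
                && Objects.equals(sender, other.sender)
                && Objects.equals(receiver, other.receiver)
                && Objects.equals(subject, other.subject)
                && Objects.equals(content, other.content)
                && Objects.equals(date, other.date);
    }

    @Override
    public String toString() {
        return "From: " + sender + ", To: " + receiver
                + ", Date: " + date + ", Subject: " + subject;
    }
}
```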
2.3.2 AddressBook
A personal address book is probably one of the most basic requirements of an
e-mail application. The DEM system was intended to provide a personal address book
along with each personal mailbox. The basic AddressBook version will provide the
ability to store and retrieve logical e-mail addresses. The address book will hold basic
personal information such as a client’s first and last name. These basic features yield
the following methods:
addPerson Add a single person to the address book, including
a given name as well as the person’s logical address.
getAddress retrieve a logical address of a person from the address
book according to a given key.
The address book will implement the Iterator interface to enable users to enumerate
the clients registered in the address book.
Notice that we chose to implement this component at a later date, mostly due to the fact
that it has low research significance. A complete DEM client version will include an address
book attached to each personal mailbox. An address book GUI will also be supplied
in order to provide the user with the ability to view and modify the address book’s
content.
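Since the address book is not implemented in the current version, the following Java sketch is purely illustrative. It assumes that a person is identified by a name used as the lookup key, and it exposes enumeration through Java's Iterable interface (which hands out an Iterator), which is one possible reading of the requirement above.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

class AddressBook implements Iterable<String> {
    // Maps a person's name to a logical e-mail address, in insertion order.
    private final Map<String, String> entries = new LinkedHashMap<>();

    void addPerson(String name, String logicalAddress) {
        entries.put(name, logicalAddress);
    }

    // Returns the logical address stored for the given name, or null if unknown.
    String getAddress(String name) {
        return entries.get(name);
    }

    // Enumerates the registered names without allowing removal through the iterator.
    @Override
    public Iterator<String> iterator() {
        return Collections.unmodifiableSet(entries.keySet()).iterator();
    }
}
```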
2.3.3 Mailbox
The mailbox is one of the most important components in DEM. Using this
component, mail items are retrieved, backed up and moved so that they always stay on
the live parts of the network. Each mailbox is personal, thus it contains a specific user’s
mails as well as his/her personal address book.
A user should be able to use this component to send mail, read mail and delete a mail
item. A mailbox should be able to answer a ping-like call – a predicate that is used to
determine if a mailbox is still alive (an exception is thrown in case the mailbox does
not answer). This feature is intended to be used in the DEM fault tolerance and
regeneration mechanism. A mailbox should also be aware of on-line servers (dispatch
units) in order to be able to locate fellow mailboxes and to enable the mailbox travel
feature. A mailbox should also be aware of its replicas in order to examine their
activity as well as to enforce mail synchronization. Summarizing the mailbox’s features
results in the following methods:
getUserName Retrieve personal information regarding the user to whom the
mailbox belongs.
getAddrBook Retrieve the personal address book contained in the mailbox.
Notice that this method (and entire feature) is currently not
implemented.
isAlive A predicate that is used to indicate whether or not the
component is still alive. In case the component does not answer
the call, FarGo is responsible for throwing an exception.
isActive A predicate that indicates whether or not this mailbox is
currently active, meaning that it is viewed by a user. A user
may connect to a non-active mailbox using a MailboxGUI,
which then marks the mailbox as active.
setActive Used to mark a mailbox as active. In this context, active means
that the mailbox is currently being manipulated by an on-line
user through a MailboxGUI.
regenrate Creates a copy of the mailbox and its mail on a different
MailboxPool. This can be a result of some distant mailbox
fault. Notice that the first version would not directly support
this method and feature.
getMail Get a specific Mail according to a key (index).
removeMail Remove a specific Mail from the mailbox. The mail can be
deleted either according to a special local mail index or by
passing a copy of the mail that is to be deleted.
addMail Adds a new mail to the mailbox. All mail information is
available inside the given Mail object. Synchronization with
mailbox replication will occur. The specific mail
synchronization is the responsibility of the first receiving
mailbox.
sendMail A method that is used to simulate the action of sending a mail.
The method was implemented for testing reasons only.
isEmpty A predicate that indicates whether or not the mailbox is empty
getMailCount Retrieve the number of mails that are contained in the mailbox
AddMailNoSync operates like the addMail method, but does not perform
further synchronization of this specific mail. This
method was intended to be implemented as part of the
mailbox replication mechanism. Currently this feature is
not supported.
getMailboxPool Each mailbox has a single containing mailbox pool.
Through this method one can retrieve the containing
mailbox pool.
setPool Using this method, one is able to set the containing
mailbox pool.
addMailboxListener Adds a listener to mailbox events (such as the arrival of
new mail, etc.)
removeMailboxListener Remove a listener from the list of mailbox
events listeners.
fireMailboxModifiedEvent A private method that is used to signal a
modification event in the mailbox. All registered
mailbox event listeners will be notified of the
event.
equals A predicate that checks the equality of this object
to a given object
postArrival The mailbox component is movement aware.
Upon the arrival of the mailbox to its new core,
it registers onto the locally available mailbox
pool.
toString A method that is used to format object unique
ID into a string.
checkReplications Check that all replications are alive. This feature
is not implemented in the first version.
registerReplication Used to register a replication of the mailbox in a
specific instantiation of the mailbox. As with all
other replication features on the mailbox side,
this method is currently not implemented.
unregisterReplication Used to un-register a replication that was found
to be dead by another mailbox
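The listener-related methods in the list above follow the usual observer pattern. A minimal Java sketch is shown below; the MailboxListener interface, its callback name and the ObservableMailbox class are hypothetical, and mail items are reduced to strings to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical listener interface; the callback name is an assumption.
interface MailboxListener {
    void mailboxModified(String owner);
}

class ObservableMailbox {
    private final String owner;
    private final List<String> mails = new ArrayList<>();
    // Copy-on-write list so listeners may be added or removed while an event is firing.
    private final List<MailboxListener> listeners = new CopyOnWriteArrayList<>();

    ObservableMailbox(String owner) { this.owner = owner; }

    void addMailboxListener(MailboxListener listener) { listeners.add(listener); }
    void removeMailboxListener(MailboxListener listener) { listeners.remove(listener); }

    // Adding a mail modifies the mailbox, so all registered listeners are notified.
    void addMail(String mail) {
        mails.add(mail);
        fireMailboxModifiedEvent();
    }

    int getMailCount() { return mails.size(); }

    private void fireMailboxModifiedEvent() {
        for (MailboxListener listener : listeners) {
            listener.mailboxModified(owner);
        }
    }
}
```

A GUI component would typically register itself as a listener and refresh its mail list inside the callback.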
2.3.4 MailboxGUI
The purpose of the MailboxGUI is to provide a graphical interface for the DEM
system users. From the application layout point of view, the GUI drags its referred
mailbox to its current location in order to improve performance. Regarding the
component’s requirements, a basic mail list view as well as the ability to create
new mail and send it must be provided. Viewing the personal address book that is part
of each personal mailbox is also a basic feature that is to be provided as part of the
interface.
2.3.5 MailboxPool
At the heart of the DEM scheme lies the fact that only some of the clients are
on-line. Using these on-line clients as temporary storage space and mailbox handlers
enables off-line clients’ mailboxes to be kept alive. To enable this core feature, the
mailbox pool component acts as a container for the mailboxes a specific client currently
holds. This means that each client holds numerous Mailbox objects of off-line users along with
his/her personal mailbox. The component should allow a user to examine its content and to
retrieve, add and remove mailboxes. In a later version, a mailbox pool might be able
to provide monitoring services in order to improve the load balance across the on-line
clients. Currently, one can achieve this using the available scripting interface.
getDispatchUnit Choose a living dispatch unit randomly from the list of
available dispatch units. In case a dead dispatch unit is
encountered during the selection process, the dispatch unit list
is refined.
addDispatchUnit add a dispatch unit to the list of living dispatch units
removeDispatchUnit remove a given dispatch unit from the list of living dispatch
units.
getMailboxes Retrieve a list of the mailbox objects currently contained in
the mailbox pool.
castMailbox Casts a specific mailbox to another mailbox pool using services
provided by the dispatch servers. The implementation of this
feature is not required in the first version.
castMailboxes Empties the mailbox pool to other mailbox pool objects, again,
by using the services provided by the dispatch servers.
addMailbox adds a new mailbox to the pool.
removeMailbox remove a mailbox from the list of mailboxes that are currently
contained in the mailbox pool.
getNumMailboxes retrieve the number of mailboxes contained in the pool
notifyDUMailboxModification This method is used to propagate modification
of a single mailbox to at least one dispatch unit.
disconnectFromDispatchUnit This method is used to notify at least one
dispatch unit that the mailbox pool is about to be
deactivated.
isAlive A predicate that indicates whether or not this
component is alive. This method is used for
fault-tolerance purposes.
equals Used to indicate whether or not the given object
is the same as this mailbox pool.
toString Provides a conversion of the mailbox pool
unique ID into a formatted string
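The selection policy described for getDispatchUnit (pick at random, drop dead entries as they are encountered) can be sketched in Java as follows. LivingUnitPicker is a hypothetical helper: it is generic over the unit type and takes the liveness probe as a predicate, whereas in DEM the probe would be a remote isAlive call that fails with an exception.

```java
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

class LivingUnitPicker<T> {
    private final List<T> units;        // the caller's list; it is refined in place
    private final Predicate<T> isAlive; // liveness probe (assumption: supplied by the caller)
    private final Random random = new Random();

    LivingUnitPicker(List<T> units, Predicate<T> isAlive) {
        this.units = units;
        this.isAlive = isAlive;
    }

    // Returns a randomly chosen living unit, or null when no known unit is alive.
    T pick() {
        while (!units.isEmpty()) {
            int index = random.nextInt(units.size());
            T candidate = units.get(index);
            if (isAlive.test(candidate)) {
                return candidate;
            }
            units.remove(index); // dead unit encountered: refine the list
        }
        return null;
    }
}
```

Because dead entries are removed during selection, later calls spend less time probing units that are already known to be gone.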
Figure 3: Client side class diagram
2.3.6 DispatchUnit
The dispatch unit’s role in the DEM system is to provide clients with the ability
to locate other mailboxes in their current location (somewhere inside the live parts of
the network). Retrieving a reference to a specific mailbox according to a logical
address is thus a basic service that must be provided by the dispatch server.
In order to allow this component to keep track of the constant relocation of mailboxes,
the dispatch unit provides an interface that must be used for relocation operations.
Due to fault tolerance issues, multiple dispatch servers exist, thus the synchronization
between these servers must also be handled.
The component should also allow a user to examine the dispatch unit’s known
reference list. This includes current on-line clients, registered mailboxes (and their
replications), on-line dispatch units and online mailbox pools.
Summarizing features into methods:
bindToCore Binds the dispatch unit to a special name on the current
containing core.
getUnitName Retrieve the name of the dispatch unit
createUser Creates a new user in the DEM system with a newly
constructed mailbox. The new mailbox is passed to a
mailbox pool right after construction.
doesUserExist A predicate that can be used to examine whether or not
a certain user exists in the system
getUserMailbox This method is at the core of the dispatch unit. It is used
to retrieve a reference to a mailbox according to the
mailbox’s owner name. It is used by distant users in the
process of sending a mail
getUser Get the name of a user according to an index
getNumUsers Get the number of currently registered users
getUserMap Returns a reference to the user map that is contained in
the dispatch unit. This data structure is used in the
process of dispatch units synchronization.
getFCList Using this method, one is able to retrieve the back-bone
dispatch units list. Using this list a new dispatch unit
and a connecting user are able to overcome the first
connection problem.
syncWithUserMap Used by the dispatch unit synchronization thread, this
method performs the synchronization between the local
user map and the distant user map that is passed as a
parameter.
getNumPools Retrieve the number of pools that are registered with the
dispatch unit.
addMailboxPool Add a mailbox pool reference to the list of mailbox
pools that are known to this dispatch unit
removeMailboxPool remove a mailbox pool from the list of known mailbox
pools.
getPool Retrieve a pool from the dispatch unit’s list of known
mailbox pools. The pool is retrieved according to a
given index
getPoolList Retrieve the entire list of known mailbox pools.
syncWithPools Used by the dispatch unit synchronization thread, the
method handles the synchronization of the known
mailbox pools list with a distant list (passed as a
parameter)
pickPool Randomly choose a pool from the list of known
mailbox pools. In case the selected pool is revealed as
dead, it is immediately removed from the list of known
mailbox pools and another pool is selected.
castMailbox Using this method, one is able to cast a mailbox from
its current mailbox pool to a different distant mailbox
pool. The new mailbox pool that will contain the given
mailbox is picked randomly from the list of known
mailbox pools.
getDispatchUnit retrieve a reference to a dispatch unit according to a
given index. All known dispatch units are held in a list
contained within each dispatch unit
addDispatchUnit add a dispatch unit to the list of known connected
dispatch units.
getDispatchUnitList Retrieve the entire list of known dispatch units. This is
used during the inter-dispatch-unit synchronization
process
getNumDispatchUnits Retrieve the number of known dispatch units
syncWithDUList This method is used by the dispatch unit
synchronization thread. Using this method the dispatch
unit is able to synchronize its list of known dispatch
units with a distant list (passed as a parameter)
isAlive A predicate that indicates whether or not this
component is alive. In case the component is
unavailable on the network, FarGo will throw an
exception, declaring this component as unreachable.
addDispatchUnitListener Add a listener to dispatch unit modification
events. This is used mostly by the dispatch unit
GUI component.
removeDispatchUnitListener Remove a listener of dispatch unit modification
event from the list of listeners.
fireMailboxModificationEvent This method is used by outside
components to notify the appropriate
listeners that a mailbox was modified.
fireDispatchUnitModified This method is used to signal all dispatch
unit event listeners that the dispatch unit
was modified. This method is used to
notify the dispatch unit GUI to update
and repaint.
equals This method is used to indicate whether
or not the given object equals this
dispatch unit
toString Used to format the object’s unique ID
supplied by FarGo to a printable string.
postArrival As this component is movement aware,
upon the arrival onto a new core, the
dispatch unit synchronization thread is
restarted to enable the inter-dispatch-unit
synchronization process to take place.
preDeparture Upon departing from the local core, the
dispatch unit terminates the
synchronization thread. This must be
done in order to preserve a correct state
of the application. The thread will be
restarted upon arrival at the new core.
setupFCList Using this method, the dispatch unit
builds the list of first connection nodes.
This list is used to resolve the first
connection problem as defined in
previous sections.
2.3.7 DispatchUnitGUI
Each dispatch unit can be viewed using the DispatchUnitGUI component.
According to the current design, a dispatch unit can reside on one core while the
viewing GUI may reside on another. Having a dispatch unit GUI connected to a
distant dispatch unit has a great advantage. A user can remotely monitor and interact
with any distant dispatch unit regardless of the user’s actual location.
Through this GUI a user may examine important information regarding the current
state of the DEM network:
Number of users
Connectivity of user to the net
Number of mailboxes available
Number of mailbox pools
Mailboxes layout on available mailbox pools
Number of connected dispatch units
2.3.8 DispatchUnitSyncThread
The dispatch unit synchronization thread class plays an important role in
making the dispatch units fault tolerant. In the DEM system we use replicas of the
dispatch unit in order to provide a fall-back solution in case of a fault in one of the
dispatch units.
In order to enable this solution, all of the existing dispatch units should be
synchronized.
The synchronization of the information between all dispatch units could be achieved
in one of two ways. Either we inform all dispatch units of any structural change as
it occurs, or we accumulate that knowledge, propagating it to neighbor dispatch
units at nearly constant time intervals.
In the DEM system we chose the second solution. Mailboxes, mailbox pools and other
dispatch units may join and/or leave the DEM network, notifying at least one dispatch
unit directly. This notification is used to update the local dispatch unit with the
change. It is the job of the dispatch unit synchronization thread to wake up at a
constant interval and communicate the dispatch unit’s knowledge to all known
dispatch units.
The synchronization thread thus contains the following methods:
run According to the thread interface, this method is used to start
the periodic synchronization process. It runs in a loop,
performing the synchronization and sleeping for a constant time
period
syncDispatchUnits Performs the actual synchronization process. This
method passes through all known connected dispatch
units, synchronizing with distant dispatch units list,
distant mailbox pools list and user map. Distant
dispatch units that are found to be not available are
automatically removed from the dispatch unit’s list of
known dispatch units.
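One synchronization pass can be sketched in Java as below. This is only an illustration under stated assumptions: SyncUnit and SyncRound are hypothetical names, the network is modelled as a local map from unit name to unit, reachability is a simple flag, the merge is a plain set union (the report does not specify how removals are reconciled), and the user map is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, local-only model of the knowledge held by one dispatch unit.
class SyncUnit {
    final String name;
    final Set<String> knownUnits = new LinkedHashSet<>();
    final Set<String> knownPools = new LinkedHashSet<>();
    boolean reachable = true;

    SyncUnit(String name) { this.name = name; }
}

class SyncRound {
    // Visits every known dispatch unit, merges its lists into the local ones,
    // and forgets units that turn out to be unreachable.
    static void syncDispatchUnits(SyncUnit local, Map<String, SyncUnit> network) {
        // Iterate over a snapshot because the known-units set changes during the pass.
        for (String peerName : new ArrayList<>(local.knownUnits)) {
            SyncUnit peer = network.get(peerName);
            if (peer == null || !peer.reachable) {
                local.knownUnits.remove(peerName);
                continue;
            }
            local.knownUnits.addAll(peer.knownUnits);
            local.knownPools.addAll(peer.knownPools);
        }
        local.knownUnits.remove(local.name); // a unit need not list itself
    }
}
```

The run method of the synchronization thread would call such a pass in a loop, sleeping for a constant period between passes.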
Figure 4: dispatch unit side class diagram
2.4 Application layout - Complet design
In this section we will describe the local-remote partitioning and mapping of the
distributed DEM application onto the physical set of network nodes.
At the heart of the FarGo network lies the Core concept. A Core is a unique object in
the FarGo network. It provides all the needed system support for the mobilization of
objects and their interconnection across distant machines.
Just as the Core is a key element in the physical layer of the network, the Complet is the most
basic building block of the mobile application. The Complet defines the minimal
unit of relocation. At all times, each Complet is associated with exactly one Core.
According to the reference rules imposed by FarGo, objects can reference either
their containing Complet or the anchors of other Complets.
When designing the layout of the DEM application using FarGo’s terminology we arrive
at the following division into Complets:
Mailbox The mailbox should be able to move from Core to Core
in order to provide the most basic feature of the DEM
system – keeping the mailboxes and their contained
information alive.
MailboxGUI To enable mailbox managing from a distance, we chose
to define this component as a Complet as well.
Mailbox Pool Although this component stays on a single Core from
the moment it is created, it had to be defined as a
Complet. As previously explained, this is due to the fact
that other Complets (both mailboxes and dispatch units)
need the ability to reference the mailbox pool.
DispatchUnit The dispatch unit does not tend to move through the
system a lot, although, according to the design, it should
be able to improve its location based on dynamic
location optimization opportunities. For this reason, and
due to the fact that other Complets need to be able to point
to this component, we chose to define the Dispatch Unit
component as a Complet.
DispatchUnitGUI To enable distant monitoring and possible management
of the dispatch units, we chose to define the dispatch
unit’s GUI component as a Complet as well.
2.5 Inter-component communication
In this section we review the inter-Complet communication that passes
through the DEM system using FarGo’s middleware. Basically, communication
between Complets requires no special action; it takes place transparently with
every distant method invocation.
Although we can mark the communication between the dispatch unit and its GUI (and
the mailbox and its GUI for that matter) as inter-complet communication, we choose
to focus on the application’s most important communications. These include the
process of sending a mail, the process of casting a mailbox and the synchronization of
the dispatch units.
2.5.1 Sending a mail
To send a mail to another mailbox, the sending mailbox must know the
address that represents the distant mailbox.
As previously defined, at all times, a mailbox is contained in a single mailbox pool. A
mailbox pool offers the service of finding a living dispatch unit. Using the available
reference to the containing mailbox pool, the user that intends to send the mail is able
to retrieve a reference to a living dispatch unit.
Next, a reference to the destination mailbox is retrieved from the dispatch unit using
the known destination mailbox address. The distant mailbox address string is used as
a search key (in the dispatch unit’s user/mailbox map).
At that point, the sending party holds a reference to the distant mailbox. Using the
addMail method, the new mail is added to the distant mailbox, thus completing the
mail sending process.
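The sending sequence described above can be sketched in Python as follows. This is an illustrative model, not the DEM implementation: only addMail is named in the text (rendered here as add_mail), while find_living_dispatch_unit, get_mailbox, is_alive, mailbox_map and send_mail are hypothetical names.

```python
class Mailbox:
    """Hypothetical mailbox model; add_mail mirrors addMail from the text."""

    def __init__(self, address, pool=None):
        self.address = address
        self.pool = pool      # the single mailbox pool containing this mailbox
        self.mails = []

    def add_mail(self, mail):
        self.mails.append(mail)


class DispatchUnit:
    def __init__(self, alive=True):
        self.alive = alive
        self.mailbox_map = {}   # address string -> Mailbox (user/mailbox map)

    def is_alive(self):
        return self.alive

    def get_mailbox(self, address):
        # The destination address string is the search key.
        return self.mailbox_map.get(address)


class MailboxPool:
    def __init__(self, dispatch_units):
        self.dispatch_units = dispatch_units

    def find_living_dispatch_unit(self):
        # Return the first known dispatch unit that reports itself alive.
        for unit in self.dispatch_units:
            if unit.is_alive():
                return unit
        return None


def send_mail(sender, destination_address, mail):
    """Deliver mail from sender to the mailbox at destination_address."""
    unit = sender.pool.find_living_dispatch_unit()
    if unit is None:
        return False            # no living dispatch unit is known
    recipient = unit.get_mailbox(destination_address)
    if recipient is None:
        return False            # unknown destination address
    recipient.add_mail(mail)
    return True


dead_unit = DispatchUnit(alive=False)
live_unit = DispatchUnit()
pool = MailboxPool([dead_unit, live_unit])
alice = Mailbox("alice", pool)
bob = Mailbox("bob", pool)
live_unit.mailbox_map = {"alice": alice, "bob": bob}
delivered = send_mail(alice, "bob", "hello")
```

In the real system the recipient reference is a FarGo reference to a possibly distant Complet, so the final call is a transparent distant method invocation rather than a local one.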
Figure 5: the process of sending a mail
2.5.2 Casting mailboxes
Casting a mailbox to another Core is a basic feature that must be implemented
in the DEM system.
The casting of a mailbox away from its containing Core/mailbox pool can be triggered by a
load balancing mechanism or by a shutdown of the core that currently holds
it.
In the case of a core shutdown, the contained mailbox pool needs to be emptied. In
order to perform that operation, the mailbox pool communicates with a dispatch unit,
which in turn performs the actual casts. When the dispatch unit performs the casting
operation it takes into account the fact distant pools might be currently disconnected
(it performs aliveness testing of distant pools). The dispatch unit also makes sure that
the new home of the mailbox is not the current mailbox pool.
The actual mailbox casting is performed by a dispatch unit to allow it to track the
mailbox’s location. This information is later propagated to the other dispatch units.
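The shutdown case can be sketched in Python as shown below. The sketch assumes that the dispatch unit simply picks the first living pool other than the source; the text does not state the actual selection policy, and the names cast, empty_pool and locations are hypothetical.

```python
class MailboxPool:
    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive
        self.mailboxes = []   # addresses of the mailboxes held by this pool

    def is_alive(self):
        return self.alive


class DispatchUnit:
    def __init__(self, pools):
        self.pools = pools
        self.locations = {}   # mailbox address -> pool currently holding it

    def cast(self, address, source):
        """Move one mailbox out of source into a living pool other than source."""
        for pool in self.pools:
            if pool is source or not pool.is_alive():
                continue      # skip the current home and disconnected pools
            source.mailboxes.remove(address)
            pool.mailboxes.append(address)
            self.locations[address] = pool   # track the new location
            return pool
        return None           # no suitable pool; the mailbox stays where it is

    def empty_pool(self, source):
        """Cast every mailbox out of a pool whose core is shutting down."""
        for address in list(source.mailboxes):
            self.cast(address, source)


closing = MailboxPool("closing")
offline = MailboxPool("offline", alive=False)
target = MailboxPool("target")
closing.mailboxes = ["alice", "bob"]
unit = DispatchUnit([closing, offline, target])
unit.empty_pool(closing)
```

Because the dispatch unit performs the cast itself, its location table is updated at the same moment the mailbox moves.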
[Figure 5 diagram labels: (a) Get dispatch unit, (b) Is alive, (c) Get mailbox, (d) Add mail; components: Mailbox GUI, Mailbox Pool, Dispatch Unit, Mailbox (Recipient)]
Figure 6: Casting a mailbox
Another possible cause for the movement of a mailbox is the connection of a mailbox
viewing component to the DEM network. In that case, during the login process, a
new local mailbox pool is created. The mailbox, retrieved from one of the available
dispatch units, is then moved to the newly created local mailbox pool.
Figure 7: Mailbox casting on login
[Figure 6 diagram labels: (a) Shutdown event, (b) Cast a mailbox, (c) Is alive, (d) Cast; components: Mailbox Pool, Dispatch Unit, Distant Mailbox Pool]
[Figure 7 diagram labels: (a) Construct, (b) Get mailbox, (c) Cast a mailbox, (d) Cast; components: Login, Mailbox Pool, Dispatch Unit]
2.5.3 Synchronizing dispatch units
As explained in the components description section, a special purpose
synchronization thread is attached to each dispatch unit.
The synchronization thread wakes at constant time intervals and communicates with all
known dispatch units. During the communication process, data structures in both the
local and distant dispatch units are updated according to their common knowledge.
The synchronized information includes the list of available dispatch units, the list of
known mailbox pools and the list of known users and mailboxes.
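One synchronization step between two dispatch units can be sketched in Python as follows. The merge policy is an assumption: the sketch takes the union of both sides' knowledge and, for the user map, copies missing entries in both directions without resolving conflicting entries. The class layout and the synchronize function are hypothetical.

```python
class DispatchUnit:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.dispatch_units = {name}   # names of known dispatch units
        self.mailbox_pools = set()     # identifiers of known mailbox pools
        self.user_map = {}             # user address -> mailbox identifier


def synchronize(local, distant):
    """Bring two dispatch units to the same view; return False if unreachable."""
    if not distant.alive:
        return False                   # aliveness test failed, skip this peer

    # Dispatch unit lists: both sides end up with the union.
    merged_units = local.dispatch_units | distant.dispatch_units
    local.dispatch_units = set(merged_units)
    distant.dispatch_units = set(merged_units)

    # Mailbox pool lists: same union policy.
    merged_pools = local.mailbox_pools | distant.mailbox_pools
    local.mailbox_pools = set(merged_pools)
    distant.mailbox_pools = set(merged_pools)

    # User maps: copy entries the other side is missing (assumption: existing
    # entries are kept as they are, conflicts are not resolved here).
    for address, mailbox in list(distant.user_map.items()):
        local.user_map.setdefault(address, mailbox)
    for address, mailbox in list(local.user_map.items()):
        distant.user_map.setdefault(address, mailbox)
    return True


local = DispatchUnit("du-1")
local.mailbox_pools.add("pool-1")
local.user_map["alice"] = "mailbox-alice"
distant = DispatchUnit("du-2")
distant.mailbox_pools.add("pool-2")
distant.user_map["bob"] = "mailbox-bob"
synced = synchronize(local, distant)
```

In the DEM system this step would be run by the synchronization thread against every known dispatch unit each time it wakes.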
Figure 8: Dispatch unit synchronization process
[Figure 8 diagram labels: (a) isAlive, (b) add dispatch unit, (c) get DU list, (d) Sync DU list, (e) get mailbox pools, (f) Sync mailbox pools, (g) get user map, (h) Sync user map; components: Sync. Thread, Dispatch Unit, Distant Dispatch Unit]
3 Testing Environment
3.1 Overview
Testing a compound system such as the DEM system is a hard task. To
achieve an acceptable degree of software quality assurance, an extensive system test
must take place.
The tests range from simple functionality tests of the different application
components to inter-component interaction tests.
A functionality test invokes the publicly available services of each component and
verifies that the object is left in the correct state after each service is invoked.
Testing the correct state and behavior of the overall application is a much harder task,
because the scenarios include inter-component relations. Although the set of
possible inter-object messages is small, the compound and concurrent activity is hard to
follow and debug.
We focus our testing on two basic approaches:
System stress tests that should assure correct behavior under large amounts of
communication between the system’s components.
Test scenarios in which faults occur on different system components, either local
one-time faults or multiple concurrent faults.
In order to provide a flexible environment for the development and execution of such
tests, we propose an integrated testing environment.
In the design of the integrated environment we wished to have the following:
Core and Complet browser to enable the monitoring of all available
Complets on all of the registered Cores.
An interleaved view of the output of all components (including those
on a remote core)
A scripting interface that can be used to start the system, examine its
state and modify it (by invoking publicly available services) at
runtime
In order to provide the ability to monitor distant cores a special spy Complet was
implemented. Using the spy one is able to activate new components on a remote core
and also monitor the core’s activity, sending all information back to the central testing
environment.
Distant threads are used to collect all of the log outputs of the different system
components, directing all the information back to a central location – the integrated
testing environment. Special time stamps that are added to each log message can be
used to manually determine local ordering errors that might be caused by network or
operating system delays.
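How time stamps make an interleaved view possible can be shown with a short Python sketch. It assumes that every log record is a (time stamp, component, message) tuple, that each component's own stream is already ordered, and that the clocks of the different machines are comparable; none of these details is stated in the text, and interleave_logs is a hypothetical name.

```python
import heapq


def interleave_logs(*streams):
    """Merge per-component log streams into one list ordered by time stamp."""
    # heapq.merge expects every input stream to be sorted already.
    return list(heapq.merge(*streams, key=lambda record: record[0]))


dispatch_log = [
    (1.0, "dispatch unit", "started"),
    (4.0, "dispatch unit", "synchronized"),
]
mailbox_log = [
    (2.5, "mailbox", "mail added"),
    (3.0, "mailbox", "cast requested"),
]
merged = interleave_logs(dispatch_log, mailbox_log)
```

A record that shows up in the central log area out of time stamp order is then a hint that network or operating system delays reordered the arrival of messages.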
To enable a flexible and convenient scripting interface, we had to choose a powerful
scripting language. Python [26], a flexible interpreted language that combines remarkable
power with a very clear syntax, was the best option that we could find.
In this project we chose to use the Jython interpreter. As explained in previous
sections, Jython is a Python interpreter written in Java. Using the integrated console, a
user can write Python scripts that interact with the objects that are currently running.
The testing environment’s GUI includes all the described features, enabling a quick
review of the total system state, a log area (for log-based debugging) and a scripting
console integrated in the environment for convenient testing (a screenshot is available in
Appendix A: User manual).
As a special feature, this integrated testing environment may be used by a DEM system
administrator. An administrator running such a tool is able to influence
mailbox distribution and re-distribution. It is easy to set up new Cores and start a
distant dispatch unit running.
Future projects that wish to use the FarGo middleware will need some testing
environment. It is possible to make this testing environment generic enough to be
used by any other FarGo based application. It is also possible to simply adapt this
version to a new project.
3.2 Component description
In this section we briefly review the components that make up
the integrated testing environment.
3.2.1 Tester
The tester is the main object in the testing environment. Containing the testing
environment’s GUI, this object manages the Core/Complet browser. It also manages
the Jython console and the log console.
To enable the centralization and interleaving of the logs the tester offers logging
services.
In order to save development time and not implement already available components,
an external Jython console component was used – the SPyConsole [28].
All of the above features suggest the following methods:
startConsole Activate the Jython console, intercepting all user
keystrokes to the script editor. After this method is
invoked, the application will not run any code that
follows the invocation.
addLog Add a blue colored log to the environment logging area.
The blue color indicates that the message is a regular
message.
addLogError Add a red colored message to the environment logging
area. The red color indicates that the message is
an error.
addCore Add a new core to the list of active, monitored, cores.
This method receives not only the Core URL of the
newly created core, but also a reference to the distant
spy that resides on that core.
spyShutdown A method used by the spy to signal the tester that it is
about to be shut down. The tester reacts by
removing the spy from its list of active spies.
modelChanged Used by other components (such as the spy) to signal
the testing environment that one of its monitored
components was somehow modified. Modification
includes the possibility a component moved from one
core to another.
3.2.2 Spy
At the core of the monitoring ability lies the spy Complet. On the creation of a
distant Core, a spy settles on it and registers itself for all possible Core events. These
include the events of Complet construction, destruction, arrival and departure. The
spy also listens to the Core shutdown event.
In order to provide only the Complet information that is relevant to the DEM
system, the spy is capable of filtering out non-DEM component activities. This
leaves a cleaner working environment when it comes to using the testing tool
effectively.
As part of being able to monitor the activity on the containing core, the spy offers
numerous component creation services. Using the spies, a user of the integrated
testing environment is able to create distant components such as a dispatch unit,
mailbox pool, online and offline users.
Summarizing all of these features into desired services suggests the following list
of methods:
registerOnCore Register the spy on the distant core. This enables the spy
to monitor on-core Complet activities and transmit the
acquired data to the central testing tool. The gathered
events are those listed above.
setTester Set the reference to the central tester object, which
runs the integrated testing environment.
completConstructed Intercept a Complet construction event. Filter out any
non-DEM Complet creation.
completFreed Intercept a Complet destruction event. Filter out any
non-DEM Complet destruction.
completsDeparture Intercept the departure of a set of Complets. This
method will filter out any non-DEM Complet
movements.
completsArrived Intercept the arrival of a set of Complets. This method
will filter out any non-DEM Complet movements.
shutdown Make the containing core
perform a clean shutdown.
crash Simulate a distant core crash event.
createDispatchUnitPack Create a dispatch unit with a monitoring
dispatch unit GUI. A mailbox pool also comes
as part of this set of constructed components.
createNewOnlineUserPack This method is used to create a new online user
on the local core. The new user pack includes a
new mailbox, a monitoring mailbox GUI and a
mailbox pool.
createNewOfflineUserPack This method is used in order to create a new user
that is not currently viewed by a GUI. This
means that only a new mailbox is created. This
mailbox is automatically cast to some on-line
mailbox pool.
loginUser Tries to retrieve the mailbox of a user. A GUI is
then created to monitor the retrieved mailbox. A
new mailbox pool is also created such that it will
contain that retrieved mailbox.
getCompletNum Retrieve the number of Complets that this spy
manages. Notice that the list contains only DEM related
components (such as mailboxes, dispatch units, etc.).
getComplet Retrieve a specific Complet from the managed Complets
list. The Complet is retrieved according to a given index.
getCoreURL Retrieve the URL of the core that hosts this spy.
toString Return a constant “Spy” string clearly stating the
functionality of this object.
equals Compare this spy with another given object. The unique
Complet ID is used in
the comparison.
fireChange Notify the central tester that the spy intercepted an
event that is related to one of DEM’s complets.
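The interplay between event interception, filtering and notification can be sketched in Python as follows. The method names follow the list above in Python naming style (for example completConstructed becomes complet_constructed). Filtering by a fixed set of type names, the Tester stub and the "station1" core URL are assumptions made for illustration; the text does not say how the real spy recognizes DEM Complets.

```python
# Assumption: DEM Complets are recognized by their type name.
DEM_TYPES = {"Mailbox", "MailboxGUI", "MailboxPool", "DispatchUnit", "DispatchUnitGUI"}


class Tester:
    """Stub of the central tester; only counts modelChanged notifications."""

    def __init__(self):
        self.change_count = 0

    def model_changed(self):
        self.change_count += 1


class Spy:
    def __init__(self, core_url):
        self.core_url = core_url
        self.tester = None
        self.complets = []   # (type name, complet id) pairs, DEM Complets only

    def set_tester(self, tester):
        self.tester = tester

    def complet_constructed(self, type_name, complet_id):
        if type_name not in DEM_TYPES:
            return           # filter out non-DEM Complet creation
        self.complets.append((type_name, complet_id))
        self.fire_change()

    def complet_freed(self, type_name, complet_id):
        entry = (type_name, complet_id)
        if type_name not in DEM_TYPES or entry not in self.complets:
            return           # filter out non-DEM or unknown Complets
        self.complets.remove(entry)
        self.fire_change()

    def get_complet_num(self):
        return len(self.complets)

    def fire_change(self):
        # Notify the central tester that a DEM-related event was intercepted.
        if self.tester is not None:
            self.tester.model_changed()


tester = Tester()
spy = Spy("station1")
spy.set_tester(tester)
spy.complet_constructed("Mailbox", 1)
spy.complet_constructed("SomeOtherComplet", 2)   # ignored by the filter
spy.complet_constructed("DispatchUnit", 3)
spy.complet_freed("Mailbox", 1)
```

Only the three DEM-related events reach the tester, so the Core/Complet browser is refreshed only when something relevant to DEM has changed.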
Figure 9: Tester class diagram
4 Future directions
One major issue that is not dealt with in the current DEM design is security.
There are a variety of security problems starting from forging a sender or mail time
stamp to reading someone’s personal mail. To make such an application usable
outside the lab, it is important to invest research time to resolve this problem. There
are many known techniques for the protection of the content against unauthorized
reading and/or modification of mail. Algorithms dealing with electronic signatures
[19, 20] are available and only small adjustments are needed in order to provide this
important feature.
To allow even greater flexibility of the DEM system, we can examine the possibility
of defining the mails themselves as Complets. When a user comes online and
retrieves the mailbox, the mails and their content will not automatically follow but will
rather stay on the distant computer. This will reduce the unneeded communication that
occurs when all the mails contained in a mailbox are transferred to the reader’s
computer.
Currently, the mailbox pools do not offer dynamic load balancing. Balancing the
load on each DEM client, taking individual computer capabilities into account, would
improve the overall system performance.
To enable the transfer of mail between the DEM system and currently available
DNS-based mailing services, we suggest the construction of mailing bridges. From
one end these bridges will act as a normal mail server, receiving mails from the
“outside” world and pushing them into the DEM system. Mail from DEM directed to
normal servers will be routed through these bridges and on to their destination.
Although a decentralized design was one of the main motivations in this project, we
did not fully achieve this objective. As can be seen in the design, a central piece still
exists in the form of a dispatch unit. Although one might think such a central
component (or a variant of such a component) must exist in all electronic mailing
systems, we believe that by using a peer-to-peer protocol such as Pastry [3], a fully
decentralized system can be designed and implemented.
A problem that is apparent in the current design is that in order to send a mail to a
long mailing list, the sender must pay a processing time that is almost linear in the
number of recipients. To reduce the cost of such an operation, it is possible to
enable users to define trusted users. These trusted users will aid with the distribution
of a mail to a large number of recipients. Assuming all the mail recipients trust the
sender, a logarithmic division of work can be achieved, reducing the complexity to
O(log(n)), where n is the number of recipients.
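The logarithmic bound can be checked with a small Python simulation. It assumes one particular scheme that the text does not spell out: in every round, each party that already holds the mail (the sender or a trusted recipient) forwards it to one recipient that does not yet hold it. The function name distribution_rounds is hypothetical.

```python
def distribution_rounds(recipient_count):
    """Count forwarding rounds until all recipients hold the mail."""
    holders = 1      # initially only the sender holds the mail
    delivered = 0
    rounds = 0
    while delivered < recipient_count:
        # Every current holder forwards to one new recipient in this round.
        newly_reached = min(holders, recipient_count - delivered)
        delivered += newly_reached
        holders += newly_reached
        rounds += 1
    return rounds


rounds_for_seven = distribution_rounds(7)          # 1 + 2 + 4 recipients
rounds_for_thousand = distribution_rounds(1000)
```

Under this assumption the number of holders doubles every round, so 1000 recipients are reached in 10 rounds, compared with 1000 sequential sends by a single sender.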
Regarding the integrated testing environment that was developed during the project,
one can try to write a more generic version of that environment. The integrated
environment proved to be very useful for fast and easy creation of complex test
scenarios. It also proved to be useful as a monitoring and managing device for the
FarGo based application.
The testing environment should also be extended to allow it to attach spies to already
running distant cores. Such a feature will contribute to the effectiveness of the tool as
a debugger.
Appendix A: User manual
In this section we review the different components available in the DEM system
from the user’s point of view. We show how to start a dispatch unit, how to
create a new user in the system and how to log in as a user of the system.
For the more advanced user, we demonstrate how to start the logging services and
how to work with the integrated testing environment.
Start logging services
In order for any of the DEM application components to work correctly, the
logging services must be started. This should occur prior to the creation of any other
DEM application component.
To restart the entire system including the log service, a user can use the restartLog
shell script. This script kills all currently running Java applications (including the
logging services). It then restarts the logging services.
Once the logging services have started successfully, the user is able to start the
other DEM components. Notice that the logging services should be kept running as long as
the DEM system operates.
Dispatch Unit
In order to get a DEM network up and running, at least one backbone dispatch
unit should exist. A backbone dispatch unit is one of the dispatch units listed in
the DispatchUnit.list file. Using this list we are able to solve the first connection
problem, as discussed in previous sections.
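The way a list of backbone dispatch units resolves the first connection problem can be sketched in Python. The format of DispatchUnit.list is not given in the text, so the sketch assumes one address per line; the station names, the probing callback and both function names are hypothetical.

```python
def parse_backbone_list(text):
    """Return the non-empty lines of a backbone list (assumed one per line)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def first_reachable(addresses, is_alive):
    """Return the first address whose dispatch unit answers, or None."""
    for address in addresses:
        if is_alive(address):
            return address
    return None


listing = "station1\n\nstation2\nstation3\n"
candidates = parse_backbone_list(listing)
running = {"station2", "station3"}   # stand-in for a real aliveness probe
chosen = first_reachable(candidates, lambda address: address in running)
```

A newcomer needs only one reachable backbone entry; every further dispatch unit can then be learned from it.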
To start up a dispatch unit, one can use the runServer shell script. This script receives
the name of the new core that will hold the dispatch unit. For example, the command
“runServer station1” will start a core named station1 on the local computer. On top of
that new core, a dispatch unit will be created.
A mailbox pool is automatically created along with the dispatch unit. This is done in
order to provide a preliminary location for the newly created mailboxes.
A dispatch unit monitoring GUI is also created. Using this GUI component, a user is
able to view the DEM network status. In the top panel, green man icons indicate
online users (users that currently read and interact with their personal mailbox). In the
same panel, red man icons indicate that the users of that mailbox are currently
offline. The address of each user is visible as a string next to the man icon.
In the middle panel one can view the currently active mailbox pools. The first column
shows an icon of a swimming pool indicating the existence of the mailbox pool. On
its right we will get either a blank icon or a mailbox icon. A blank icon indicates no
mailboxes currently reside on that mailbox pool. In case one or more mailboxes reside
on that mailbox pool, we will see the mailbox icon. The number between the braces
and next to that icon indicates the exact number of mailboxes that are currently
available inside that mailbox pool. A unique mailbox pool ID is available next to
these icons.
At the lowest panel we will get a list of currently connected dispatch units. Each entry
in that list displays a dispatch unit icon and a unique Complet ID representing that
dispatch unit.
Figure 10: Dispatch unit monitoring GUI
Closing the dispatch unit will cause a local core shutdown, after which the detection
of the shutdown event will propagate automatically in the graph of currently
connected dispatch units. Notice that at least a single backbone dispatch unit should
exist at all times, allowing users to login to the services and other dispatch units to
connect to the graph of dispatch units.
Mailbox client
A client that wishes to connect to the DEM network, retrieve the personal
mailbox and start working should perform a login procedure.
Although the login procedure does not include any authentication, a user that wishes
to log onto the system needs to supply a user name (the logical address of the
mailbox).
Upon successful login, a user can view and manipulate current mail through the main
mail GUI. When a user wishes to compose a new mail, a special purpose compose
mail GUI is created.
In the following sub-sections we will describe these components.
Login
To start a login procedure, a user can use the runLogin shell script. No
parameters need to be passed to this script.
The login procedure starts with a login-address text field pop-up. This allows the user
to enter the address of the personal mailbox.
Figure 7: Login screen
Using the list of backbone dispatch units, we can resolve the first connection problem,
allowing the negotiation with distant dispatch units.
In case the desired mailbox does not exist, the user is prompted regarding the creation
of a new mailbox.
Figure 11: Create a new mail user
In case the user approves this action, a new mailbox is created on a distant core. Next,
a mailbox pool and mailbox GUI are created on the local core. The login process ends
when the distant mailbox comes to reside on the local core.
In case the desired mailbox does exist, we will get a new local mailbox pool and
also a new mailbox GUI component viewing the retrieved mailbox.
Mail review and manipulation
Through this main mail client GUI, a user is able to view, create, forward,
reply to and delete a mail. All of these actions are available as buttons in the uppermost
part of the mail manipulation GUI.
The mail view includes fields such as the sender, the subject and the sending date. All
of this information appears in a row of the table visible in the upper part of the GUI.
In the lower GUI section we get a view of the content of the selected mail. Using scrolling
one can view the entire mail content.
Figure 12: mailbox view and control
The first left-most icon in the upper buttons area enables the user to create a new mail.
A discussion of the mail creation GUI is available in the following sub-section.
The GUI used for composing a new mail is also used when replying to old mail.
Composing a mail
The most intuitive GUI component in the system is the one responsible for the
creation of a new mail.
This GUI component is divided into four sections. Using the tab key a user can switch
between the available fields. The upper text field is that of the sender. This field is
automatically filled. Next is the receiver text field, after which the field of the subject
appears.
When possible, the compose mail component will try to automatically fill up the
available fields. For example, when a user replies to a mail, the sender and recipient
of the mail are simply switched.
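The automatic filling of a reply can be illustrated with a few lines of Python. Only the switching of sender and recipient is taken from the text; representing a mail as a dictionary, carrying the subject over unchanged and starting with empty content are assumptions of this sketch, and reply_fields is a hypothetical name.

```python
def reply_fields(original):
    """Return pre-filled compose fields for a reply to the given mail."""
    return {
        "sender": original["recipient"],    # switched with the recipient
        "recipient": original["sender"],
        "subject": original["subject"],     # assumption: carried over unchanged
        "content": "",                      # assumption: reply starts empty
    }


received = {
    "sender": "alice",
    "recipient": "bob",
    "subject": "Lab meeting",
    "content": "See you at ten.",
}
reply = reply_fields(received)
```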
The largest text area in this component is that of the content field. A scrollable text
area offers a convenient content editing area. At the bottom of that area lies the send
button, which is used to complete the mail composition procedure.
Figure 13: Compose a new mail
Testing environment
As described in previous sections, an integrated testing environment is
available, allowing advanced users to test the system in a comfortable way. This tool
may also be used as a management tool for the DEM application.
To run the test environment, a user can invoke the runTest shell script. Notice that
before the testing environment can be executed, the logging services must be available.
Figure 14: Integrated testing environment
Upon execution, a clear work space will appear. The browser tree will appear empty
on the left side of the frame. The largest, middle piece is the scripting interface. At the
bottom of that frame one is able to view the interleaved color coded logs. Intuitive
scrolling is available for each one of these components.
At the top of the frame there is a short menu used for scripting interface oriented
actions. The user is able to load scripts, edit and manipulate current scripts. Credit for
the creation of the Jython console component is available as part of that menu, under
“help”.
Regarding the browser tree, icons used in previously explained components are
available to visualize information regarding the complet type and location per
connected core. The intuitive tree-style browser allows the user to quickly explore
each of the currently connected cores.
Appendix B: Application Requirements
In order to run any of the DEM application components, the following list of
JAR files must be available:
DEM.jar Contains all the components that construct the
DEM network system. All code written during
the project is concentrated in this file, divided
into three packages that reside in this file:
Client, Server and Testing.
Fargo_wyaron.jar Contains all the infrastructure needed in order to
run the FarGo pre-compiler and use any of
FarGo’s facilities
Jython.jar Contains the Java implementation of the Python
scripting language. Must be available because
it is used in the testing environment.
JythonConsole Provides a convenient console
that is integrated in the testing environment. This
JAR offers the needed GUI component including
its creation and handling.
log4-j-1.2.7.jar The centralized logging facility that is used
throughout the DEM application is available
based on the use of the log4j infrastructure. This
JAR provides all the needed classes for this
feature.
xmlParserAPIs.jar Although not directly needed by the DEM
application, this JAR is used by the ANT make
tool. We felt that leaving this library in the final
package will ease future development.
xercesImpl.jar As with the xmlParserAPIs.jar, this JAR is used
in the parsing of XML files, as it is done by the
ANT tool. We leave this JAR near for the same
reasons.
All of these mentioned JAR files should reside in a directory named lib under the
DEM home directory.
To allow further development we left other ANT library files in the final package.
Among those are ant.jar, optional.jar, xercesImpl.jar and xml-apis.jar, which should
reside in a directory named ant_lib inside the previously defined
lib directory.
As can be seen in the screenshots, the DEM application provides an intuitive
interface which relies heavily on informative icons. All of the needed graphic images
reside in a directory named images inside the DEM home directory.
The final distribution of the DEM application is available in a file named DEM.jar.
This file must be available in order for any of the components to run. This JAR should
reside under the dist directory, which should be found under the DEM home directory.
Examples of Python test scripts should be available under the Tests directory located
in the DEM home directory.
All documentation, including the DEM API and this report document should be
available under the Documentation directory which resides under the DEM home
directory.
References
[1] Douglas Thain, Todd Tannenbaum, and Miron Livny, "Distributed Computing in Practice: The Condor Experience", Concurrency and Computation: Practice and Experience, 2004.
[2] Darleen Sadoski, “Client/Server Software Architectures – an Overview”, STR, 1997.
[3] A. Rowstron and P. Druschel. “Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems”. In Proc. IFIP/ACM Middleware 2001, Heidelberg, Germany, Nov. 2001.
[4] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek and Hari Balakrishnan. “Chord: A Scalable Peer-to-peer Lookup Service for internet Applications”. In Proc. ACM SIGCOMM`01, San Diego, CA, Aug. 2001.
[5] S. Ratnasamy, P. Francis, M. Handley, R. Karp and S. Shenker. A scalable content-addressable network. In Proc ACM SIGCOMM`01, San Diego, CA, Aug 2001.
[6] Napster, http://www.napster.com/.
[7] The Gnutella protocol specification, 2000. http://dss.clip2.com/GnutellaProtocol04.pdf.
[8] Cockayne, W. R., and Zyda, M., Eds. 1998. Mobile Agents. Prentice Hall.
[9] T. Walsh, P. Nixon and S. Dobson. Review of the mobility systems. Technical Report TCD-CS-2000-13, University of Dublin Trinity College, March 2000.
[10] Ophir Holder, Israel Ben-Shaul and Hovav Gazit. “Dynamic Layout of Distributed Applications in FarGo”. Proceedings of the 21st International Conference on Software Engineering (ICSE'99), Los Angeles, CA, USA, May 1999.
[11] Ophir Holder, Hovav Gazit. “FarGo Programming Guide”. Technical report EE Pub 1194, Technion – Israel Institute of Technology.
[12] J. Hugunin, B. Warsaw, et al. The Jython/JPython Web Site. http://www.jython.org/
[13] James Gosling, Henry McGilton, “The Java Language Environment”. White paper, May 1996.
[14] Amy Fowler, “A Swing Architecture Overview”, http://java.sun.com/
[15] Log4j, http://logging.apache.org/log4j
[16] Douglas B. Terry. Managing update conflicts in Bayou, a weakly connected replicated storage system. In Proceedings of the 15th ACM Symposium on Operating Systems Principles, December 1995.
[17] J. Demers, K. Petersen, M. J. Spreitzer, D. B. Terry, M. M. Theimer, and B. B. Welch. Proceedings of the Workshop on Mobile Computing Systems and Applications, Santa Cruz, California, December 1994, pages 2-7.
[18] John Kubiatowicz, David Bindel, Yan Chen, Steven Czerwinski, Patrick Eaton, Dennis Geels, Ramakrishna Gummadi, Sean Rhea, Hakim Weatherspoon, Westley Weimer, Chris Wells, and Ben Zhao. “OceanStore: An Architecture for Global-Scale Persistent Storage“. Appears in Proceedings of the Ninth international Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2000), November 2000.
[19] R.L. Rivest, A. Shamir and L. Adleman. “A Method for Obtaining Digital Signatures and Public-Key Cryptosystems”. Comm. ACM 21(2): 120-126 (1978).
[20] W. Diffie and M. Hellman. New Directions in Cryptography. IEEE Trans. Info. Theory 22(6): 644-654 (1976).
[21] Richard Golding and Elizabeth Borowsky, “Fault-tolerant replication management in large-scale distributed storage systems”, In Proceedings of the 18th IEEE Symposium on Reliable Distributed Systems, Oct 1999.
[22] Rachid Guerraoui and Andre Schiper, “Fault-Tolerance by Replication in Distributed Systems”, in Proc. Reliable Software Technologies Ada-Europe '96
[23] Eclipse, http://www.eclipse.org/
[24] Ant, http://ant.apache.org/
[25] JavaDoc, http://java.sun.com/j2se/javadoc/
[26] Python, http://www.python.org/
[27] Ian Clarke, “Freenet: A Distributed Anonymous Information Storage and Retrieval System”. Division of Informatics, University of Edinburgh, 1999.
[28] ftp://iee.umces.edu/SME3/JConsole/