Technion, Israel Institute of Technology.

Software Laboratory, Electrical Engineering Department

Technion, Haifa, Israel

Bercovici Sivan

Instructor: Frishman Yaniv

Spring 2004

Distributed Electronic Mailing System

Preface

1 Introduction
  1.1 History
  1.2 FarGo
  1.3 Project objective
    1.3.1 Current state
    1.3.2 Problem definition
    1.3.3 Solution overview
  1.4 Comparison

2 Design and Implementation Details
  2.1 Technology review
    2.1.1 Java
    2.1.2 Eclipse
    2.1.3 Swing
    2.1.4 FarGo
    2.1.5 Log4j
    2.1.6 Ant
    2.1.7 JavaDoc
    2.1.8 Jython
  2.2 Design overview
    2.2.1 Fault tolerance
    2.2.2 Reducing information redundancy
    2.2.3 Load balancing
    2.2.4 First connection problem
  2.3 Components description
    2.3.1 Mail
    2.3.2 AddressBook
    2.3.3 Mailbox
    2.3.4 MailboxGUI
    2.3.5 MailboxPool
    2.3.6 DispatchUnit
    2.3.7 DispatchUnitGUI
    2.3.8 DispatchUnitSyncThread
  2.4 Application layout - complete design
  2.5 Inter-component communication
    2.5.1 Sending a mail
    2.5.2 Casting mailboxes
    2.5.3 Synchronizing dispatch units

3 Testing Environment
  3.1 Overview
  3.2 Component description
    3.2.1 Tester
    3.2.2 Spy

4 Future Directions

Appendix A: User Manual
  Start logging services
  Dispatch unit
  Mailbox client
  Login
  Mail review and manipulation
  Composing a mail
  Testing environment

Appendix B: Application Requirements

References

Preface

It is interesting to examine the evolution of software architecture concepts in

comparison to the evolution of mankind.

At the dawn of mankind, man relied mostly on his own work. The way man lived was based mostly on his abilities as an individual. This resulted in low productivity and a low survival rate. One way to increase productivity was to craft dedicated tools for different purposes. Nevertheless, the improvement gained from these tools was limited, because the work was still performed by each man separately.

Experience with coordinated group work proved to be essential. It had a major impact on both the productivity of the group and its overall survival rate. A good example can be found in hunting, where joint effort led to better results: a greater catch. Dividing the work between group members according to each member’s abilities was at the essence of man’s way of life. The male, being physically stronger, dealt with hunting while the female nurtured the children.

Leaping thousands of years forward, knowledge became one of man’s most important tools. To that end, sharing knowledge globally allowed faster progress in all fields of research. Curious mankind thrives on its new discoveries, and their rate constantly increases.

At first glance, examining mankind as a fault-tolerant system might seem strange. Still, one can easily identify key characteristics of a fault-tolerant system as they reveal themselves in this biological system. Ranging from the survival of man to the survival of man’s knowledge, “fault-tolerance by replication” [21, 22] plays an important role in confronting fatal disasters. The system’s inherent self-regeneration allows it to converge to a steady state after a dramatic “fault”.

Reviewing software’s shorter history reveals similar milestones. The first applications (like many applications today) performed most of the work on their own, utilizing only the resources available on the computers on which they were run. The use of highly optimized libraries for common services increased the performance of individual applications. The new applications incorporated these libraries in their own code.

In parallel to the development of better single-computer applications, a new

generation of distributed applications arose. The introduction of distributed systems

allowed an application to divide the work between numerous computers that

constructed a working network. In many cases, this coordinated work allowed a major

increase in performance. A network of computers used to perform such a distributed

task can be built up from various computers having different capabilities. The

productivity increase gained by optimizing the network to utilize each computer's

unique capabilities is significant, as demonstrated in some distributed working

environments, such as Condor [1]. Harnessing these computers in an optimal way

according to their capabilities proved to be rewarding.

One of the most widely used techniques against fatal system faults (such as a computer crash) is “fault-tolerance by replication”. What is inherently available in man (and in mankind in general) is imitated to some extent in distributed software: replicas of the software’s components are used as backups. When a software component fails, it is replaced by its replica, an identical component that is still active, allowing the entire system to continue its normal work despite local faults.

The increased popularity of distributed applications pushed researchers to examine the field of fault-tolerance in distributed systems, exploiting new opportunities as they arose. An example of such a technique is the self-regenerating system. If parts of the distributed system fail, the software components that ran on the faulty computers are revived on working computers, allowing the distributed system to overcome the fault.

Decentralization is another characteristic that can be observed in some human work groups, as well as in several computer applications. Many companies that used to employ a strict organizational hierarchy found that eliminating it promotes direct communication between various groups in the company, leading to increased efficiency. Peer-to-peer technology does the same for computer applications: by using direct communication between working nodes (computers on the network), it reduces the load on the central server, the most common bottleneck in such systems, thus leading to better performance.

We can summarize this analogy with the conclusion that there are similar

characteristics in the evolution of man and the evolution of software, and that we can

expect even greater common characteristics between these two in the future.

Redesigning popular applications and computer services to a more "distributed" form

is rather challenging and has occupied the minds of many researchers and computer

engineers. The decomposition and re-composition of an application allowing such

work is difficult and most times not explicitly apparent.

In this project we will examine such a popular service – the e-mail system. Although

inherently distributed, making such a system decentralized is not trivial. The focus of

the work will be facing hard decentralization issues as well as fault-tolerance issues.

In the following chapters we will review a short history of network applications, the

state of e-mail systems today and a suggested novel distributed e-mail system. We

will examine the needed technology, review the new system’s design and examine

core concepts dealing with the fault-tolerance aspects of such systems.

1 Introduction

1.1 History

Since the beginning of the use of the Internet (and of other networks in general), there has been one dominant distributed software architecture, commonly known as the Client/Server Software Architecture [2]. According to this paradigm, the distributed software is separated into two communicating components. The role of each component is strictly defined: the client component requests a service while the server component provides it.

Peer-to-peer technology [3,4,5] (abbreviated P2P) suggested a different application network layout. In this type of network each component has equivalent capabilities and responsibilities. No set of computers is dedicated to serving the others; rather, each component can both offer and request a service. A P2P application uses a proprietary protocol to enable communication between the network’s components. The design and implementation of a P2P application takes into account the fact that each component of the P2P network is complete regardless of the other components (i.e. each client is self-sufficient in the sense of application integrity). A P2P application is inherently much more flexible, scalable and fault-tolerant than a classic network application, mostly because decentralization is at the core of its design.

Adding a user to a P2P system does not harm performance as happens in the more

centralized approach.

File-sharing applications based on P2P technology [6,7] offer some degree of fault-tolerance in the sense of the availability of the shared files. As long as enough users possess a given file, other users are able to retrieve it (replication of the resource).

The introduction of mobile-component distributed applications [8,9] allowed an even more flexible and dynamic layout of the application’s components. In this scheme, all of the components are part of a single application. These components can relocate to computers that are connected to the network and continue their execution on the remote computers. Communication between the application’s components is straightforward, as if the components were all on a single computer. Relocating components on the live network makes it possible to exploit performance optimization opportunities as they arise at run time. Fault-tolerance can be achieved by replicating the application’s components, enabling the application to recover to a legal state using copies that reside on distant computers.


1.2 FarGo

FarGo [10, 11] is an example of a development environment for mobile-component-based distributed applications. FarGo allows the implementation of a dynamic and adaptive application capable of working on large networks, over links with varying capacities and computers with varying abilities. Through the supplied middleware, one is able to design an efficient and reliable distributed application capable of adapting to the constantly changing network environment. At the core of FarGo lies the concept of “dynamic application layout”, permitting the modification of a component’s location at runtime.

FarGo is Java-based, which makes it easy to develop a cross-platform application. Its unique capability to control relationships between moving objects allows a more precise and rich application design.

Using FarGo does not imply major code changes; only a small set of modifications is required to make the application’s components mobile. Implementing object transfer management logic using the FarGo infrastructure is simple and direct.


1.3 Project objective

The objective of the project is to provide a novel electronic mailing system

that is decentralized and fault-tolerant. The project will attempt to provide a cross-

platform, lightweight, flexible, scalable and adaptive solution.

Similar work in the field of time-shared storage, as defined in the Chord paper [4], can be found in projects such as Freenet [27], Bayou [16, 17] and OceanStore [18].

Research during this project will focus on optimizing communication (based on a

decentralized scheme) and providing numerous fault recovery mechanisms.

Optimizations in the direction of data redundancy will also be examined.

In this research we examine the use of redundancy in a mobile-component distributed

application. We will also examine different optimization opportunities that are

available on a mobile-object design, such as use-rate based adaptive object movement

and communication bottleneck removal using direct communication between the

components.

As mentioned earlier, redundancy is widely used to provide fault-tolerance by replication. A faulty component is backed up by one of its replicas. This suggests that an increase in replication yields a more fault-tolerant application. To that end, we will examine the use of replication as a fault-tolerance mechanism. We will take a step toward the examination of a self-regenerating system. When a fault occurs in such a system, the application tries to regenerate dead components, reviving them on new computers.

There are cases in which redundancy is not required but is inherent in an application’s design. We will analyze an application example in an attempt to find a redundancy optimization opportunity and try to address it using a mobile-component approach.

We will supply a centralized testing environment that will allow run-time examination of the system, as it is divided among different computers. A Jython [12] based scripting interface will provide a simple and flexible testing environment for the system.

1.3.1 Current state

The e-mail system actually pre-dates the Internet. As a matter of fact, e-mail systems were a crucial tool in creating the Internet. Back in 1965, e-mail started as a way for multiple users of time-shared mainframe computers to communicate. That simple e-mail system evolved into a network e-mail system, in which users could pass messages between different computers. The ARPANET computer network made the e-mail application significantly more popular, as it became one of the network’s “killer apps”. Due to the lack of direct inter-network connection between computers, an address-passing list (“route”) between the computer of the sender and the computer of the receiver had to be supplied. E-mail thus gained the ability to pass between a number of networks (such as ARPANET, BITNET and NSFNET).

In the modern Internet e-mail system, e-mail is delivered directly to Internet-connected hosts. In most cases this is achieved using Domain Name System (DNS) services and the Simple Mail Transfer Protocol (abbreviated SMTP).

The format of the modern Internet e-mail message, as defined in RFC 2822, consists of two components: the header component, containing address information (sender and receiver) and other information regarding the e-mail (such as subject and date), and the body component, containing the message itself.
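
The header/body split can be illustrated with a minimal Java sketch. It assumes only what RFC 2822 states about the layout: header fields of the form "Name: value", one per line, followed by an empty line and then the body, with CRLF line endings. The class and method names are hypothetical, and the sketch performs no validation or line folding.

```java
import java.util.Map;

/** Hypothetical helper that lays out a message as header fields, an empty line, then the body. */
class MessageFormatSketch {
    private static final String CRLF = "\r\n";

    /** Writes each header as "Name: value", then a blank line, then the body. */
    static String format(Map<String, String> headers, String body) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> header : headers.entrySet()) {
            sb.append(header.getKey()).append(": ").append(header.getValue()).append(CRLF);
        }
        sb.append(CRLF);   // the empty line separates the header from the body
        sb.append(body);
        return sb.toString();
    }
}
```

A real message would also carry at least the Date and From fields that RFC 2822 requires; they would be passed in the map like any other header.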

The messages are exchanged between hosts using SMTP with mailing software (such

as Pine, Sendmail, etc.). Users download their personal messages from servers using

either the POP or IMAP protocols.


E-mail has been extended by the Multipurpose Internet Mail Extensions (MIME) standard to allow the encoding of binary attachments to e-mails. Users were then able to attach files (images, documents, etc.) to the e-mails they sent.

1.3.2 Problem definition

As described in the previous section, most modern e-mail systems are based

on a centralized approach in which users communicate with a central server in order

to retrieve their personal mail.

This sort of approach inherently suffers from key problems such as limited scalability, weak fault-tolerance and various constraints on the system’s users.

In such a solution a bottleneck is evident at the server’s side. An increase in the

system’s number of users will increase the amount of communication that the mailing

server handles. This may cause fatal deterioration of the service availability to a point in

which the server is unable to provide the service at all.

Another inherent problem with the centralized solution is that of a failure at the server (a crash). Any system that has such a core in its design will suffer from this sort of problem. There may also be scenarios in which, due to a partial network disconnection or delay, on-line clients cannot reach the on-line server in order to retrieve and/or send their mail.

In many popular mailing services (such as Hotmail) the user of the e-mail service is

limited to a certain space quota. The business model suggested that in order to get a

larger mailbox one should either pay or be forced to view commercial advertisements.

This model seems to impose unneeded boundaries on the system’s users.

One of the major causes of individual quota problems and server space problems is the use of attachments in e-mail. Some e-mail systems create a duplicate of the attachment for each mail recipient. In the classical e-mail system structure, replication of the attachment when it is sent to a distant server cannot be avoided.


1.3.3 Solution overview

Nowadays, most e-mail services are provided using one or more mail servers that serve both as the address of and the storage place for a user’s mailbox. This centralized design might suffer from scalability issues (more users, larger mailboxes), as well as throughput and latency issues, because both the sender and the receiver of an e-mail perform their work through a fixed set of mail servers. Fault-tolerance issues arise from the fact that all e-mails are stored on this fixed set of servers (which are in most cases at the same geographical site).

A decentralized e-mail system (abbreviated DEM) will be implemented in order to

supply users with a simple, scalable, fault-tolerant mailing system.

Using FarGo as the development environment, we will design a decentralized mailing system, providing answers to both performance and fault-tolerance issues. Using personal traveling mailboxes which reside on on-line clients, most communication will take place between mailboxes, thus removing much of the bottleneck that might be caused by the mail server. FarGo will allow this using its reference tracing mechanism. Fault-tolerance of mailboxes will be handled by traveling backup mailboxes that will scatter among the currently connected clients. Again, communication between the mailboxes and their backup mailboxes (for synchronization) can be easily implemented using the FarGo infrastructure. The goal of the server(s) in this scheme will be to act as a reference gate for all the mailboxes. As these servers may experience crashes, special reference resolving is done on backup servers to enable fault-tolerance at this point as well. The system’s components will try to detect local faults and regenerate dead components on on-line clients. This adaptive approach, commonly known as a self-regenerating system, allows the system to converge to a fixed legal state in which the degree of fault-tolerance is preserved.
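
The self-regenerating behaviour can be sketched in plain Java, assuming a supervisor that knows which hosts are on-line and keeps a fixed number of backup replicas among them. `ReplicaSupervisor` and its methods are hypothetical names used for illustration only; they are not FarGo API or DEM classes, and fault detection is reduced here to an explicit `hostFailed` call.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical supervisor that keeps a fixed number of mailbox replicas alive. */
class ReplicaSupervisor {
    private final int targetReplicas;
    private final Set<String> onlineHosts = new LinkedHashSet<>();
    private final Set<String> replicaHosts = new LinkedHashSet<>();

    ReplicaSupervisor(int targetReplicas) { this.targetReplicas = targetReplicas; }

    /** A newly connected host becomes a candidate for holding a replica. */
    void hostJoined(String host) {
        onlineHosts.add(host);
        regenerate();
    }

    /** A host crash removes any replica it held, then triggers regeneration. */
    void hostFailed(String host) {
        onlineHosts.remove(host);
        replicaHosts.remove(host);
        regenerate();
    }

    /** Places replicas on on-line hosts until the target is met or hosts run out. */
    private void regenerate() {
        for (String host : onlineHosts) {
            if (replicaHosts.size() >= targetReplicas) {
                break;
            }
            replicaHosts.add(host); // no effect if the host already holds a replica
        }
    }

    /** Returns a copy of the set of hosts that currently hold a replica. */
    Set<String> replicaHosts() { return new HashSet<>(replicaHosts); }
}
```

After each failure the supervisor restores the replica count whenever enough on-line hosts remain, which is the sense in which the degree of fault-tolerance is preserved.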


Figure 1: DEM preliminary design overview

An email attachment is an example of unwanted redundancy. In classic mailing

systems, replications of the attachment are made for each recipient of the mail (this

includes both recipients of a long mailing list and recipients due to forwarding of a

mail). Defining the attachment as a mobile object enables a mail to point to the

attachment rather than hold a copy of it. FarGo’s reference transparency allows an

easy implementation of such a feature.

The suggested mail scheme offers a few more advantages that come for free. One example is a degree of protection against spam. In DEM, a client that wishes to send a mail to another client performs a certain number of operations per mail. This means that in order to send a large number of e-mails to many clients, a malicious sender will have to perform an amount of work that grows with the number of receivers. This can slow the spam process down and even make it infeasible. Current research in the field of distributed data mining can be used in the DEM scheme to provide more active protection against both spam and other malicious phenomena.

[Figure 1 labels: Dispatch Units; Client; Off-line; Mailbox; (a) Get recipient mailbox pointer; (b) Send mail]

1.4 Comparison

Studying core characteristics of both currently available electronic mailing

systems and the suggested DEM system revealed differences that concentrate in three

fields: performance, scalability and fault-tolerance.

Performance can be measured in both time and space. Examining the communication performance aspect, an inherent bottleneck is evident in the centralized design of current e-mail systems, as opposed to the design of the DEM system. In a classical e-mail system, when a massive number of users try to send mail through the system, a single server has to respond to all requests, spending most of its bandwidth uploading and downloading mail content (including attachments).

On the other hand, in the DEM system a large portion of communication is based on

decentralization and peer-to-peer design. In this design, mail content is passed in a

peer-to-peer fashion, from a mailbox directly to a distant mailbox. Communication

with the dispatch unit is set to a minimum.
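
The two-step exchange described above (ask a dispatch unit for a reference to the recipient’s mailbox, then deliver the content directly) can be sketched in plain Java. The names `DispatchUnitSketch`, `MailboxSketch`, `resolve` and `deliver` are illustrative assumptions and not the project’s classes or the FarGo API; in the real system the returned reference would be a FarGo remote reference rather than a local object.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical mailbox: stores the mail delivered to one user. */
class MailboxSketch {
    private final String owner;
    private final List<String> inbox = new ArrayList<>();

    MailboxSketch(String owner) { this.owner = owner; }

    String owner() { return owner; }

    /** Step (b): the sender delivers directly to this mailbox. */
    void deliver(String mail) { inbox.add(mail); }

    List<String> inbox() { return inbox; }
}

/** Hypothetical dispatch unit: it only resolves addresses to mailbox references. */
class DispatchUnitSketch {
    private final Map<String, MailboxSketch> registry = new HashMap<>();
    private int lookups = 0;

    void register(MailboxSketch mailbox) { registry.put(mailbox.owner(), mailbox); }

    /** Step (a): returns a reference to the recipient's mailbox, or null if unknown. */
    MailboxSketch resolve(String address) {
        lookups++;
        return registry.get(address);
    }

    int lookups() { return lookups; }
}

class SendMailSketch {
    /** Sends one mail: one lookup at the dispatch unit, then direct delivery. */
    static boolean send(DispatchUnitSketch dispatch, String to, String mail) {
        MailboxSketch target = dispatch.resolve(to); // (a) get recipient mailbox pointer
        if (target == null) {
            return false;                            // unknown recipient
        }
        target.deliver(mail);                        // (b) content bypasses the dispatch unit
        return true;
    }
}
```

In this sketch the dispatch unit never sees the mail content; it handles one small lookup per recipient, while the content itself travels between the peers.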

Many e-mail systems do not provide any solution to the unneeded replication of a mail’s attachment. Whether sent to a long mailing list or forwarded to new recipients, an attachment gets replicated, wasting both storage space and communication time. Some more advanced mailing systems offer a local optimization for that problem. This is achieved by using a database to store a single copy of an attachment. Any user inside that local system who receives the mail with this attachment is actually given a reference to the single attachment copy stored in the database. In the DEM system, the design and implementation of a floating attachment is straightforward. Attachments are considered mobile objects. A single attachment object is created when a user adds it to a mail. Any subsequent mail that passes on that specific attachment actually passes a reference (and not a copy). With this solution we get a distributed optimization that works across the entire DEM system.
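
A minimal Java sketch of the reference-passing idea follows. It uses ordinary in-process object references to stand in for FarGo’s mobile-object references, and the class names `AttachmentSketch` and `MailSketch` are hypothetical, not the project’s own.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical attachment: the payload is stored once. */
class AttachmentSketch {
    private final String name;
    private final byte[] data;

    AttachmentSketch(String name, byte[] data) {
        this.name = name;
        this.data = data;
    }

    String name() { return name; }

    int size() { return data.length; }
}

/** Hypothetical mail: holds references to attachments, never copies of them. */
class MailSketch {
    private final String recipient;
    private final String body;
    private final List<AttachmentSketch> attachments;

    MailSketch(String recipient, String body, List<AttachmentSketch> attachments) {
        this.recipient = recipient;
        this.body = body;
        // Copies the list of references only; the attachment payloads are shared.
        this.attachments = new ArrayList<>(attachments);
    }

    /** Forwarding creates a new mail that points to the same attachment objects. */
    MailSketch forwardTo(String newRecipient) {
        return new MailSketch(newRecipient, body, attachments);
    }

    String recipient() { return recipient; }

    List<AttachmentSketch> attachments() { return attachments; }
}
```

However many times such a mail is forwarded, only one attachment object exists; each mail adds just a reference to it.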


Many centralized system designs suffer from inherent scalability problems. When the number of service requests grows, the system deteriorates to the point of service breakdown. Most solutions to this problem tend to enhance the centerpiece, either by using a stronger server or by adding a few more computers to help with the growing number of requests. In DEM, such a dramatic scalability problem is not evident. This is mostly because a growing number of clients does not just mean more requests to serve, but also more shared resources for both computation and storage.

Classic electronic mailing systems are sensitive to server faults. When the server suffers from either a local fault or a network problem, none of the clients is able to receive new mail. The availability of the service in such a system is thus easily affected by very local problems. Modern solutions include several backup servers in different locations to allow continuous service in case of a local fault. DEM, being mostly decentralized, bypasses local faults by providing several dispatch units capable of performing the needed tasks. We get a solution similar to the one available in modern mailing systems but at a much lower cost. Dispatch units can be automatically created on any node in the DEM network, providing not only another fail-safe point but also a performance improvement. The performance improvement arises because requests are handled by a larger number of dispatch units, reducing the number of requests handled by any single dispatch unit.

A fault in a server is not always due to some accidental problem. Nowadays, the increasing number of electronic attacks threatens any service provided through the Internet, and the mailing service is no different in that respect. Malicious users of a service know the exact address of the computer providing the service. This starting point is crucial to most service-oriented attacks. As the DEM system lacks a single central component, an attacker will find it hard to start an attack. Even if the attacker starts with the backbone dispatch units, the self-regenerating and adaptive characteristics of the DEM system would allow the other users to continue with their work, uninterrupted by the fact that part of the system is under attack.

Another major problem in current mailing systems is the spam phenomenon. Spam is junk mail, sent mostly to a long mailing list or newsgroup. Different solutions exist to handle this problem, ranging from personal mail filters to on-server solutions that study the content of mails in order to block unwanted repeating mails. From the sending party’s point of view, the advantage of this form of distribution is the low processing time paid by the sending computer. DEM is inherently more protected against spam because the sender pays computation time that is almost linear in the number of mail recipients. This processing time includes the negotiation with a dispatch unit and the direct communication with each one of the recipient mailboxes.

2 Design and Implementation Details

2.1 Technology review

In this section we will review the technology used in the implementation of

the DEM system.

2.1.1 Java

Java [13] is a simple, object-oriented, architecture-neutral, portable, multithreaded programming language. When one wishes to create a portable application, available on numerous platforms over a network, Java is one of the most obvious choices available today.

The simplicity of Java allows fast development of software, omitting complex C++ features while adding important features such as the garbage collector.

Another important advantage is the availability of libraries in a wide range of areas

ranging from multimedia libraries to network and file system manipulation facilities.

The small footprint of the Java libraries and the small code size fit our motivation of making a lightweight application, which encourages the use of the application.

2.1.2 Eclipse

The Eclipse [23] platform offers an integrated development environment (IDE) for Java. We used this environment in the early stages of development. Eclipse was designed as a platform for building IDEs that can be used to create applications ranging from web sites to C++ programs. In this project we use the Java IDE developed under that platform.


The extensive set of features, such as the advanced debugging facilities, code refactoring abilities and incremental compilation, allowed this product to take its place as one of the leading development tools for the Java language.

Working with its informative errors and warning messages, quick fix-ups and

automatic code completion and generation allowed an even faster development.

As we finished the first phase of implementation and moved toward extensive use of FarGo, we had to leave Eclipse behind. The reason was that the special tagging required by the FarGo pre-compiler confused Eclipse’s auto-complete and automatic error checking. At the more advanced stages of the project, we moved to developing under XEmacs.

2.1.3 Swing

Swing [14] is a Java library which contains a set of extensible GUI components, enabling developers to build powerful Java front ends more rapidly.

The library is implemented entirely in Java, promoting cross-platform consistency and

easier maintenance. It provides the ability to easily modify the look-and-feel of the

GUI.

The Swing architecture follows the model-view-controller (MVC) design. According to the MVC architecture, the application is broken down into three separate parts: the model, which contains the data of the application; the view, which visualizes this data; and the controller, which intercepts the user’s input, translating the actions into operations on the model.
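
The three MVC roles can be illustrated with a small, GUI-free Java sketch. It deliberately avoids Swing classes so that it runs without a display; the counter example and all class names are hypothetical and only mirror the division of responsibilities described above.

```java
import java.util.ArrayList;
import java.util.List;

/** Model: holds the data and notifies registered views when it changes. */
class CounterModel {
    interface Listener {
        void changed(int newValue);
    }

    private final List<Listener> listeners = new ArrayList<>();
    private int value;

    void addListener(Listener listener) { listeners.add(listener); }

    int value() { return value; }

    void setValue(int newValue) {
        value = newValue;
        for (Listener listener : listeners) {
            listener.changed(newValue);
        }
    }
}

/** View: renders the model's data (here into a string instead of a widget). */
class CounterView implements CounterModel.Listener {
    private String rendered = "Count: 0";

    @Override
    public void changed(int newValue) { rendered = "Count: " + newValue; }

    String rendered() { return rendered; }
}

/** Controller: translates user actions into operations on the model. */
class CounterController {
    private final CounterModel model;

    CounterController(CounterModel model) { this.model = model; }

    void onIncrementClicked() { model.setValue(model.value() + 1); }
}
```

The view never modifies the model and the controller never renders anything, so either can be replaced without touching the other.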

Swing provides compatibility with AWT APIs on overlapping areas.

For these reasons, and because Swing is easy to use, we chose to implement our GUI using Swing as much as possible. Small sections of code are implemented using the AWT library, and only in cases where no suitable answer could be found in Swing.


2.1.4 FarGo

As presented in the introduction, FarGo is a Java-based programming environment that is used in the development of mobile-component distributed applications.

A review of FarGo’s features revealed that it fits neatly into our suggested solution.

The ability to dynamically adjust the location of objects, preserving certain invariants

was essential in the design and available in FarGo.

The transparency of its working mechanism allowed us to concentrate more on the

development of the algorithmic side of the application rather than dealing with the

mobility technicalities. Also, the ease of converting currently coded classes to that of

a mobile object enabled us to start with a non-distributed solution that is much easier

to debug, and only in more advanced development stages move to FarGo, slightly

modifying our code.

Moreover, FarGo offers monitoring facilities for the mobile objects. This unique

monitoring feature allowed us to create an extensive testing and monitoring integrated

environment.

Binding and lookup features are also offered by FarGo. Objects can bind themselves

to a descriptive string, allowing other objects that search a specific service to be able

to find it by name.
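
The bind-and-lookup idea can be sketched with a simple in-memory registry. This is not FarGo’s binding API, whose actual calls are not shown in this document; `NameRegistry`, `bind`, `lookup` and `unbind` are hypothetical names chosen for illustration.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical name registry; FarGo's real binding API is not shown here. */
class NameRegistry {
    private final Map<String, Object> bindings = new ConcurrentHashMap<>();

    /** Binds a service object to a descriptive name; returns false if the name is taken. */
    boolean bind(String name, Object service) {
        return bindings.putIfAbsent(name, service) == null;
    }

    /** Looks up a service by its descriptive name. */
    Optional<Object> lookup(String name) {
        return Optional.ofNullable(bindings.get(name));
    }

    /** Removes a binding, for example when the service shuts down. */
    void unbind(String name) { bindings.remove(name); }
}
```

A component would bind itself under an agreed string when it starts, and any other component that knows that string could then locate it without knowing where it currently runs.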

As FarGo is to be used with Java, it is one of the more obvious choices for the

project.

2.1.5 Log4j

Much of the development time is spent in the application debugging phase. A common debugging technique is to use on-screen printouts. The developer embeds print commands in certain methods to enable monitoring of the application’s state. Printing the exact context in which the print occurs is a time-consuming task (from the developer’s viewpoint).

When debugging a distributed application, using this method naively would not be productive. There are multiple concurrent printouts from different sources. It is hard to follow all these printouts, and it is even harder to try to synchronize the output, interleaving the different sources.

Log4j [15] is an open-source logging tool developed under the Apache Jakarta project. It is a package designed to allow the creation of such logs for debugging purposes. It offers a hierarchical way to insert logging statements within the Java code. Multiple output formats and multiple levels of logging information are available. It can also gather the print results from numerous sources, as is the case with distributed applications.

From the distributed application development point of view, using the Log4j package made it possible for us to debug our distributed system using the printing technique. This ability was essential during the development of the project.

2.1.6 Ant

Ant [24] is a Java-based build tool. It has many characteristics that are similar to those of the popular Make tool while offering a more flexible and rich environment.

Ant can be extended using Java classes. The configuration files are XML based rather

than shell-command based.

2.1.7 JavaDoc

Good API documentation is vital for a long development project. JavaDoc [25] is a tool that automatically generates a convenient HTML view of the code, based on tags that the developer adds to the source code in the form of comments. We use the JavaDoc tool to provide the final API documentation.


2.1.8 Jython

During the testing phase of the project, different complicated scenarios had to be tested. The naïve choice for this kind of testing is to provide a special-purpose class embodying each new scenario that should be tested. This technique imposes an inconvenient testing environment, as each test must be compiled and the entire system restarted in order to perform the actual test.

An alternative would be to use an interactive, on-the-fly scripting interface, allowing the developer to communicate with the application components at runtime.

Jython is a programming hybrid. It is an implementation of the Python scripting

language written in Java. This interpreter is able to run under any compliant Java

virtual machine.

This scripting interface is used as a solution to our testing environment problem. One

is able to write complex tests, using the richness and simplicity of the Python

language on one hand, and the application Java components themselves on the other.

This integrated environment allows a user to create a scenario on-the-fly, adapting the

test according to the dynamic behavior and state of the application.

Such a scripting interface can be used as a powerful monitoring and management tool

for an application. There could be cases in which one would like to modify the

application state in a way the designers did not think of without the need of actual

recompilation and application restart.

The Jython scripting interface that was integrated in the testing environment allowed a

flexible and productive test phase of the DEM system.


2.2 Design overview

In this section we will provide a design review of the distributed e-mail

(DEM) system. We start by reviewing the application goal and its base components. A

discussion regarding the system’s fault-tolerance and reduction of unneeded

information redundancy follows. We will continue by exploring each of the

components with a detailed description. A component description will include a

description of the contained data as well as a description of services provided by that

component.

The goal of the DEM system is to provide an e-mail system in which there is no

central location that stores the user’s mailbox. Preserving only a lightweight server for

mailbox address resolution and mailbox dispatching allows an increase in

performance. In the suggested scheme, mail will travel directly between mailboxes,

which are located only on on-line clients, thus removing the bottleneck caused by

centralization in the old scheme.

Scalability is achieved by moving most of the system logic to the client’s side.

An increase in the number of clients automatically implies an increase in the DEM

resources, thus performance will not be affected dramatically.

Reviewing DEM’s requirements and features suggests that the system has two main

actors: The mailboxes and the mailbox dispatch units.

From the mailing system client point of view, each client has a personal mailbox

(Mailbox). The personal mailbox is managed using a single per-online-client GUI

(MailboxGUI). As with most E-mail applications, each client has its personal address

book (AddressBook) that is used to store other clients’ logical e-mail addresses.

A mail item (Mail) is the basic content unit that is sent from one mailbox to another.

This object contains data members similar to those suggested by RFC 2822.


The goal of the mailbox dispatch unit (DispatchUnit) is to keep track of the mailbox

location and provide clients with the ability to locate other clients (mailboxes).

In order to provide the DEM system with the ability of space-sharing, mailbox pools

(MailboxPool) are available on all the on-line clients that are connected to the DEM

network. These pools function as containers for mailboxes of off-line users, keeping

track of local mailboxes. When a user becomes off-line, the corresponding pool is

emptied to other on-line pools, using the dispatch units to coordinate this task. The

dispatch units also keep track of the availability of pools and their location.

Figure 2: DEM overview (diagram of clients hosting mailbox pools, on-line and off-line

mailboxes, mailbox GUIs, dispatch units and a dispatch unit GUI)

Using the FarGo middleware, DEM is able to provide the ability to move mailboxes

from clients, which want to become off-line, to clients that are still on-line. FarGo’s

location transparency makes the implementation of this feature easy and

straightforward.

FarGo also allows DEM to regenerate dead parts of the application on live parts of

the network using simple interface operations. The evolution mechanism can also be

relatively easy to implement due to FarGo.


2.2.1 Fault tolerance

In order to handle fault tolerance issues, DEM uses multiple backup

components of both the lightweight server side and the mailboxes (a method

known as fault tolerance by replication). When one component tries to reach another

component and finds it to be non-communicative (due to either a network delay or a

fault), the live component redirects its communication to a replica of the destination

component.

An example of this feature is apparent when a user tries to send a mail to a distant

mailbox. To retrieve a reference to the mail’s destination mailbox, the sending party

consults with a dispatch unit. Numerous inter-synchronized dispatch units can be

online. When the sending party cannot communicate with one dispatch unit, it will

try to communicate with another dispatch unit. This sort of system recovery technique

increases the availability of the service, routing requests on any possible path in order

to try and provide the service.
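
The redirection described above can be sketched in Java as follows. This is only a minimal sketch under our own assumptions: the names DispatchUnitRef, UnreachableException and FailoverLookup are hypothetical and are not part of DEM or FarGo, and an unreachable unit is modeled here by a checked exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for a remote reference to a dispatch unit. */
interface DispatchUnitRef {
    /** Returns the location of the user's mailbox; throws if the unit cannot be reached. */
    String lookupMailbox(String user) throws UnreachableException;
}

/** Hypothetical exception that signals a non-communicative component. */
class UnreachableException extends Exception {
    UnreachableException(String message) {
        super(message);
    }
}

class FailoverLookup {
    /** Tries each known dispatch unit in turn and returns the first answer obtained. */
    static Optional<String> locate(List<DispatchUnitRef> units, String user) {
        for (DispatchUnitRef unit : units) {
            try {
                return Optional.of(unit.lookupMailbox(user));
            } catch (UnreachableException e) {
                // This replica did not answer; redirect the request to the next one.
            }
        }
        return Optional.empty(); // no replica could be reached
    }
}
```

The sender gives up only when every known replica has failed, which is the availability property the paragraph above relies on.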

When some component becomes off-line due to some crash, a replica of that

component should detect the fault and regenerate on parts of the DEM system that are

still alive. This technique is commonly referred to as a self-regenerating system.

As an example, let us examine the mailbox-pool as a fault point. To provide some

degree of mailbox fault tolerance, we suggest making a replica of a mailbox and

casting it to distant mailbox pools. In this context, regeneration means that a dead-

mailbox will be revived on live clients (live mailbox pools). A thread that is working

in the mailbox-pool awakes every time-interval, iterates on the locally available

mailboxes and invokes the “check replications” method. In case some mailbox is

missing, it is the job of the mailbox that discovered the fault to create a new mailbox,

initialize it and notify the other mailboxes of the event. That is the essence of

self-regenerating systems.
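
A single pass of such a replication check might look like the following sketch. The types Replica and ReplicatedMailbox are hypothetical, the liveness probe returns false instead of throwing, and the periodic wake-up of the pool thread is left out; the sketch only shows the detect-and-regenerate step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical handle to a mailbox replica; a dead replica simply reports false. */
interface Replica {
    boolean isAlive();
}

class ReplicatedMailbox {
    private final List<Replica> replicas = new ArrayList<>();
    private final Supplier<Replica> regenerator; // creates a fresh replica on a live pool

    ReplicatedMailbox(List<Replica> initial, Supplier<Replica> regenerator) {
        this.replicas.addAll(initial);
        this.regenerator = regenerator;
    }

    /** Replaces every dead replica with a regenerated one; returns how many were replaced. */
    int checkReplications() {
        int regenerated = 0;
        for (int i = 0; i < replicas.size(); i++) {
            if (!replicas.get(i).isAlive()) {
                replicas.set(i, regenerator.get());
                regenerated++;
            }
        }
        return regenerated;
    }

    int replicaCount() {
        return replicas.size();
    }
}
```

Note that this sketch assumes a single checker; it does not address the case in which several mailboxes detect the same fault at once.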

In this scheme, it is easy to observe that numerous mailboxes might attempt to detect the

fault and react with the creation of a new mailbox. Such a scenario is not resolved in


the current design. Nevertheless, a possible solution would be the termination of a

mailbox. A mailbox that sees that more than a fixed number of mailbox

replicas are available communicates with another mailbox, informing it

that it is planning to terminate. After the notification, the

mailbox terminates, and the notified mailbox is left to inform the other mailboxes of the

change. This process continues until a certain pre-defined number of mailbox

replicas is reached. This sort of iterative process has a high probability of

convergence into the desired state.

A possible feature of the DEM system is evolution. In this context, evolution refers to

the ability of part of the network components to become another component or

dynamically add responsibilities. In the DEM case, clients might evolve from merely

mailbox clients to dispatch units. The evolution process was suggested as a solution

for numerous problems ranging from performance problems that might arise from

distant clients to fault tolerance issues (a server crash).


2.2.2 Reducing information redundancy

In the previous section we examined a good use of redundancy in an

application. The redundancy of the application’s components allowed a dynamic

reaction to fault, redirecting requests to a working replica.

In some cases though, redundancy of either information or functionality is the result

of poor design or merely of some other technical problem.

In the case of the mailing systems, such unwanted redundancy takes the form of a

mail attachment. Many mail servers tend to replicate the attachments, once for each

mail recipient. Although the information stored in a single attachment does not

change between recipients, the naïve mailing solution does not try to perform any

optimization.

More advanced mailing systems address this problem by storing a single copy of the

attachment in a local database. These servers replace the attachments in the original

mail with a reference to the item, which now resides in the database. By doing so, any

mail that was addressed to that server does not duplicate the attachment, saving a

considerable amount of space.

In the DEM system the solution is much simpler and more straightforward. Each attachment

can be considered as a mobile object. A mail will now contain a reference to the

attachment rather than the attachment itself. When a mail is duplicated for numerous

recipients, the duplicated mails will contain a reference to the same attachment object

and not a replica of the attachment. By doing so, we can achieve the same space

optimization magnitude as with the data-base solution.

The attachments are always stored on on-line computers, much as mailboxes are. On a

computer shutdown, the local attachments are cast to other on-line computers,

preserving their availability.
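
The idea of sharing one attachment between duplicated mails can be illustrated with the following sketch. All class names here (Attachment, MailWithAttachment, AttachmentSharing) are hypothetical, and an ordinary Java reference stands in for the mobile-object reference that FarGo would provide.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical attachment object; in DEM this would be a mobile object. */
class Attachment {
    final String name;
    final byte[] data;

    Attachment(String name, byte[] data) {
        this.name = name;
        this.data = data;
    }
}

/** Hypothetical, simplified mail that holds a reference to its attachment. */
class MailWithAttachment {
    final String receiver;
    final Attachment attachment; // shared reference, never copied

    MailWithAttachment(String receiver, Attachment attachment) {
        this.receiver = receiver;
        this.attachment = attachment;
    }
}

class AttachmentSharing {
    /** Creates one mail per recipient; all of them point to the same attachment object. */
    static List<MailWithAttachment> duplicate(List<String> receivers, Attachment attachment) {
        List<MailWithAttachment> mails = new ArrayList<>();
        for (String receiver : receivers) {
            mails.add(new MailWithAttachment(receiver, attachment));
        }
        return mails;
    }
}
```

However many recipients there are, the attachment data exists once; only the small mail objects are duplicated.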


2.2.3 Load balancing

A uniform distribution of mailboxes over the different on-line clients can

be used to achieve basic load balancing.

For a more adaptive solution, implementing a monitoring mechanism on the

dispatcher side might yield better results. In the current design, a dispatcher has most

of the information needed for the load balancing of the mailboxes.

Another possible solution would be to place an active monitoring unit on the

mailbox-pool side. Locally, a mailbox pool can determine that a certain size boundary has been

crossed, causing the mailbox pool to cast some mailboxes to other clients (through the

server).

Casting a mailbox means moving it to another computer that is running a local

instantiation of a mailbox pool. Before the actual mailbox movement takes place, the

mailbox un-registers itself from its current containing mailbox pool. Upon the

movement of the mailbox to the new computer, it registers itself onto the new local

mailbox pool.
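
The three steps of casting (un-register, move, register) can be sketched as follows. PoolSketch and MailboxSketch are hypothetical simplifications; the physical movement, which FarGo performs in DEM, is reduced here to a field assignment.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical, simplified mailbox pool living on one host. */
class PoolSketch {
    final String host;
    private final Set<MailboxSketch> mailboxes = new LinkedHashSet<>();

    PoolSketch(String host) {
        this.host = host;
    }

    void addMailbox(MailboxSketch mailbox) {
        mailboxes.add(mailbox);
    }

    void removeMailbox(MailboxSketch mailbox) {
        mailboxes.remove(mailbox);
    }

    boolean contains(MailboxSketch mailbox) {
        return mailboxes.contains(mailbox);
    }
}

/** Hypothetical, simplified mailbox that knows its containing pool. */
class MailboxSketch {
    final String owner;
    private PoolSketch pool;

    MailboxSketch(String owner, PoolSketch initialPool) {
        this.owner = owner;
        this.pool = initialPool;
        initialPool.addMailbox(this);
    }

    PoolSketch getMailboxPool() {
        return pool;
    }

    /** Casts this mailbox to another pool. */
    void castTo(PoolSketch target) {
        pool.removeMailbox(this); // 1. un-register from the current pool
        pool = target;            // 2. stands in for the actual movement between computers
        target.addMailbox(this);  // 3. register with the pool at the new location
    }
}
```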

In this version we did not implement any adaptive load balancing; nevertheless, the design

and implementation are currently oriented towards the first solution discussed above.

Location and relocation of servers can have a major effect on the system’s overall

performance. Moving the server according to communication statistics is a possibility.

The current design does not attempt to support this feature; however, using the interactive

scripting and testing environment, one is able to both monitor and perform

movements of objects and manual load-balancing.

Preliminary examination of load-balancing techniques can be achieved through the

Jython scripting interface as well.


2.2.4 First Connection Problem

As with many distributed applications that support intermittent node

connectivity, each application component that wishes to connect to the live

application network is faced with the first connection problem.

The first connection problem is that in order to connect to a network, a connecting

client must have an entry point. On the other hand, a purely decentralized system design

tends not to rely on any constantly connected nodes.

Some solutions for this problem rely on either a massive network search for other

connected nodes or on a node address cache. In the node address cache solution,

previously visited nodes (from previous sessions) are checked for aliveness, and if

available, they are used as network entry points. Each node’s local address cache is

refined during the session to permit it to be updated with current node addresses.

These new addresses have a better availability chance than old addresses.

Another naïve solution for the first connection problem is to leave a back-bone of

network components. These components are used only for resolving this initial

connection problem.

In the DEM system we chose this last solution, publishing a list of back-bone

dispatch units that are to be used by both new dispatch units and new mailbox clients.
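
A connecting component could walk the published list as in the sketch below. The class FirstConnection, the string addresses and the reachability probe are assumptions made for illustration; the text does not specify how the list is represented or probed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

class FirstConnection {
    /** Returns the first back-bone address that passes the reachability probe. */
    static Optional<String> findEntryPoint(List<String> backbone, Predicate<String> isReachable) {
        for (String address : backbone) {
            if (isReachable.test(address)) {
                return Optional.of(address);
            }
        }
        return Optional.empty(); // no back-bone dispatch unit is currently available
    }
}
```

Once an entry point is found, the remaining dispatch units and pools can be learned from it, so the back-bone list is needed only for this initial step.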


2.3 Components description

In this section we will provide a description of DEM system components.

With each component we will go into a more detailed description of both the

component’s goal and its provided content and services.

2.3.1 Mail

Mail is the basic message unit that is transmitted from one Mailbox to another.

As with standard mail, a Mail object contains a sender address, a receiver address, a

time stamp, a subject and, of course, content. This suggests the following

methods:

setReceiver   Sets the mail’s receiver, which is some logical string address.

setSender     Sets the mail’s sender. Malicious users of DEM can easily exploit this, yet
              as stated before, security issues will not be handled.

setDate       Sets the mail’s sent time stamp, which is resolved according to the sender’s
              time. Errors in time accuracy due to the lack of time synchronization
              between clients are disregarded. Again, malicious users might exploit this
              service to forge a sent date, and again, this issue will not be handled.

setSubject    Sets the e-mail’s subject, which is a string.

setContent    Sets the e-mail’s content, which is a string.

getReceiver   Retrieves the mail’s receiver, which is some logical string address.

getSender     Retrieves the mail’s sender.

getDate       Retrieves the mail’s sent time stamp, which was set according to the
              sender’s time.

getSubject    Retrieves the e-mail’s subject, which is a string.

getContent    Retrieves the e-mail’s content, which is a string.

isEquals      A special predicate that indicates whether or not the given mail is the
              same as this mail.

toString      Formats the mail into a single string.

One can either construct an empty object and set its fields using the

described methods, or use a fully or partially detailed constructor of a Mail object and

modify the fields later using the mutators.

The constructed mail is delivered to a specific Mailbox using the mailbox’s services.

The Mail can then be viewed using the MailboxGUI.
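
To make the Mail description concrete, the following is a rough sketch of such a class. It is not the project's actual source: the name MailSketch, the field types and the toString layout are our assumptions, and only part of the constructor variety is shown.

```java
import java.util.Date;
import java.util.Objects;

/** Hypothetical sketch of the Mail component; field types are assumptions. */
class MailSketch {
    private String sender;
    private String receiver;
    private String subject;
    private String content;
    private Date date;

    /** Empty mail; the fields are filled in later through the mutators. */
    MailSketch() { }

    /** Detailed constructor; the time stamp is taken from the sender's local clock. */
    MailSketch(String sender, String receiver, String subject, String content) {
        this.sender = sender;
        this.receiver = receiver;
        this.subject = subject;
        this.content = content;
        this.date = new Date();
    }

    void setSender(String sender) { this.sender = sender; }
    void setReceiver(String receiver) { this.receiver = receiver; }
    void setSubject(String subject) { this.subject = subject; }
    void setContent(String content) { this.content = content; }
    void setDate(Date date) { this.date = date; }

    String getSender() { return sender; }
    String getReceiver() { return receiver; }
    String getSubject() { return subject; }
    String getContent() { return content; }
    Date getDate() { return date; }

    /** True when every field of the given mail equals the matching field of this mail. */
    boolean isEquals(MailSketch other) {
        return other != null
                && Objects.equals(sender, other.sender)
                && Objects.equals(receiver, other.receiver)
                && Objects.equals(subject, other.subject)
                && Objects.equals(content, other.content)
                && Objects.equals(date, other.date);
    }

    @Override
    public String toString() {
        return "From: " + sender + "\nTo: " + receiver + "\nDate: " + date
                + "\nSubject: " + subject + "\n\n" + content;
    }
}
```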

2.3.2 AddressBook

A personal address book is probably one of the most basic requirements of an

e-mail application. The DEM system was intended to provide a personal address book

along with each personal mailbox. The basic AddressBook version will provide the

ability to store and retrieve logical e-mail addresses. The address book will hold basic

personal information such as a client’s first and last name. These basic features yield

the following methods:

addPerson     Adds a single person to the address book, which includes a given name as
              well as its logical address.

getAddress    Retrieves the logical address of a person from the address book according
              to a given key.

The address book will implement the Iterator interface to enable users to enumerate

the clients registered in the address book.
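
Since this component was not implemented, the following is only a sketch of how the two methods and the enumeration could fit together. The class name AddressBookSketch is hypothetical, names are used as keys, and we use Java's Iterable interface (which hands out an Iterator) so that the book can be used in a for-each loop.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical address book sketch; a person's name is used as the lookup key. */
class AddressBookSketch implements Iterable<String> {
    private final Map<String, String> addresses = new LinkedHashMap<>();

    /** Adds a person with the given name and logical address. */
    void addPerson(String name, String logicalAddress) {
        addresses.put(name, logicalAddress);
    }

    /** Returns the logical address stored for the name, or null if the name is unknown. */
    String getAddress(String name) {
        return addresses.get(name);
    }

    /** Enumerates the registered names in insertion order. */
    @Override
    public Iterator<String> iterator() {
        return addresses.keySet().iterator();
    }
}
```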

Notice that we chose to implement this component at a later date, mostly because it

has low research significance. A complete DEM client version will include an address

book attached to each personal mailbox. An address book GUI will also be supplied

in order to provide the user with the ability to view and modify the address book’s

content.

2.3.3 Mailbox

The mailbox is one of the most important components in DEM. Using this

component, mail items are retrieved, backed up and moved so that they always stay on the live

parts of the network. Each mailbox is personal, thus it contains a specific user’s mails

as well as his/her personal address book.

A user should be able to use this component to send mail, read mail and delete a mail

item. A mailbox should be able to answer a ping-like call, a predicate that is used to

determine if a mailbox is still alive (an exception is thrown in case the mailbox does

not answer). This feature is intended to be used in the DEM fault tolerance and

regeneration mechanism. A mailbox should also be aware of on-line servers (dispatch

units) in order to be able to locate fellow mailboxes and enable the mailbox travel

feature. A mailbox should also be aware of its replications in order to examine their

activity as well as enforce mail synchronization. Summarizing the mailbox’s features

results in the following methods:

getUserName               Retrieves personal information regarding the user to whom the
                          mailbox belongs.

getAddrBook               Retrieves the personal address book contained in the mailbox.
                          Notice that this method (and the entire feature) is currently
                          not implemented.

isAlive                   A predicate that is used to indicate whether or not the
                          component is still alive. In case the component does not answer
                          the call, FarGo is responsible for throwing an exception.

isActive                  A predicate that indicates whether or not this mailbox is
                          currently active, meaning that it is viewed by a user. A user
                          may connect to a non-active mailbox using a MailboxGUI, which
                          uses this method to mark the mailbox as active.

setActive                 Used to mark a mailbox as active. In this context, active means
                          that the mailbox is currently being manipulated by an on-line
                          user through a MailboxGUI.

regenrate                 Creates a copy of the mailbox and its mail on a different
                          MailboxPool. This can be a result of some distant mailbox
                          fault. Notice that the first version would not directly support
                          this method and feature.

getMail                   Gets a specific Mail according to a key (index).

removeMail                Removes a specific Mail from the mailbox. The mail can be
                          deleted either according to a special local mail index or by
                          passing a copy of the mail that is to be deleted.

addMail                   Adds a new mail to the mailbox. All mail information is
                          available inside the given Mail object. Synchronization with
                          the mailbox replications will occur. The specific mail
                          synchronization is the responsibility of the first receiving
                          mailbox.

sendMail                  A method that is used to simulate the action of sending a mail.
                          The method was implemented for testing reasons only.

isEmpty                   A predicate that indicates whether or not the mailbox is empty.

getMailCount              Retrieves the number of mails that are contained in the mailbox.

AddMailNoSync             Operates like the addMail method, but does not perform further
                          synchronization of this specific mail. This method was intended
                          to be implemented as part of the mailbox replication mechanism.
                          Currently this feature is not supported.

getMailboxPool            Each mailbox has a single containing mailbox pool. Through this
                          method one can retrieve the containing mailbox pool.

setPool                   Using this method, one is able to set the containing mailbox
                          pool.

addMailboxListener        Adds a listener to mailbox events (such as the arrival of new
                          mail, etc.).

removeMailboxListener     Removes a listener from the list of mailbox event listeners.

fireMailboxModifiedEvent  A private method that is used to signal a modification event in
                          the mailbox. All registered mailbox event listeners will be
                          notified of the event.

equals                    A predicate that checks the equality of this object to a given
                          object.

postArrival               The mailbox component is movement aware. Upon the arrival of
                          the mailbox at its new Core, it registers with the locally
                          available mailbox pool.

toString                  A method that is used to format the object’s unique ID into a
                          string.

checkReplications         Checks that all replications are alive. This feature is not
                          implemented in the first version.

registerReplication       Used to register a replication of the mailbox in a specific
                          instantiation of the mailbox. As with all other replication
                          features on the mailbox side, this method is currently not
                          implemented.

unregisterReplication     Used to un-register a replication that was found to be dead by
                          another mailbox.
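
The listener-related methods above follow the usual observer pattern. The sketch below shows one plausible shape for it; the interface MailboxListenerSketch, its single callback and the use of plain strings for mails are our assumptions, not the project's actual types.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical listener interface; the real event type is not specified in the text. */
interface MailboxListenerSketch {
    void mailboxModified(String description);
}

class ObservableMailbox {
    private final List<String> mails = new ArrayList<>();
    private final List<MailboxListenerSketch> listeners = new ArrayList<>();

    void addMailboxListener(MailboxListenerSketch listener) {
        listeners.add(listener);
    }

    void removeMailboxListener(MailboxListenerSketch listener) {
        listeners.remove(listener);
    }

    /** Stores the mail and notifies every registered listener. */
    void addMail(String mail) {
        mails.add(mail);
        fireMailboxModifiedEvent("mail added");
    }

    int getMailCount() {
        return mails.size();
    }

    /** Notifies a snapshot of the listeners, so a listener may unregister itself safely. */
    private void fireMailboxModifiedEvent(String description) {
        for (MailboxListenerSketch listener : new ArrayList<>(listeners)) {
            listener.mailboxModified(description);
        }
    }
}
```

A GUI component would register itself as a listener and repaint its mail list inside the callback.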

2.3.4 MailboxGUI

The purpose of the MailboxGUI is to provide a graphical interface for the DEM

system users. From the application layout point of view, the GUI drags its referred

mailbox to its current location in order to improve performance. Regarding the

component’s requirements, a basic mail list view as well as the ability to create

new mails and send them must be provided. Viewing the personal address book that is part

of each personal mailbox is also a basic feature that is to be provided as part of the

interface.


2.3.5 MailboxPool

At the heart of the DEM scheme lies the fact that only some of the clients are

on-line. Using these on-line clients as temporary storage space and mailbox handlers

enables off-line clients’ mailboxes to be kept alive. To enable this core feature, the

mailbox pool component serves as a container for the mailboxes currently held by a specific client. This

means that each client holds numerous Mailbox objects of off-line users along with

his/her personal mailbox. The component should allow a user to examine its content and to

retrieve, add and remove mailboxes. In a later version, a mailbox pool might be able

to provide monitoring services in order to improve the load balance on the on-line

clients. Currently, one can achieve this using the available scripting interface.

getDispatchUnit              Chooses a living dispatch unit randomly from the list of
                             available dispatch units. In case a dead dispatch unit is
                             encountered during the selection process, the dispatch unit
                             list is refined.

addDispatchUnit              Adds a dispatch unit to the list of living dispatch units.

removeDispatchUnit           Removes a given dispatch unit from the list of living
                             dispatch units.

getMailboxes                 Retrieves a list of the mailbox objects currently contained
                             in the mailbox pool.

castMailbox                  Casts a specific mailbox to another mailbox pool using
                             services provided by the dispatch servers. The
                             implementation of this feature is not required in the first
                             version.

castMailboxes                Empties the mailbox pool to other mailbox pool objects,
                             again by using the services provided by the dispatch
                             servers.

addMailbox                   Adds a new mailbox to the pool.

removeMailbox                Removes a mailbox from the list of mailboxes that are
                             currently contained in the mailbox pool.

getNumMailboxes              Retrieves the number of mailboxes contained in the pool.

notifyDUMailboxModification  This method is used to propagate the modification of a
                             single mailbox to at least one dispatch unit.

disconnectFromDispatchUnit   This method is used to notify at least one dispatch unit
                             that the mailbox pool is about to be deactivated.

isAlive                      A predicate that indicates whether or not this component is
                             alive. This method is used for fault-tolerance purposes.

equals                       Used to indicate whether or not the given object is the same
                             as this mailbox pool.

toString                     Provides a conversion of the mailbox pool’s unique ID into a
                             formatted string.
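
The random selection with list refinement described for getDispatchUnit might look roughly like the following sketch. The class LivePicker and the liveness probe passed in as a Predicate are hypothetical; in DEM the probe would correspond to a remote isAlive call that fails with an exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

class LivePicker {
    /**
     * Picks a random candidate that passes the liveness probe. Dead candidates are
     * removed from the list as they are encountered. Returns null if none is alive.
     */
    static <T> T pickLive(List<T> candidates, Predicate<T> isAlive, Random random) {
        while (!candidates.isEmpty()) {
            int index = random.nextInt(candidates.size());
            T candidate = candidates.get(index);
            if (isAlive.test(candidate)) {
                return candidate;
            }
            candidates.remove(index); // refine the list: drop the dead entry
        }
        return null;
    }
}
```

Because every failed probe shrinks the list, the loop terminates after at most as many probes as there are candidates.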


Figure 3: Client side class diagram

2.3.6 DispatchUnit

The dispatch unit’s role in the DEM system is to provide clients with the ability

to locate other mailboxes in their current location (somewhere inside the live parts of

the network). Retrieving a reference to a specific mailbox according to a logical

address is thus a basic service that must be provided by the dispatch server.

In order to allow this component to keep track of the constant relocation of mailboxes,

the dispatch unit provides an interface that must be used for relocation operations.


Due to fault tolerance issues, multiple dispatch servers exist, thus the synchronization

between these servers must also be handled.

The component should also allow a user to examine the dispatch unit’s known

reference list. This includes current on-line clients, registered mailboxes (and their

replications), on-line dispatch units and online mailbox pools.

Summarizing features into methods:

bindToCore                    Binds the dispatch unit to a special name on the current
                              containing Core.

getUnitName                   Retrieves the name of the dispatch unit.

createUser                    Creates a new user in the DEM system with a newly
                              constructed mailbox. The new mailbox is passed to a mailbox
                              pool right after construction.

doesUserExist                 A predicate that can be used to examine whether or not a
                              certain user exists in the system.

getUserMailbox                This method is at the core of the dispatch unit. It is used
                              to retrieve a reference to a mailbox according to the
                              mailbox owner’s name. It is used by distant users in the
                              process of sending a mail.

getUser                       Gets the name of a user according to an index.

getNumUsers                   Gets the number of currently registered users.

getUserMap                    Returns a reference to the user map that is contained in
                              the dispatch unit. This data structure is used in the
                              process of dispatch unit synchronization.

getFCList                     Using this method, one is able to retrieve the back-bone
                              dispatch units list. Using this list, a new dispatch unit
                              and a connecting user are able to overcome the first
                              connection problem.

syncWithUserMap               Used by the dispatch unit synchronization thread, this
                              method performs the synchronization between the local user
                              map and the distant user map that is passed as a parameter.

getNumPools                   Retrieves the number of pools that are registered with the
                              dispatch unit.

addMailboxPool                Adds a mailbox pool reference to the list of mailbox pools
                              that are known to this dispatch unit.

removeMailboxPool             Removes a mailbox pool from the list of known mailbox
                              pools.

getPool                       Retrieves a pool from the dispatch unit’s list of known
                              mailbox pools. The pool is retrieved according to a given
                              index.

getPoolList                   Retrieves the entire list of known mailbox pools.

syncWithPools                 Used by the dispatch unit synchronization thread, the
                              method handles the synchronization of the known mailbox
                              pools list with a distant list (passed as a parameter).

pickPool                      Randomly chooses a pool from the list of known mailbox
                              pools. In case the selected pool is revealed as dead, it is
                              immediately removed from the list of known mailbox pools
                              and another pool is selected.

castMailbox                   Using this method, one is able to cast a mailbox from its
                              current mailbox pool to a different distant mailbox pool.
                              The new mailbox pool that will contain the given mailbox is
                              picked randomly from the list of known mailbox pools.

getDispatchUnit               Retrieves a reference to a dispatch unit according to a
                              given index. All known dispatch units are held in a list
                              contained within each dispatch unit.

addDispatchUnit               Adds a dispatch unit to the list of known connected
                              dispatch units.

getDispatchUnitList           Retrieves the entire list of known dispatch units. This is
                              used during the inter-dispatch-unit synchronization
                              process.

getNumDispatchUnits           Retrieves the number of known dispatch units.

syncWithDUList                This method is used by the dispatch unit synchronization
                              thread. Using this method, the dispatch unit is able to
                              synchronize its list of known dispatch units with a distant
                              list (passed as a parameter).

isAlive                       A predicate that indicates whether or not this component is
                              alive. In case the component is unavailable on the network,
                              FarGo will throw an exception, declaring this component
                              unreachable.

addDispatchUnitListener       Adds a listener to dispatch unit modification events. This
                              is used mostly by the dispatch unit GUI component.

removeDispatchUnitListener    Removes a listener of dispatch unit modification events
                              from the list of listeners.

fireMailboxModificationEvent  This method is used by outside components to notify the
                              appropriate listeners that a mailbox was modified.

fireDispatchUnitModified      This method is used to signal all dispatch unit event
                              listeners that the dispatch unit was modified. This method
                              is used to notify the dispatch unit GUI to update and
                              repaint.

equals                        This method is used to indicate whether or not the given
                              object equals this dispatch unit.

toString                      Used to format the object’s unique ID supplied by FarGo
                              into a printable string.

postArrival                   As this component is movement aware, upon arrival at a new
                              Core, the dispatch unit synchronization thread is restarted
                              to enable the inter-dispatch-unit synchronization process
                              to take place.

preDeparture                  Upon departing from the local Core, the dispatch unit
                              terminates the synchronization thread. This must be done in
                              order to preserve a correct state of the application. The
                              thread will be reinitiated upon arrival at the new Core.

setupFCList                   Using this method, the dispatch unit builds the list of
                              first connection nodes. This list is used to resolve the
                              first connection problem as defined in previous sections.
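
The text does not detail how syncWithDUList, syncWithPools and syncWithUserMap merge a distant structure into the local one. One simple policy, shown in the sketch below purely as an assumption, is a union: every distant entry that is not yet known locally is added, and nothing is removed. The class name KnownList is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class KnownList {
    /** Adds every distant entry that is not yet known locally; returns the number added. */
    static <T> int syncWith(List<T> local, List<T> distant) {
        int added = 0;
        for (T entry : distant) {
            if (!local.contains(entry)) {
                local.add(entry);
                added++;
            }
        }
        return added;
    }
}
```

With this policy, repeating the synchronization is harmless: a second pass over the same distant list adds nothing. Removal of dead entries would have to be handled separately, for example when a liveness check fails.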

2.3.7 DispatchUnitGUI

Each dispatch unit can be viewed using the DispatchUnitGUI component.

According to the current design, a dispatch unit can reside on one core while the

viewing GUI may reside on another. Having a dispatch unit GUI connected to a

distant dispatch unit has a great advantage. A user can remotely monitor and interact

with any distant dispatch unit regardless of the user’s actual location.

Through this GUI a user may evaluate important information regarding the current state of the DEM

network:

Number of users

Connectivity of users to the network

Number of mailboxes available

Number of mailbox pools

Mailboxes layout on available mailbox pools

Number of connected dispatch units

2.3.8 DispatchUnitSyncThread

The dispatch unit synchronization thread class plays an important role in

making the dispatch units fault tolerant. In the DEM system we use replicas of the

dispatch unit in order to provide a fall-back solution in case of a fault in one of the

dispatch units.


In order to enable this solution, all of the existing dispatch units should be

synchronized.

The synchronization of the information between all dispatch units could be achieved

in one of two ways. Either we inform all dispatch units of any structural change as

it occurs, or we accumulate that knowledge, propagating it to neighbor dispatch

units at almost constant time intervals.

In the DEM system we chose the second solution. Mailboxes, mailbox pools and other

dispatch units may join and/or leave the DEM network, notifying at least one dispatch

unit directly. This notification is used to update the local dispatch unit with the

change. It is the job of the dispatch unit synchronization thread to wake up at

constant intervals and communicate the dispatch unit’s knowledge to all known

dispatch units.

The synchronization thread thus contains the following methods:

run                According to the thread interface, this method is used to start the
                   periodic synchronization process. It runs in a loop, performing the
                   synchronization and sleeping for a constant time period.

syncDispatchUnits  Performs the actual synchronization process. This method passes
                   through all known connected dispatch units, synchronizing with the
                   distant dispatch units list, distant mailbox pools list and user map.
                   Distant dispatch units that are found to be unavailable are
                   automatically removed from the dispatch unit’s list of known dispatch
                   units.
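
The run loop described above can be sketched as a small thread class. The name PeriodicSync, the Runnable standing in for syncDispatchUnits and the shutdown method are our assumptions; the shutdown path corresponds loosely to stopping the thread before a dispatch unit departs from its Core.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a periodic synchronization thread. */
class PeriodicSync extends Thread {
    private final Runnable syncStep;     // stands in for syncDispatchUnits
    private final long intervalMillis;   // constant sleep period between rounds
    private volatile boolean running = true;

    PeriodicSync(Runnable syncStep, long intervalMillis) {
        this.syncStep = syncStep;
        this.intervalMillis = intervalMillis;
        setDaemon(true); // do not keep the JVM alive just for synchronization
    }

    @Override
    public void run() {
        while (running) {
            syncStep.run();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                return; // interrupted by shutdown(): leave the loop
            }
        }
    }

    /** Stops the loop, waking the thread if it is currently sleeping. */
    void shutdown() {
        running = false;
        interrupt();
    }
}
```

A new PeriodicSync instance would be started again after the dispatch unit arrives at its new location, since a Java thread cannot be restarted once it has finished.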


Figure 4: dispatch unit side class diagram


2.4 Application layout - Complet design

In this section we will describe the local-remote partitioning and mapping of the

distributed DEM application onto the physical set of network nodes.

At the heart of the FarGo network lies the Core concept. A Core is a unique object in

the FarGo network. It provides all the needed system support for the mobilization of

objects and their interconnection across distant machines.

Just as the Core is a key element in the physical layer of the network, the Complet is the most

basic building block of the mobile application. The Complet defines the minimal

unit of relocation. At all times, each Complet is associated with exactly one Core.

According to the reference rules imposed by FarGo, objects can refer either to

their containing Complets or to the anchors of other Complets.

When designing the layout of the DEM application using FarGo’s terminology, we arrive

at the following division into Complets:

Mailbox The mailbox should be able to move from Core to Core

in order to provide the most basic feature of the DEM

system – keeping the mailboxes and their contained

information alive.

MailboxGUI To enable mailbox managing from a distance, we chose

to define this component as a Complet as well.

Mailbox Pool Although this component stays on a single Core from

the moment it is created, it had to be defined as a

Complet. As previously explained, this is due to the fact that

other Complets needed the ability to reference the

mailbox pool (both mailboxes and dispatch units).

DispatchUnit The dispatch unit does not tend to move through the

system a lot, although, according to the design it should

be able to improve its location based on dynamic

location optimization opportunities. For this reason, and

due to the fact that other Complets need to be able to point

to this component, we chose to define the Dispatch Unit

component as a Complet.

DispatchUnitGUI To enable distant monitoring and possible management

of the dispatch units, we chose to define the dispatch

unit’s GUI component as a Complet as well.

2.5 Inter-component communication

In this section we will review the inter-complet communication that passes

through the DEM system using FarGo’s middleware. Communication between Complets

requires no special action; it takes place transparently with every distant method

invocation.

Although we can mark the communication between the dispatch unit and its GUI (and

the mailbox and its GUI for that matter) as inter-complet communication, we choose

to focus on the application’s most important communications. These include the

process of sending a mail, the process of casting a mailbox and the synchronization of

the dispatch units.

2.5.1 Sending a mail

To send a mail to another mailbox, the sending mailbox must know the

address that represents the distant mailbox.

As previously defined, at all times, a mailbox is contained in a single mailbox pool. A

mailbox pool offers the service of finding a living dispatch unit. Using the available

reference to the containing mailbox pool, the user that intends to send the mail is able

to retrieve a reference to a living dispatch unit.

Next, a reference to the destination mailbox is retrieved from the dispatch unit using

the known destination mailbox address. The distant mailbox address string is used as

a search key (in the dispatch unit’s user/mailbox map).

At that point, the sending party holds a reference to the distant mailbox. Using the

addMail method, the new mail is added to the distant mailbox, thus completing the

mail sending process.
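
The sending steps described above can be sketched as follows. Only addMail is named in the text; the class SendSketch, the nested types and the methods getLivingDispatchUnit, isAlive and getMailbox are hypothetical names chosen for illustration, and plain object references stand in for FarGo references between Complets.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SendSketch {
    /** Hypothetical mailbox that simply stores the mails added to it. */
    static class Mailbox {
        final String address;
        final List<String> mails = new ArrayList<>();

        Mailbox(String address) {
            this.address = address;
        }

        void addMail(String mail) {          // addMail is the method named in the text
            mails.add(mail);
        }
    }

    /** Hypothetical dispatch unit holding the user/mailbox map keyed by address. */
    static class DispatchUnit {
        boolean alive = true;
        final Map<String, Mailbox> userMap = new HashMap<>();

        boolean isAlive() {
            return alive;
        }

        Mailbox getMailbox(String address) {
            return userMap.get(address);
        }
    }

    /** Hypothetical mailbox pool offering the "find a living dispatch unit" service. */
    static class MailboxPool {
        final List<DispatchUnit> dispatchUnits = new ArrayList<>();

        DispatchUnit getLivingDispatchUnit() {
            for (DispatchUnit unit : dispatchUnits) {
                if (unit.isAlive()) {
                    return unit;
                }
            }
            return null;
        }
    }

    /** Returns true if the mail was delivered, false if any lookup step failed. */
    static boolean send(MailboxPool senderPool, String destinationAddress, String mail) {
        DispatchUnit unit = senderPool.getLivingDispatchUnit();   // ask the containing pool
        if (unit == null) {
            return false;
        }
        Mailbox destination = unit.getMailbox(destinationAddress); // address is the search key
        if (destination == null) {
            return false;
        }
        destination.addMail(mail);                                 // completes the send
        return true;
    }
}
```

Returning false on a failed lookup is a choice made for the sketch; the text does not say how the DEM system reports an unknown address or a missing dispatch unit.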

Figure 5: the process of sending a mail

2.5.2 Casting mailboxes

Casting a mailbox to another Core is a basic feature that must be implemented

in the DEM system.

The casting of a mailbox from its containing Core/mailbox pool can be triggered by a

load balancing mechanism or by a shutdown of the Core that currently holds the

mailbox.

In the case of a core shutdown, the contained mailbox pool needs to be emptied. In

order to perform that operation, the mailbox pool communicates with a dispatch unit,

which in turn performs the actual casts. When the dispatch unit performs the casting

operation, it takes into account the fact that distant pools might be currently disconnected

(it performs aliveness testing of distant pools). The dispatch unit also makes sure that

the new home of the mailbox is not the current mailbox pool.
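
The shutdown case can be sketched as below. The names CastSketch, Pool, cast and emptyPool are assumptions made for this illustration, and taking the first suitable pool is a placeholder policy; the text does not state how the DEM dispatch unit chooses among candidate pools.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CastSketch {
    /** Hypothetical mailbox pool: an id, an aliveness flag and the hosted addresses. */
    static class Pool {
        final String id;
        boolean alive = true;
        final Set<String> mailboxes = new LinkedHashSet<>();

        Pool(String id) {
            this.id = id;
        }
    }

    /** Hypothetical dispatch unit that performs casts and tracks mailbox locations. */
    static class DispatchUnit {
        final List<Pool> knownPools = new ArrayList<>();
        final Map<String, String> location = new HashMap<>();   // address -> pool id

        /** Moves one mailbox out of 'from'; returns the new pool, or null if none fits. */
        Pool cast(String address, Pool from) {
            if (!from.mailboxes.contains(address)) {
                return null;
            }
            for (Pool candidate : knownPools) {
                // Skip the current home and any pool that fails the aliveness test.
                if (candidate == from || !candidate.alive) {
                    continue;
                }
                from.mailboxes.remove(address);
                candidate.mailboxes.add(address);
                location.put(address, candidate.id);   // the dispatch unit tracks the move
                return candidate;
            }
            return null;
        }

        /** Empties a pool that is about to shut down; returns how many mailboxes moved. */
        int emptyPool(Pool from) {
            int moved = 0;
            for (String address : new ArrayList<>(from.mailboxes)) {
                if (cast(address, from) != null) {
                    moved++;
                }
            }
            return moved;
        }
    }
}
```

Routing every cast through the dispatch unit is what lets it record the new location, which is the same reason the design gives for not letting pools move mailboxes on their own.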

The actual mailbox casting is performed by a dispatch unit to allow it to track the

mailbox’s location. This information is later propagated to the other dispatch units.

[Figure 5 diagram labels: Mailbox GUI, Mailbox Pool, Dispatch Unit, Mailbox (Recipient); steps: (a) Get dispatch unit, (b) Is alive, (c) Get mailbox, (d) Add mail]

Figure 6: Casting a mailbox

Another possible origin for the movement of a mailbox is the connection of a mailbox

viewing component to the DEM network. In that case, during the login process, a

new local mailbox pool is created. The mailbox, retrieved from one of the available

dispatch units, is then moved to the newly created local mailbox pool.

Figure 7: Mailbox casting on login

[Figure 6 diagram labels: Mailbox Pool, Dispatch Unit, Distant Mailbox Pool; steps: (a) Shutdown event, (b) Cast a mailbox, (c) Is alive, (d) Cast]

[Figure 7 diagram labels: Login, Mailbox Pool, Dispatch Unit; steps: (a) Construct, (b) Get mailbox, (c) Cast a mailbox, (d) Cast]

2.5.3 Synchronizing dispatch units

As explained in the components description section, a special purpose

synchronization thread is attached to each dispatch unit.

The synchronization thread wakes at constant time intervals and communicates with all

known dispatch units. During the communication process, data structures in both the

local and distant dispatch units are updated according to their common knowledge.

The synchronized information includes the list of available dispatch units, list of

known mailbox pools and list of known users and mailboxes.

Figure 8: Dispatch unit synchronization process

[Figure 8 diagram labels: Sync. Thread, Dispatch Unit, Distant Dispatch Unit; steps: (a) isAlive, (b) add dispatch unit, (c) get DU list, (d) Sync DU list, (e) get mailbox pools, (f) Sync mailbox pools, (g) get user map, (h) Sync user map]

3 Testing Environment

3.1 Overview

Testing a compound system such as the DEM system is a hard task. To

achieve an acceptable degree of software quality assurance, an extensive system test

must take place.

The tests range from simple functionality testing of the different application

components, to inter-component interaction tests.

Functionality tests invoke the publicly available services of each component,

verifying that the correct object state is retained after the service is invoked.

Testing the overall application’s correct state and behavior is a much harder task. This is

due to the fact that the scenarios include inter-component relations. Although the set of

possible inter-object messages is small, the compound and concurrent activity is hard to

follow and debug.

We focus our testing on two basic approaches:

System stress tests that should assure correct behavior under large amounts of

communication between the system’s components.

Test scenarios in which faults occur on different system components, either local

one-time faults or multiple concurrent faults.

In order to provide a flexible environment for the development and execution of such

tests, we propose an integrated testing environment.

In the design of the integrated environment we wished to have the following:

Core and Complet browser to enable the monitoring of all available

Complets on all of the registered Cores.

An interleaved view of the output of all components (including those

on a remote core).

A scripting interface that can be used to start the system, examine its

state and modify it (by invoking publicly available services) at

runtime.

In order to provide the ability to monitor distant cores, a special spy Complet was

implemented. Using the spy, one is able to activate new components on a remote core

and also monitor the core’s activity, sending all information back to the central testing

environment.

Distant threads are used to collect all of the log outputs of the different system

components, directing all the information back to a central location – the integrated

testing environment. Special time stamps that are added to each log message can be

used to manually determine local ordering errors that might be caused by network or

operating system delays.
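
A minimal sketch of the interleaving idea, assuming each log message carries a millisecond time stamp and the name of its originating core. The class LogMergeSketch and its Entry type are hypothetical; the actual DEM logging relies on log4j, whose API is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class LogMergeSketch {
    /** Hypothetical log record: time stamp, originating core and message text. */
    static final class Entry {
        final long timestampMillis;
        final String core;
        final String message;

        Entry(long timestampMillis, String core, String message) {
            this.timestampMillis = timestampMillis;
            this.core = core;
            this.message = message;
        }

        @Override
        public String toString() {
            return timestampMillis + " [" + core + "] " + message;
        }
    }

    /** Merges the per-core logs into one view ordered by time stamp. */
    static List<Entry> interleave(List<List<Entry>> perCoreLogs) {
        List<Entry> merged = new ArrayList<>();
        for (List<Entry> log : perCoreLogs) {
            merged.addAll(log);
        }
        // List.sort is stable, so entries with equal time stamps keep their input order.
        merged.sort(Comparator.comparingLong((Entry e) -> e.timestampMillis));
        return merged;
    }
}
```

Sorting by time stamp only approximates the real order of events, since clocks on different machines may drift; this is why the time stamps are described as an aid to manual inspection rather than as a guarantee.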

To enable a flexible and convenient scripting interface, we had to choose a powerful

scripting language. Python [26], an interpreted language that combines remarkable

power with a very clear syntax, was the best option that we could find.

In this project we chose to use the Jython interpreter. As explained in previous

sections, Jython is a Python interpreter written in Java. Using the integrated console, a

user can write Python scripts that interact with the actual currently running objects.

The testing environment’s GUI includes all the described features: a quick

review of the total system state, a log area (for log-based debugging) and a scripting

console integrated in the environment for convenient testing (a screenshot is available in

Appendix A: User manual).

As a special feature, this integrated testing environment may be used by a DEM system

administrator. As an administrator running such a tool, one is able to influence

mailbox distribution and re-distribution. It is easy to set up new Cores and start a

distant dispatch unit running.

Future projects that wish to use the FarGo middleware will need some testing

environment. It is possible to make this testing environment generic enough to be

used by any other FarGo based application. It is also possible to simply adapt this

version to a new project.

3.2 Component description

In this section we will briefly review the components that make up

the integrated testing environment.

3.2.1 Tester

The tester is the main object in the testing environment. Containing the testing

environment’s GUI, this object manages the Core/Complet browser. It also manages

the Jython console and the log console.

To enable the centralization and interleaving of the logs, the tester offers logging

services.

In order to save development time and not implement already available components,

an external Jython console component was used – the SPyConsole [28].

All of the above features suggest the following methods:

startConsole Activate the Jython console, intercepting all user

keystrokes to the script editor. After this method is invoked,

the application will not run any code that follows

the invocation.

addLog Add a blue colored log to the environment logging area.

This blue color indicates that the message is a regular

message.

addLogError Add a red colored message to the environment logging

area. The red color indicates that the message is that of

an error.

addCore Add a new core to the list of active, monitored, cores.

This method receives not only the Core URL of the

newly created core, but also a reference to the distant

spy that resides on that core.

spyShutdown A method used by the spy to signal the tester that it is

about to be shut down. The tester reacts with the

removal of the spy from its list of active spies.

modelChanged Used by other components (such as the spy) to signal

the testing environment that one of its monitored

components was somehow modified. Modification

includes the possibility that a component moved from one

core to another.

3.2.2 Spy

At the core of the monitoring ability lies the spy Complet. On the creation of a

distant Core, a spy settles on it and registers itself for all possible Core events. These

include the event of Complet construction, destruction, arrival and departure. The

spy also listens to the Core shutdown event.

In order to provide only the Complet information that is relevant to the DEM

system, the spy is capable of filtering out non-DEM component activities. This

leaves a cleaner working environment when it comes to using the testing tool

effectively.

Besides monitoring the activity on the containing core, the spy offers

numerous component creation services. Using the spies, a user of the integrated

testing environment is able to create distant components such as a dispatch unit,

mailbox pool, online and offline users.

Summarizing all of these features into desired services suggests the following list

of methods:

registerOnCore Registers the spy on the distant core. This enables the spy

to monitor Complet activities on the core and transmit the

acquired data to the central testing tool. The list of

events gathered was previously mentioned.

setTester Sets up the reference to the central tester object, which

runs the integrated testing environment.

completConstructed Intercepts a complet construction event, filtering out any

non-DEM complet creation.

completFreed Intercepts a complet destruction event, filtering out any

non-DEM complet destruction.

completsDeparture intercept the departure of a set of Complets. This

method will filter out any non-DEM complet

movements.

completsArrived intercept the arrival of a set of Complets. This method

will filter out any non-DEM complet movements.

shutdown This method is used to make the containing core

perform a clean shutdown.

crash simulate a distant core crash event.

createDispatchUnitPack Creates a dispatch unit with a monitoring

dispatch unit GUI. A mailbox pool also comes

as part of this set of constructed components.

createNewOnlineUserPack This method is used to create a new online user

on the local core. The new user pack includes a

new mailbox, a monitoring mailbox GUI and a

mailbox pool.

createNewOfflineUserPack This method is used in order to create a new user

that is not currently viewed by a GUI. This

means that only a new mailbox is created. This

mailbox is automatically cast to some on-line

mailbox pool.

loginUser Tries to retrieve the mailbox of a user. A GUI is

then created to monitor the retrieved mailbox. A

new mailbox pool is also created such that it will

contain that retrieved mailbox.

getCompletNum retrieve the number of Complets which this spy

manages. Notice that the list contains only DEM related

components (such as mailboxes, dispatch units, etc.)

getComplet Retrieve a specific complet from the managed Complets

list. The complet is retrieved according to a given index.

getCoreURL Retrieves the URL of the core that hosts this spy.

toString Returns a constant “Spy” string clearly stating the

functionality of this object.

equals this method is used in the comparison of this spy with

another given object. The unique complet ID is used in

the comparison.

fireChange Notify the central tester that the spy intercepted an

event that is related to one of DEM’s complets.
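
The filtering and notification behaviour of the spy can be sketched as follows. The method names completConstructed, completFreed, getCompletNum, getComplet and fireChange come from the list above; the marker interface DemComponent, the ChangeListener callback and the use of instanceof as the filter are assumptions, since the FarGo event API and the real filtering criterion are not shown in this document.

```java
import java.util.ArrayList;
import java.util.List;

class SpySketch {
    /** Hypothetical marker for DEM components (mailboxes, dispatch units, and so on). */
    interface DemComponent { }

    /** Hypothetical callback standing in for the tester's modelChanged method. */
    interface ChangeListener {
        void modelChanged();
    }

    private final List<Object> managed = new ArrayList<>();
    private final ChangeListener tester;

    SpySketch(ChangeListener tester) {
        this.tester = tester;
    }

    /** Records a newly constructed complet, ignoring anything that is not a DEM component. */
    void completConstructed(Object complet) {
        if (!(complet instanceof DemComponent)) {
            return;                       // non-DEM activity is filtered out
        }
        managed.add(complet);
        fireChange();
    }

    /** Forgets a destroyed complet, again ignoring non-DEM complets. */
    void completFreed(Object complet) {
        if (!(complet instanceof DemComponent)) {
            return;
        }
        if (managed.remove(complet)) {
            fireChange();
        }
    }

    int getCompletNum() {
        return managed.size();
    }

    Object getComplet(int index) {
        return managed.get(index);
    }

    /** Notifies the central tester that a DEM-related event was intercepted. */
    private void fireChange() {
        tester.modelChanged();
    }
}
```

Because filtering happens in the spy, the tester is only notified about DEM components and its browser never has to be refreshed for unrelated Core activity.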

Figure 9: Tester class diagram

4 Future directions

One major issue that is not dealt with in the current DEM design is security.

There is a variety of security problems, ranging from forging a sender or mail time

stamp to reading someone’s personal mail. To make such an application usable

outside the lab, it is important to invest research time to resolve this problem. There

are many known techniques for the protection of the content against unauthorized

reading and/or modification of mail. Algorithms dealing with electronic signatures

[19, 20] are available and only small adjustments are needed in order to provide this

important feature.

To allow even greater flexibility of the DEM system, we can examine the possibility

of defining the mails themselves as complets. When a user comes online and

retrieves the mailbox, the mails and their content would not automatically follow but rather

stay on the distant computer. This would reduce the unneeded communication that

occurs when all the mails contained in a mailbox are transferred to the reader’s

computer.

Currently, the mailbox pools do not offer dynamic load balancing. Balancing the

load on each DEM client, taking individual computer capabilities into account, would

improve the overall system performance.

To enable mail transfer between the DEM system and currently available

DNS-based mailing services, we suggest the construction of mailing bridges. On

one end, these bridges will act as a normal mail server, receiving mails from the

“outside” world and pushing them into the DEM system. Mail from DEM directed to

normal servers will be routed through these bridges and on to their destination.

Although a decentralized design was one of the main motivations in this project, we

did not fully achieve this objective. As can be seen in the design, a central piece still

exists in the form of a dispatch unit. Although one might think such a central

component (or a variant of such a component) must exist in all electronic mailing

systems, we believe that by using a peer-to-peer protocol such as Pastry [3], a fully

decentralized system can be designed and implemented.

A problem that is apparent in the current design is that in order to send a mail to a

long mailing list, the sender must pay a processing time that is almost linear in the

number of recipients. To reduce the complexity of such an operation, it is possible to

enable users to define trusted users. These trusted users will aid with the distribution

of a mail to a large number of recipients. Assuming all the mail recipients trust the

sender, a logarithmic division of work can be achieved, reducing the complexity to

O(log(n)).
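
One possible reading of this logarithmic division of work is a doubling scheme: in every round, each user that already holds the mail forwards it to one recipient that does not yet have it. The sketch below only counts the rounds under that assumption; the class FanOutSketch and the doubling scheme itself are illustrative and are not part of the DEM design.

```java
class FanOutSketch {
    /**
     * Number of forwarding rounds needed to reach all recipients when every
     * current holder of the mail forwards it to one new recipient per round.
     */
    static int roundsToReach(int recipients) {
        if (recipients < 0) {
            throw new IllegalArgumentException("recipients must not be negative");
        }
        long holders = 1;            // the original sender
        long remaining = recipients;
        int rounds = 0;
        while (remaining > 0) {
            long sentThisRound = Math.min(holders, remaining);
            remaining -= sentThisRound;
            holders += sentThisRound;   // new recipients help in the next round
            rounds++;
        }
        return rounds;
    }
}
```

After r rounds at most 2^r - 1 recipients have been reached, so n recipients need about log2(n + 1) rounds, and the original sender performs one send per round instead of n sends. The total number of messages is still n; only the elapsed time and the load on the sender become logarithmic.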

Regarding the integrated testing environment that was developed during the project,

one can try to write a more generic version of that environment. The integrated

environment proved to be very useful for fast and easy creation of complex test

scenarios. It also proved to be useful as a monitoring and managing device for the

FarGo based application.

The testing environment should also be extended to allow it to attach spies to already

running distant cores. Such a feature will contribute to the effectiveness of the tool as

a debugger.

Appendix A: User manual

In this section we will review the different components available in the DEM system

from the user’s point of view. We will show how to start a dispatch unit, how to

create a new user in the system and how to login as a user of the system.

For the more advanced user, we will demonstrate how to start the logging services and

how to work with the integrated testing environment.

Start logging services

In order for any of the DEM application components to work correctly, the

logging services must be started. This should occur prior to the creation of any other

DEM application component.

To restart the entire system including the log service, a user can use the restartLog

shell script. This script kills all currently running Java applications (including the

logging services). It then restarts the logging services.

Upon a successful execution of the logging services, a user is now able to start the

other DEM components. Notice that this application should be kept running as long as

the DEM system operates.

Dispatch Unit

In order to get a DEM network up and running, at least one backbone dispatch

unit should exist. A backbone dispatch unit is one of the dispatch units available in

the DispatchUnit.list file. Using this list we are able to solve the first connection

problem, as discussed in previous sections.

To start up a dispatch unit, one can use the runServer shell script. This script receives

the name of the new core that will hold the dispatch unit. For example, the command

“runServer station1” will start a core named station1 on the local computer. On top of

that new core, a dispatch unit will be created.

A mailbox pool is automatically created along with the dispatch unit. This is done in

order to provide a preliminary location for the newly created mailboxes.

A dispatch unit monitoring GUI is also created. Using this GUI component, a user is

able to view the DEM network status. In the first top panel, green man icons indicate

online users (users that currently read and interact with their personal mailbox). In the

same panel, red man icons indicate that the users of that mailbox are currently off-

line. The address of each user is visible as a string next to the man icon.

In the middle panel one can view the currently active mailbox pools. The first column

shows an icon of a swimming pool indicating the existence of the mailbox pool. On

its right we will get either a blank icon or a mailbox icon. A blank icon indicates no

mailboxes currently reside on that mailbox pool. In case one or more mailboxes reside

on that mailbox pool, we will see the mailbox icon. The number between the braces

and next to that icon indicates the exact number of mailboxes that are currently

available inside that mailbox pool. A unique mailbox pool ID is available next to

these icons.

At the lowest panel we will get a list of currently connected dispatch units. Each entry

in that list displays a dispatch unit icon and a unique Complet ID representing that

dispatch unit.

Figure 10: Dispatch unit monitoring GUI

Closing the dispatch unit will cause a local core shutdown, after which the detection

of the shutdown event will propagate automatically in the graph of currently

connected dispatch units. Notice that at least a single backbone dispatch unit should

exist at all times, allowing users to log in to the services and other dispatch units to

connect to the graph of dispatch units.

Mailbox client

A client that wishes to connect to the DEM network, retrieve the personal

mailbox and start working should perform a login procedure.

Although the login procedure does not include any authentication, a user that wishes

to log onto the system needs to supply a user name (the logical address of the

mailbox).

Upon successful login, a user can view and manipulate current mail through the main

mail GUI. When a user wishes to compose a new mail, a special purpose compose

mail GUI is created.

In the following sub-sections we will describe these components.

Login

To start a login procedure, a user can use the runLogin shell script. No

parameters need to be passed to this script.

The login procedure starts with a login-address text field pop-up. This allows the user

to enter the address of the personal mailbox.

Figure 7: Login screen

Using the list of backbone dispatch units, we can resolve the first connection problem,

allowing the negotiation with distant dispatch units.

In case the desired mailbox does not exist, the user is prompted regarding the creation

of a new mailbox.

Figure 11: Create a new mail user

In case the user approves this action, a new mailbox is created on a distant core. Next,

a mailbox pool and mailbox GUI are created on the local core. The login process ends

when the distant mailbox comes to reside on the local core.

In case the desired mailbox does exist, we will get a new local mailbox pool and

also a new mailbox GUI component viewing the retrieved mailbox.

Mail review and manipulation

Through this main mail client GUI, a user is able to view, create, forward,

reply to and delete a mail. All of these actions are available as buttons in the uppermost

part of the mail manipulation GUI.

Viewing a mail includes fields such as the sender, the subject and the sending date. All

of this information appears in a row in a table visible on the upper part of the GUI.

In the lower GUI section we will get a view of a selected mail content. Using scrolling

one can view the entire mail content.

Figure 12: mailbox view and control

The first left-most icon in the upper buttons area enables the user to create a new mail.

A discussion of the mail creation GUI is available in the following sub-section.

The GUI used for composing a new mail is also used when replying to old mail.

Composing a mail

The most intuitive GUI component in the system is the one responsible for the

creation of a new mail.

This GUI component is divided into four sections. Using the tab key a user can switch

between the available fields. The upper text field is that of the sender. This field is

automatically filled. Next is the receiver text field, after which the field of the subject

appears.

When possible, the compose mail component will try to automatically fill up the

available fields. For example, when a user replies to a mail, the sender and recipient

of the mail are simply switched.

The largest text area in this component is that of the content field. A scrollable text

area offers a convenient content editing area. At the bottom of that area lies the send

button which is used to complete the mail composition procedure.

Figure 13: Compose a new mail

Testing environment

As described in previous sections, an integrated testing environment is

available allowing advanced users to test the system in a comfortable way. This tool

may also be used as a management tool for the DEM application.

To run the test environment, a user can invoke the runTest shell script. Notice that

before the testing environment can be executed, the logging services must be available.

Figure 14: Integrated testing environment

Upon execution, a clear work space will appear. The browser tree will appear empty

on the left side of the frame. The largest middle piece is the scripting interface. At the

bottom of that frame one is able to view the interleaved color coded logs. Intuitive

scrolling is available for each one of these components.

At the top of the frame there is a short menu used for scripting interface oriented

actions. The user is able to load scripts, edit and manipulate current scripts. Credit for

the creation of the Jython console component is available as part of that menu, under

“help”.

Regarding the browser tree, icons used in previously explained components are

available to visualize information regarding the complet type and location per

connected core. The intuitive tree-style browser allows the user to quickly explore

each of the currently connected cores.

Appendix B: Application Requirements

In order to run any of the DEM application components, the following list of

JAR files must be available:

DEM.jar Contains all the components that construct the

DEM network system. All code written during

the project is concentrated in this file, divided

into three packages that reside in this file:

Client, Server and Testing.

Fargo_wyaron.jar Contains all the infrastructure needed in order to

run the FarGo pre-compiler and use any of

FarGo’s facilities

Jython.jar Contains the Java implementation of the Python

scripting language. Must be available due to the

fact it is used in the testing environment.

JythonConsole In order to provide a convenient console

that is integrated in the testing environment, this

JAR offers the needed GUI component including

its creation and handling.

log4j-1.2.7.jar The centralized logging facility that is used

throughout the DEM application is available

based on the use of the log4j infrastructure. This

JAR provides all the needed classes for this

feature.

xmlParserAPIs.jar Although not directly needed by the DEM

application, this JAR is used by the ANT make

tool. We felt that leaving this library in the final

package will ease future development.

xercesImpl.jar As with the xmlParserAPIs.jar, this JAR is used

in the parsing of XML files, as it is done by the

ANT tool. We keep this JAR in the package for the same

reasons.

All of these mentioned JAR files should reside in a directory named lib under the

DEM home directory.

To allow further development we left other ant library files in the final package.

Among those are ant.jar, optional.jar, xercesImpl.jar and xml-apis.jar which should

reside in a directory named ant_lib which should reside inside the previously defined

lib directory.

As can be viewed in the screenshots, the DEM application provides an intuitive

interface which relies heavily on informative icons. All of the needed graphic images

reside in a directory named images inside the DEM home directory.

The final distribution of the DEM application is available in a file named DEM.jar.

This file must be available in order for any of the components to run. This JAR should

reside under the dist directory that should be found under the DEM home directory.

Examples of Python test scripts should be available under the Tests directory located

in the DEM home directory.

All documentation, including the DEM API and this report document should be

available under the Documentation directory which resides under the DEM home

directory.

References

[1] Douglas Thain, Todd Tannenbaum, and Miron Livny, "Distributed Computing in Practice: The Condor Experience", Concurrency and Computation: Practice and Experience, 2004.

[2] Darleen Sadoski, “Client/Server Software Architectures – an Overview”, STR, 1997.

[3] A. Rowstron and P. Druschel. “Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems”. In Proc. IFIP/ACM Middleware 2001, Heidelberg, Germany, Nov. 2001.

[4] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek and Hari Balakrishnan. “Chord: A Scalable Peer-to-peer Lookup Service for internet Applications”. In Proc. ACM SIGCOMM`01, San Diego, CA, Aug. 2001.

[5] S. Ratnasamy, P. Francis, M. Handley, R. Karp and S. Shenker. A scalable content-addressable network. In Proc ACM SIGCOMM`01, San Diego, CA, Aug 2001.

[6] Napster, http://www.napster.com/.

[7] The Gnutella protocol specification, 2000. http://dss.clip2.com/GnutellaProtocol04.pdf.

[8] Cockayne, W. R., and Zyda, M., Eds. 1998. Mobile Agents. Prentice Hall.

[9] T. Walsh, P. Nixon and S. Dobson. Review of the mobility systems. Technical Report TCD-CS-2000-13, University of Dublin Trinity College, March 2000.

[10] Ophir Holder, Israel Ben-Shaul and Hovav Gazit. “Dynamic Layout of Distributed Applications in FarGo”. Proceedings of the 21st International Conference on Software Engineering (ICSE'99), Los Angeles, CA, USA, May 1999.

[11] Ophir Holder, Hovav Gazit. “FarGo Programming Guide”. Technical report EE Pub 1194, Technion – Israel Institute of Technology.

[12] J. Hugunin, B. Warsaw, et al. The Jython/JPython Web Site. http://www.jython.org/

[13] James Gosling, Henry McGilton, “The Java Language Environment”. White paper, May 1996.

[14] Amy Fowler, “A Swing Architecture Overview”, http://java.sun.com/

[15] Log4j, http://logging.apache.org/log4j

[16] Douglas B. Terry. Managing update conflicts in Bayou, a weakly connected replicated storage system. In Proceedings of the 15th ACM Symposium on Operating Systems Principles, December 1995.

[17] J. Demers, K. Petersen, M. J. Spreitzer, D. B. Terry, M. M. Theimer, and B. B. Welch. Proceedings of the Workshop on Mobile Computing Systems and Applications, Santa Cruz, California, December 1994, pages 2-7.

[18] John Kubiatowicz, David Bindel, Yan Chen, Steven Czerwinski, Patrick Eaton, Dennis Geels, Ramakrishna Gummadi, Sean Rhea, Hakim Weatherspoon, Westley Weimer, Chris Wells, and Ben Zhao. “OceanStore: An Architecture for Global-Scale Persistent Storage“. Appears in Proceedings of the Ninth international Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2000), November 2000.

http://oceanstore.cs.berkeley.edu/publications/papers/pdf/asplos00.pdf

[19] R.L. Rivest, A. Shamir and L. Adleman. “A Method for Obtaining Digital Signatures and Public-Key Cryptosystems”. Comm. ACM 21(2): 120-126 (1978).

[20] W. Diffie and M. Hellman. New Directions in Cryptography. IEEE Trans. Info. Theory 22(6): 644-654 (1976).

[21] Richard Golding and Elizabeth Borowsky, “Fault-tolerant replication management in large-scale distributed storage systems”, In Proceedings of the 18th IEEE Symposium on Reliable Distributed Systems, Oct 1999.

[22] Rachid Guerraoui and Andre Schiper, “Fault-Tolerance by Replication in Distributed Systems”, in Proc. Reliable Software Technologies Ada-Europe '96

[23] Eclipse, http://www.eclipse.org/

[24] Ant, http://ant.apache.org/

[25] JavaDoc, http://java.sun.com/j2se/javadoc/

[26] Python, http://www.python.org/

[27] Ian Clarke, “Freenet: A Distributed Anonymous Information Storage and Retrieval System”. Division of Informatics, University of Edinburgh, 1999.

[28] ftp://iee.umces.edu/SME3/JConsole/
