University of Rome “La Sapienza”

Engineering Faculty

Master Thesis in Computer Engineering

Autumn Session - October 2010

A contract-based event-driven model for cooperative environments: the case of collaborative security

Leonardo Aniello

Supervisor: Prof. Roberto Baldoni

Examiner: Prof. Luca Becchetti

Assistant Supervisor: Dott.ssa Giorgia Lodi

to my family and my long-time and more recent friends

Acknowledgements

I believe that one’s results depend heavily on the environment one lives in every day. Looking back at my life and at the goals I’ve accomplished, I consider myself very lucky to have been surrounded by people who have always given me the serenity and the boost I needed to pursue my aspirations, despite the big and small troubles that one often has to face.

In this regard, I’d like to thank my mother and my father with all my heart for having supported me through all these years, in spite of the several changes I’ve made to my future plans. And I can’t forget to thank all my friends in Terni and the surrounding area, from my high school friends to those I’ve met over time, for their loud company and strong understanding.

Special thanks go to Roberto Baldoni and Giorgia Lodi for having introduced me to, and guided me through, the great experience of being involved in a European project, and for the help they’ve given me in writing my thesis. I’d also like to thank Stefano, Luca and Silvia for their contribution to my work.

Contents

Introduction
1 Scenario and Requirements
1.1 Reference Scenarios
1.1.1 Distributed Denial of Service
1.1.2 Man in the Middle
1.1.3 Identity Theft
1.2 Requirements
1.2.1 General CoMiFin Requirements
1.2.2 SR Management Requirements
1.2.3 Privacy and Security Requirements
1.2.4 Event Processing Requirements
1.2.5 Performance Monitoring Requirements
2 CoMiFin Architecture
2.1 The CoMiFin Service Model
2.1.1 CoMiFin Principals
2.1.2 Semantic Rooms
2.2 CoMiFin Architecture
2.2.1 SR Management
2.2.2 Commodity Services
2.2.3 SR Complex Event Processing and Applications
3 SR Life Cycle Management
3.1 Requirements
3.1.1 SR Model
3.1.2 Segregation of Duties
3.2 High-Level Design
3.2.1 Partner and Account Management
3.2.2 Software Management
3.2.3 SR Schema Management
3.2.4 SR Life Cycle Management
3.2.5 SLA Violation Management
3.2.6 Component Diagram
3.3 Low-Level Design
3.3.1 Architecture
3.3.2 Data Model
3.4 Implementation
3.4.1 Technologies
3.4.2 HowTo
4 An SR for Intrusion Detection
4.1 Collaborative Intrusion Detection
4.2 The Agilis System
4.2.1 WebSphere eXtreme Scale
4.2.2 Hadoop and MapReduce
4.2.3 Processing Framework
4.2.4 Long-Term Data Storage
4.2.5 The Jaql Language
4.3 The ID-SR Processing Steps
4.4 Main Innovations
4.5 Performance Monitoring
5 Conclusions
5.1 Summary
5.2 Future Directions
5.2.1 Other Scenarios
5.2.2 Botnet Detection
5.2.3 Resource Allocation and Scheduling Optimization
5.2.4 Privacy and Trustworthiness
Bibliography
List of Figures

Introduction

The rising availability of Internet connectivity and the increase in network bandwidth are boosting the use of the World Wide Web to provide more and more services. Supported services range from e-commerce to social networks, from telephone calls to business-to-business transactions, so a huge number of businesses now rely on the Internet. Moreover, access to an enormous quantity of high-value and sensitive information is enabled and secured by employing the Internet as a Critical Infrastructure. What could the next generation of services provided over the Internet be? And is the Internet reliable enough to support all these services?

This work addresses these two questions together, investigating the paradigm of cooperative environments as a possible means of letting diverse organizations collaborate in the context of the security of Financial Institutions. A brief overview of cooperative environments is provided in order to introduce the main topics of the whole work. All these aspects are supported by CoMiFin, a European project focused precisely on the security of Financial Institutions. A short description of CoMiFin is also given, together with more detailed background on the Financial Institutions scenario.

Cooperative Environments

For today’s software systems, the need to react to unpredictable changes in the environment in a timely and adaptive way, so as to identify possible opportunities and threats and notify interested actors, is becoming more and more crucial. Along these lines, a very interesting class of systems is that of Sense and Respond Systems (SRSs). They detect and correlate external events (the sense phase) and then produce useful outputs in time (the respond phase).

A primary property of SRSs is the ability to produce timely responses. Part of the complexity of present environments is due to the high speed at which changes occur, and it is often crucial to react with limited delay. For example, systems that monitor Critical Infrastructures have to report anomalies as soon as possible in order to prevent damage to people or property.

An important aspect of SRSs is that the quality of the generated responses depends on how much input is gathered. Consider a system in charge of computing the fastest route to a certain destination. If this system were based on road topology only, the calculation would not take into account any problem due to traffic conditions or road accidents. Employing some automatic means of detecting the current state of the roads would surely improve the quality of the computed route. We can say: “the more I can sense, the better I can respond”. So another relevant property of SRSs is the ability to gather input data from several sources, possibly heterogeneous and widely distributed.
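The route example can be made concrete with a small sketch. It is only an illustration and not part of CoMiFin: the road graph, the travel times and the `traffic_delay` feed are hypothetical values chosen for the example. The same Dijkstra search is run twice, first on road topology only and then with sensed delays added to the edge weights.

```python
import heapq

def fastest_route(graph, source, target, extra_delay=None):
    """Dijkstra's algorithm; extra_delay maps (u, v) to a sensed delay in minutes."""
    extra_delay = extra_delay or {}
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbour, minutes in graph.get(node, {}).items():
            new_cost = cost + minutes + extra_delay.get((node, neighbour), 0.0)
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    if target not in best:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1], best[target]

# Hypothetical road network: travel times in minutes with free-flowing traffic.
roads = {
    "A": {"B": 10, "C": 15},
    "B": {"D": 10},
    "C": {"D": 12},
    "D": {},
}
# Hypothetical sensed input: an accident adds 20 minutes on the B -> D stretch.
traffic_delay = {("B", "D"): 20}

topology_only = fastest_route(roads, "A", "D")
with_sensing = fastest_route(roads, "A", "D", traffic_delay)
```

With topology alone the search picks A, B, D; once the sensed delay is included it switches to A, C, D, which is the sense in which more input yields a better response.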

This latter property can be seen as a way of bringing together many different actors with the aim of obtaining some added-value benefit. Returning to the route calculation example, road topology, traffic conditions and real-time information about road accidents are correlated to find the fastest route. Another interesting example concerns failure detection. In distributed systems, load peaks and hardware/software failures can heavily affect communication between nodes. The ability to dynamically detect failed nodes is fundamental for carrying out the main distributed communication protocols, but the delays introduced by the network and the topology of the network itself can make it hard for every node to have the same knowledge about the health of the other nodes. The idea is to let all the nodes collaborate by sharing their own knowledge in order to achieve a common perception of which nodes are alive and which are not.
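A minimal sketch of this idea follows, under simplifying assumptions that are not taken from the thesis: every node keeps a local view mapping each peer to the time of the last heartbeat it received, views are merged by keeping the freshest timestamp, and a peer counts as alive if its freshest heartbeat is within a timeout. The node names, timestamps and timeout are hypothetical.

```python
def merge_views(views):
    """Combine local views (peer -> last heartbeat time) keeping the freshest value."""
    merged = {}
    for view in views:
        for peer, last_seen in view.items():
            merged[peer] = max(last_seen, merged.get(peer, last_seen))
    return merged

def alive_nodes(view, now, timeout):
    """Peers whose most recent known heartbeat is no older than the timeout."""
    return {peer for peer, last_seen in view.items() if now - last_seen <= timeout}

# Hypothetical local views at time now=100 (timestamps in seconds).
view_n1 = {"n2": 98, "n3": 60}   # n1 has not heard from n3 for a while
view_n2 = {"n1": 99, "n3": 97}   # n2 holds a recent heartbeat from n3
view_n3 = {"n1": 95, "n2": 40}   # n3 lost contact with n2

alone = alive_nodes(view_n1, now=100, timeout=10)
shared = alive_nodes(merge_views([view_n1, view_n2, view_n3]), now=100, timeout=10)
```

On its own, n1 would consider n3 failed; after merging the three views, all nodes are seen as alive, because n2 contributes the fresh heartbeat that n1 missed.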

In this regard, a very interesting area of research concerns the use of SRSs to let a set of independent systems collaborate towards an established common goal, thereby creating a cooperative environment. A brief analysis of the properties satisfied by this class of systems helps to identify what is needed to obtain collaboration in practice. First of all, due to the large volume of input data and the demand to process it in a timely fashion, we need to use some parallel computing paradigm, so that computation can be sped up as required to meet time constraints. This in turn implies employing a large number of computational, storage and network resources, so that parallel processing can be carried out on a big input data set.
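As a rough illustration of the split, process and merge shape that such a parallel paradigm takes, the sketch below counts events per source address over chunks of the input and then combines the partial results. It is an assumption-laden toy: the event format and the function names are hypothetical, and a thread pool is used only to keep the example portable (for CPU-bound work in CPython a process pool or a cluster framework would be the realistic choice).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_sources(chunk):
    """Process one chunk: count events per source address."""
    return Counter(event["source_ip"] for event in chunk)

def parallel_count(events, workers=4):
    """Split the input into chunks, process them concurrently, merge the partial counts."""
    size = max(1, -(-len(events) // workers))  # ceiling division
    chunks = [events[i:i + size] for i in range(0, len(events), size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_sources, chunks):
            total.update(partial)
    return total

# Hypothetical input events.
events = [
    {"source_ip": "192.0.2.1"},
    {"source_ip": "192.0.2.2"},
    {"source_ip": "192.0.2.1"},
]
totals = parallel_count(events, workers=2)
```

Because the per-chunk results are merged by simple addition, adding workers (or machines) changes only how the chunks are distributed, not the final counts.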

An emerging technology suitable for supporting cooperative environments is cloud computing [26]. Besides its innovative business aspects, it also concerns the over-the-Internet provision of dynamically scalable and often virtualized resources. Its scalability is useful for coping with load peaks or hardware/software failures. Virtualization is important for maximizing resource utilization. So cloud computing can be the right means of making available the required amount of resources on which the desired parallel computing framework can then be deployed.

CoMiFin

CoMiFin (Communication Middleware for Monitoring Financial Critical Infrastructure) [3] is an EU project funded by the Seventh Framework Programme (FP7), started in September 2008 and running for 30 months. Its research area is Critical Infrastructure Protection (CIP), focusing on the Critical Financial Infrastructure (CFI).

An increasing amount of sensitive traffic is being carried over open communication media such as the Internet. This trend exposes services and the supporting infrastructure to massive, coordinated attacks and frauds that are not being effectively countered by any single organization. In order to identify threats against Critical Infrastructures and business continuity, CoMiFin aims to facilitate information exchange and distributed event processing among a subset of participants grouped in federations. Federations are regulated by contracts and are enabled through the Semantic Room abstraction: this abstraction facilitates the secure sharing and processing of information by providing a trusted environment in which the participants can contribute and analyze data. Input data can be real-time security events, historical attack data, logs, and other sources of information that concern other Semantic Room participants. Semantic Rooms can be deployed on top of an IP network, allowing adaptable configurations from peer-to-peer to cloud-centric, according to the needs and requirements of the Semantic Room participants.

A key objective of CoMiFin is to prove the advantages of having a cooperative approach in the

rapid detection of threats. Specifically, CoMiFin demonstrates the effectiveness of its approach

by addressing the problem of protecting financial Critical Infrastructure. This allows groups of

financial actors to take advantage of the Semantic Room abstraction for exchanging and processing

information, thereby allowing them to take proactive steps in protecting their business continuity,

for example, through generating fast and accurate intruder blacklists.
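To give a feel for what a cooperatively generated blacklist could look like, here is a minimal sketch. It is not the CoMiFin algorithm: the rule (blacklist an address once at least `min_reporters` distinct participants have flagged it), the participant names and the addresses are all assumptions made for illustration.

```python
from collections import defaultdict

def build_blacklist(reports, min_reporters=2):
    """Return addresses flagged by at least min_reporters distinct participants."""
    reporters = defaultdict(set)
    for participant, addresses in reports.items():
        for address in addresses:
            reporters[address].add(participant)
    return sorted(a for a, who in reporters.items() if len(who) >= min_reporters)

# Hypothetical suspicious-address reports contributed by three participants.
reports = {
    "bank_a": ["192.0.2.10", "198.51.100.7"],
    "bank_b": ["192.0.2.10", "203.0.113.5"],
    "isp_c": ["192.0.2.10", "203.0.113.5"],
}
blacklist = build_blacklist(reports)
```

Requiring agreement between several participants is one simple way to trade speed for accuracy: an address seen by a single participant stays off the list until someone else confirms it.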

Scenario: Financial Institutions

Nowadays, the financial industry is witnessing technological and usage changes due to globalization, new trends towards the “webification” of financial services such as home banking, online trading and remote payments, and increased competition among financial stakeholders.

In such a context, financial institutions’ infrastructures are no longer confined within single organizational boundaries; they are becoming part of a global unmanaged financial ecosystem that consists of interconnected financial domains and other critical infrastructures, such as telecommunication and electricity supply, in which cross-domain interactions spanning different administrative borders are in place.

As of today, the overall number of transactions conducted over the above-mentioned financial ecosystem is increasing. Specifically, the portion of traffic carried over public networks such as the Internet is increasing, thus exposing the overall ecosystem to massive and coordinated attacks and frauds.

Protecting the ecosystem from faults and malicious attacks is then essential in order to ensure

stability, availability, and continuity of key financial markets and individual businesses, since those

attacks might also significantly compromise the results of financial transactions.

Currently, some threats or attacks targeting different financial entities are difficult or even impossible to detect with local single-domain monitoring approaches. However, a novel approach based on inter-organization cooperation and information sharing can improve security threat detection capabilities.

The main objective of CoMiFin is to design and develop a distributed system that can enhance the situation awareness of financial organizations, so as to allow them to better address security threats and trigger local protection mechanisms in a timely manner, thus preventing or mitigating dangerous effects. In order to build the system, a possible scenario has been considered within which the effectiveness of the CoMiFin solution can be proved.

My Contribution

Within the context of the CoMiFin project introduced so far, my contribution has been twofold:

- I’ve been actively involved in the analysis, design, development and installation of SR Manager, a component of the CoMiFin system architecture that will be described in depth later;

- I’ve developed simple monitoring software aimed at collecting statistics about the performance of Agilis, a distributed parallel computing framework that will be detailed later.

Organization

The thesis is organized as follows:

- in Chapter 1, the scenario of Financial Institutions is investigated in more detail in order to capture the requirements the system is expected to meet;

- in Chapter 2, the architecture of the CoMiFin system is illustrated with a top-down approach;

- in Chapter 3, requirements for the SR Management components are identified and refined from high-level requirements, and the architecture of these components is described in more detail;

- in Chapter 4, an insight into an SR for Intrusion Detection is provided, together with a possible architecture supporting it;

- finally, Chapter 5 outlines the conclusions drawn from this work and proposes some future directions for this area.

Chapter 1

Scenario and Requirements

As already stated in the introduction, the collaborative approach can offer benefits in several scenarios:

- in the financial context, organizations can cooperate by sharing their traffic data to improve their security defenses;

- the calculation of shortest routes can be enhanced by putting together information coming from sources of a different nature, that is, by making them collaborate;

- reliable failure detection services can be built on top of collaborative environments in which the nodes involved exchange their knowledge about the health of other nodes.

This chapter, like the rest of the thesis, explores the current situation of financial environments in more detail. The most important threats are described and analyzed in order to lay the basis for identifying the requirements that are relevant to my contribution and that the CoMiFin project is expected to address.

1.1 Reference Scenarios

The key elements of the overall CoMiFin financial scenario have been investigated starting from the structure of a single organization (or business entity) that may be willing to use the CoMiFin system and participate in the so-called CoMiFin cloud. Figure 1.1 depicts this structure. An organization can be thought of as logically divided into two principal parts: an internal part (the left-hand side of the shaded rectangle in Figure 1.1) and an external part (the right-hand side of the rectangle in the same figure). The internal part consists of the set of hardware and/or software components that communicate with each other using intra-domain communications; these are usually carried over possibly proprietary and highly secure networks, using proprietary protocols. The internal part can in turn be built out of a set of internal networks (the dotted circles in Figure 1.1) that interact with one another inside a major organization. These networks might define the boundaries of the internal organizations that compose the major one.

Figure 1.1: Structure of business entity

Hardware and software resources deployed within the internal part are not accessible from the outside, since they are not directly connected to the Internet and are properly protected by hardware/software components in order to guarantee a high level of isolation. For instance, Figure 1.1 illustrates a bank corporation (e.g., Lloyds TSB) that may consist of a set of banks, each of which can be present in different geographical areas by means of bank agencies. Each bank in the major corporation, and recursively each bank agency, can use its own proprietary network in order to carry out financial activities. The external part consists of the set of hardware and/or software components connected to the outside world (i.e., the Internet). In the bank corporation example of Figure 1.1, dedicated resources can be used, for instance, to provide end users with e-banking services that communicate over the Internet. Communication and interaction between the external and internal parts are regulated by specific security policies and several levels of firewalls (which may include the definition of a DMZ for hosting external resources) in order to protect the internal part from attacks and unauthorized access.

Based on this organization structure, Figure 1.2 shows how a single organization can contribute to

the construction of a global financial ecosystem, and the position in that scenario of the CoMiFin

cloud.

In the CoMiFin scenario, there are actors strictly related to the financial context, and critical service providers. Figure 1.2 depicts some financial actors such as banks (e.g., Unicredit, Lloyds TSB) and clearing houses (e.g., SWIFT), although any other organization related to the financial environment (e.g., insurance companies, regulatory agencies, government agencies), as well as critical service providers such as telecommunication companies (e.g., AT&T) and power grid companies, can also be involved.

Figure 1.2: CoMiFin in a Financial Scenario

Each actor that is willing to exploit the functionalities offered by the CoMiFin system can participate in the CoMiFin cloud with a number of so-called end-points (the pattern-filled circles in Figure 1.2). These end-points include dedicated software components that, along with the resources provided by each actor, are connected to the Internet in order to exploit the Internet’s robustness. Events will flow from the various end-points into the CoMiFin cloud: they will be processed by the CoMiFin monitoring system in order to obtain semantically enriched data (or processed data) for threat detection. The processed data will then be disseminated among the interested actors participating in the CoMiFin cloud.
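The flow just described (end-points push events, the cloud processes them, processed data is disseminated) can be sketched in a few lines. Everything in the sketch is hypothetical and chosen only for illustration: the `MonitoringCloud` class, its `publish` and `subscribe` methods, the event format and the very simple processing step, which counts events per source address and emits one enriched record when a threshold is reached.

```python
from collections import Counter

class MonitoringCloud:
    """Toy stand-in for the monitoring system: collects, processes, disseminates."""

    def __init__(self, alert_threshold):
        self.alert_threshold = alert_threshold
        self.subscribers = []
        self.counts = Counter()

    def subscribe(self, callback):
        """Register an interested actor; callback receives processed data."""
        self.subscribers.append(callback)

    def publish(self, endpoint, event):
        """Called by an end-point to push one raw event into the cloud."""
        source = event["source_ip"]
        self.counts[source] += 1
        # Disseminate once, at the moment the threshold is first reached.
        if self.counts[source] == self.alert_threshold:
            processed = {
                "suspect": source,
                "events": self.counts[source],
                "last_reporter": endpoint,
            }
            for callback in self.subscribers:
                callback(processed)

cloud = MonitoringCloud(alert_threshold=3)
received = []
cloud.subscribe(received.append)
for endpoint in ("bank_a", "bank_b", "isp_c"):
    cloud.publish(endpoint, {"source_ip": "192.0.2.10"})
cloud.publish("bank_a", {"source_ip": "198.51.100.7"})
```

No single end-point sees more than one event from the suspect address here; only the component that receives all three can raise the alert.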

In the considered scenario, the regular transaction activities that Financial Institutions (FIs) perform are isolated from the activities carried out by the CoMiFin system. In particular, financial transactions are usually handled through dedicated networks and protocols (e.g., SWIFT) for inter-Financial Institution activities, or through proprietary secure internal or external networks that FIs maintain (e.g., between bank branches).

In contrast, in order to exploit the functionalities offered by the CoMiFin system, FIs or service

providers sign a basic contract which regulates their participation in the CoMiFin cloud and the

usage of the CoMiFin system.

An actor participating in the CoMiFin cloud will provide data produced by its own internal monitoring software and, in return, will receive data processed by the CoMiFin system that allows it to detect possible threats in advance and take suitable countermeasures.

In this context, three case studies have been investigated with the aim of providing a clear picture of the monitoring and communication capabilities offered by the CoMiFin system. The first two cases are related to threats and failures that may occur in the global financial ecosystem:

- DDoS attack [28]: a malicious action aimed at making unavailable the services provided by financial actors such as banks, telecommunication suppliers, insurance companies, and security agencies;

- MitM attack: a particular threat that is carried out when an active attacker inserts himself between two communicating parties with malicious purposes.

In addition to these, one more case strictly related to the financial context has been discussed, as

it introduces such typical financial crimes as Identity Theft and related fraud. This case considers

the problems related to the misappropriation of identity information for criminal activities such as

obtaining goods or services by fraud.

This section goes on with the detailed description of the aforementioned case studies and of the

ways CoMiFin can possibly address them.

1.1.1 Distributed Denial of Service

Distributed Denial of Service (DDoS) attacks are carried out through geographically distributed attack infrastructures, commonly referred to as botnets. These can consist of huge numbers of zombie machines (some recent cases involve more than one million hosts) that are connected to the Internet and controlled by the same attacker. Creating the attacker’s distributed infrastructure requires a complex preparation phase. The CoMiFin monitoring system provides benefits in two different DDoS scenarios:

1. Early detection of a DDoS attack

Assumption: in this case the CoMiFin monitoring system can provide benefits only if it can receive alerts about traffic spikes from the FIs and/or from Internet Service Providers (ISPs).

2. Post-processing analysis

Assumption: the involved critical infrastructures (e.g., ISPs, FIs) are willing to share traffic and system log data. In such a case there is no technical limit.
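As an illustration of what a traffic-spike alert in the first scenario might be based on, the sketch below flags a sample when it exceeds a multiple of the average of the preceding window. This is a deliberately simple heuristic assumed for the example, not the detection logic of CoMiFin or of any FI; the window size, the factor and the sample values are hypothetical.

```python
from collections import deque

def spike_alerts(samples, window=5, factor=3.0):
    """Return (index, value) pairs where value exceeds factor times the mean
    of the previous `window` samples."""
    recent = deque(maxlen=window)
    alerts = []
    for index, value in enumerate(samples):
        if len(recent) == window and value > factor * (sum(recent) / window):
            alerts.append((index, value))
        recent.append(value)
    return alerts

# Hypothetical requests-per-second readings from one participant.
readings = [100, 110, 95, 105, 90, 400, 420, 100]
alerts = spike_alerts(readings)
```

Note that the second high reading is not flagged, because the first one has already raised the moving average; a real detector would need a more robust baseline.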

The preparation of this kind of attack can take a very long time. Usually, a specific piece of malware propagates through a huge number of machines, replicating itself several times and exploiting known vulnerabilities. Once the malware is installed on a machine, the machine becomes a so-called zombie. Additional required code is then downloaded from some designated repository, so that the compromised machine is ready to accept instructions from an entity called the herder, which is in charge of controlling all the zombie hosts and coordinating the whole attack. To overcome the limitation of having a centralized coordinator, which would make the whole mechanism easy for security systems to block, the communication between the zombies and the herder is carried out using P2P schemes or by taking advantage of dynamic DNS.

The real attack begins when the herder issues the order to the compromised hosts. The attack generally lasts a short time and involves a very large number of malicious machines. The goal of a DDoS attack is to saturate the network connections of the target host, or otherwise exhaust some of its resources, in order to prevent it from correctly delivering its services. This can be achieved simply by generating legitimate requests if the number of zombies is very high. If there are not so many malicious hosts, the requests can be chosen so that they heavily stress the attacked system, for example by issuing queries for data that is hard to generate and cannot be cached.

After it has been launched, a DDoS attack can be detected by noticing a sudden peak in resource consumption that makes the target system unable to fulfill incoming requests. The only viable reaction is network traffic analysis aimed at understanding the characteristics of the attack and identifying its sources, so that effective countermeasures (e.g., proper filtering rules) can be derived and possibly disseminated to other interested actors (e.g., other FIs or ISPs).

The added value of CoMiFin results from its ability to gather traffic data from several sources (i.e., FIs and ISPs) and correlate them in order to recognize suspicious patterns. In this way, possible attacks on several systems of a particular FI or on several services of the same infrastructure can be detected earlier, because they are more likely to stand out from the aggregated logs than from the logs of single FIs. Moreover, deeper processing of network traffic can allow the identification of common properties of attack sources. These properties can be employed to define simpler and more effective filtering rules to disseminate to interested actors, and to build a common database of past attacks.
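The two points of this paragraph can be illustrated with a small sketch. It assumes, purely for the example, that each FI contributes a table of request counts per source address, that a source is suspicious when its count reaches a fixed threshold, and that the "common property" of interest is the enclosing /24 network; none of these choices comes from CoMiFin.

```python
from collections import Counter
import ipaddress

def heavy_sources(counts, threshold):
    """Sources whose request count reaches the threshold."""
    return {ip for ip, n in counts.items() if n >= threshold}

def aggregate(logs):
    """Sum the per-source counts contributed by all participants."""
    total = Counter()
    for counts in logs.values():
        total.update(counts)
    return total

def common_networks(addresses, prefix=24):
    """Group addresses by enclosing network: one candidate filtering rule per group."""
    nets = Counter(
        ipaddress.ip_network(f"{ip}/{prefix}", strict=False) for ip in addresses
    )
    return {str(net): n for net, n in nets.items()}

# Hypothetical per-FI request counts.
logs = {
    "fi_1": {"203.0.113.5": 40, "203.0.113.9": 35, "198.51.100.2": 20},
    "fi_2": {"203.0.113.5": 45, "203.0.113.9": 30, "192.0.2.77": 25},
    "fi_3": {"203.0.113.5": 38, "203.0.113.9": 42},
}
THRESHOLD = 100
local_hits = {fi: heavy_sources(counts, THRESHOLD) for fi, counts in logs.items()}
global_hits = heavy_sources(aggregate(logs), THRESHOLD)
rules = common_networks(global_hits)
```

No FI sees either source cross the threshold locally, but both do in the aggregate, and since they share a /24 network a single rule covers them.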

1.1.2 Man in the Middle

Man in the Middle (MitM) attacks are carried out by making a legitimate user start a connection with a rogue server that mimics the behavior of the legitimate server. The rogue server is then able to initiate a fraudulent transaction with the real server by pretending to be the legitimate client. Unlike the DDoS scenario, where the target is the FI, MitM attacks concern customer-based e-services, such as e-banking, e-payment and e-trading.

The CoMiFin monitoring system provides benefits in two different MitM scenarios:

- Early detection of a MitM attack

Assumption: in this case the CoMiFin monitoring system can provide benefits only if it can receive alerts about anomalous traffic patterns from the FIs.

- Post-processing analysis

Assumption: the involved critical infrastructures (e.g., ISPs, FIs) are willing to share traffic and system log data. In such a case there is no technical limit.

This class of attacks requires the end user to believe that they are communicating with a licit server, while in
reality they are sending their requests to a malicious one. There are several ways to bring about this situation:

• a DNS server can be poisoned [25] in order to resolve specific hostnames to the IP address of
a rogue server;

• a router can also be compromised [21] so that packets are routed to a malicious server;

• one of the emerging methods is phishing [29], which consists in making the end user believe that they are
interacting with the legitimate web interface, while what they are actually looking at is an indistinguishable
copy of the original one;

• authentication mechanisms may present vulnerabilities that can be exploited to carry out
MitM attacks.

Once a customer has been lured to a rogue Web server used as MitM, the attack is extremely simple.

The user authenticates against the rogue server by sending its credentials. The rogue server stores

the user credentials, relays them to the licit server, and forwards the response to the user on behalf

of the licit server. This mechanism has a twofold purpose: it makes the attack much more realistic

than a traditional phishing strategy, because the user receives the expected replies from the server;

furthermore, the MitM node is able to eavesdrop on all data flowing between the client and the server,
thus gaining access to a large amount of critical information. It is also worth highlighting that the
MitM attacker is able to modify the content of the transaction on the fly, for example by altering
sensitive and critical information.

These attacks can be difficult to detect, because they appear as a normal transaction between a

server and a licit (already known and registered) client. Current detection strategies are based

on the identification of anomalous traffic patterns. In the instance of a simple MitM attack, in

which only a very limited number of compromised servers is used to mediate the sessions of several

users of the same service, it is possible for a single FI to detect a possible MitM attack through

anomaly-detection algorithms. The underlying hypothesis is that customers usually access their

financial services from a limited number of devices, and that the same device is not used by a large

number of customers to access the same financial services in a short period of time. Hence, a

licit server receiving a high number of connections on behalf of different customers from the same

device can trigger an alarm about suspicious activities.
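
The hypothesis above can be sketched as a sliding-window check. The following Python fragment is only an illustration under stated assumptions: the class name, the window length, the customer limit and the use of an opaque device identifier are hypothetical, and logins are assumed to arrive in non-decreasing timestamp order.

```python
from collections import defaultdict, deque

class DeviceMonitor:
    """Flags a device that logs in on behalf of many distinct customers
    within a sliding time window (hypothetical detection rule)."""

    def __init__(self, window_seconds, max_customers):
        self.window = window_seconds
        self.max_customers = max_customers
        self.events = defaultdict(deque)  # device id -> deque of (timestamp, customer)

    def record(self, device, customer, timestamp):
        """Record a login and return True if the device now looks suspicious."""
        queue = self.events[device]
        queue.append((timestamp, customer))
        # Drop logins that have fallen out of the sliding window.
        while queue and queue[0][0] <= timestamp - self.window:
            queue.popleft()
        distinct_customers = {c for _, c in queue}
        return len(distinct_customers) > self.max_customers
```

A monitor created with `DeviceMonitor(60, 2)` raises the alarm as soon as a third distinct customer logs in from the same device within one minute. Legitimate shared devices (for example behind a corporate proxy) would need to be whitelisted, which this sketch does not address.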

The added value of CoMiFin results from the possibility of detecting MitM attacks against different
organizations early, thanks to the correlation of traffic logs coming from the organizations themselves.

Once identified, suspicious IP addresses can be promptly and automatically disseminated to FIs

and ISPs in order to let them properly update their blacklists and check existing logs to locate any

past attack.

1.1.3 Identity Theft

A person’s identity and the ability to prove it are central to any commercial and financial activity.

Financial organizations need to verify an identity before issuing goods or services. They need

identity evidence to open bank accounts, obtain credit cards, finance, loans and mortgages, to

obtain goods or services, or to claim benefits. To this purpose, they need to ensure that:

• the person paying or applying for credit is who he/she says he/she is and lives where he/she
claims to live;

• the person’s name and address, and other information related to identity (e.g., SSN, fiscal
code, driver license number) are correct.

This section provides an overview of the motivations for considering identity theft and related

fraud crimes in CoMiFin, describes the preparation and the execution of these kinds of attacks,

and presents some prevention and mitigation activities that can be managed by the CoMiFin

communication system. Here, identity theft is considered in its broadest meaning whereas related

frauds refer to financial crimes.

Fraudsters can impersonate a person and take out various forms of credit using that person’s good name. These
phenomena are commonly known as identity theft and identity fraud. There is a temporal and
functional difference between these two types of crime:

• Identity theft is the misappropriation of the identity (such as the name, date of birth, fiscal
code, SSN, current address or previous addresses, credit card number) of another person,
without their knowledge or consent.

• Identity fraud is the use of a misappropriated identity in a criminal activity, to obtain
goods or services by deception. An identity fraud usually involves the use of stolen or forged
identity documents (including credit card numbers), hence it comes after an identity theft.

The economic cost to society, to enterprises and more directly to FIs provides significant motivation

for the design of a comprehensive framework to prevent and arrest the growth of identity fraud and

related crimes from their current levels. Statistics are impressive, as outlined below.

• According to the Federal Trade Commission (FTC), “identity theft cost American consumers
US $5bn and businesses US $48bn last year” [24]. US lenders report losses of $1 billion a
year due to identity fraud, but US authorities cannot accurately work out the total cost of
identity fraud because of its complexity. Credit card fraud (25%) was the most common form
of reported identity theft in 2006.

• In February 2006, the UK Home Office “reported that the annual cost of identity fraud had
reached £1.72 billion” [30] up from “£1.3 billion in 2002”.

• In Canada, the Better Business Bureau estimated that the identity theft and identity fraud
“annual cost was $2.5 billion to Canadian consumers and businesses, and the total annual
cost to the Canadian economy was estimated at $5 billion”.

• The Australian Institute of Criminology (AIC) estimates the cost of fraud to Australia is in
excess of AU $5 billion a year, which represents almost a third of the total cost of crime in
the nation (AU $19 billion) [23]. The cost of identity fraud in Australia was estimated to be
“AU $1.1 billion a year with an estimation error of AU $130 million in 2001-02” [27].

Generally, there are two types of attack: attacks on a data repository and attacks on a person.

An attack on a data repository, for example a database containing all the identities of a certain
service, can be carried out while sensitive data is in transit or by exploiting temporary and unintentional
exposures. Moreover, a malicious person working inside an organization can steal this kind
of sensitive data.

Attacks directed at persons can employ several channels. The most important and fastest growing at

the moment is e-mail, which is the means used to carry out phishing attacks. Traditional
mail and Internet websites are also exploited to deceive people.

This category of fraud is of interest for CoMiFin. In fact, these threats are growing steadily
thanks to the e-commerce and e-banking services provided by FIs and other related organizations.
People committing identity theft and fraud typically create new false identities or take someone
else’s credentials to perpetrate financial crimes, which are obviously a critical concern for FIs. Moreover,
effectively facing this issue is a very valuable service for FIs, because identity-related frauds are
dangerously undermining the confidence of customers in FIs, potentially causing huge future losses.
Finally, there are practical difficulties in investigating these crimes, and the percentage of resolved
cases is very low. As a result, FIs either write off the misappropriated amounts as losses or resort to alternative
modes of recovery via mercantile agents as a cost-effective response.

The CoMiFin project focuses on credit card fraud. The possibility of sharing information about identity
thefts and credit card transactions between FIs and credit card companies can be usefully
employed to maintain and distribute effective lists of suspicious web sites.

1.2 Requirements

The scope of CoMiFin is very wide. It focuses on a highly complex topic that is
becoming more and more critical and concerns many different aspects, ranging from the practical
ways of making FIs cooperate, to the devising of algorithms for detecting and predicting threats, to
the addressing of security and performance issues. Due to the extent of this project, my work
and contribution have concerned only some specific questions. In this section, such questions are
described in order to assess the actual requirements that have driven my master thesis work.

To this end, I have identified five main areas of interest:

1. general CoMiFin requirements

2. SR management requirements

3. privacy and security requirements

4. event processing requirements

5. performance monitoring requirements

1.2.1 General CoMiFin Requirements

This section covers the general, highest-level characteristics that the CoMiFin system is expected
to exhibit. These requirements also introduce some key concepts and definitions that will be
used often hereinafter, anticipating some design choices that will be explained in detail in the next
chapter.

One of the most crucial goals is to give concrete evidence that the system is able

to improve the security of FIs. The advantages of employing CoMiFin instead of enforcing security
in the usual ways should be clear. More threats should be detected or predicted,
early enough to let the FIs react effectively on their own or on the basis of suggestions provided
by the CoMiFin system. Moreover, the proposed solution should be cost-effective for participating
organizations.

One of the key concepts that has heavily influenced the development of the system is that of
collaboration, already described in the introduction. The ability of CoMiFin to prove its strengths
is principally based on the assumption that the involved actors have a strong will to cooperate
with each other for the common goal of raising their security level. Obviously, the quality of such cooperation
also depends on which organizations are actually participating. The services provided by FIs
(and targeted by the threats CoMiFin aims to face) increasingly rely on complex chains
of other low-level services supplied by other kinds of providers. Power providers, telco providers
and ISPs are as necessary as FIs for the correct delivery of those services. This implies that the

participation of all the organizations involved in this chain is another decisive prerequisite for the

success of the whole project.

Another primary concept that has been devised in the context of the project is that of Semantic
Room (SR). The SR is an abstraction aimed at providing a means for collaboration.
By joining an SR, an organization gains access to a contract-regulated channel for sharing raw data
(e.g., network traffic logs) and obtaining useful information for coping with security attacks. The
aim of an SR is to bring together different actors and give them back the advantages derived from their
cooperation. From a technical point of view, an interesting challenge arising from this interaction
is the ability to integrate several different domains that could be considerably heterogeneous
and widely distributed throughout the world.

Given the concepts of collaboration and SR, we can go into detail about the way CoMiFin
has been designed to work. The actors that have joined an SR pursue a common goal, for example
the detection and prediction of DDoS attacks, or the blacklisting of sites suspected of being
the source of MitM attacks. They share log files and other resources of interest, relying on the
ability of the SR to integrate different domains and to gather a huge amount of input data. To
this end, another relevant technical requirement concerns the capability to interface with the existing
monitoring systems that are already installed on the resources of participants. These input data

are properly processed depending on the objective of the SR, and the resulting outputs (being

either the lists of suspected addresses or some advice about the suitable countermeasures to take)

are then disseminated to all the interested organizations.

1.2.2 SR Management Requirements

The SR concept introduced so far is central to all the activities carried out by the CoMiFin system. The
management of the different SRs needs to be fully supported by the functionalities provided by the
system. In order to define a distributed monitoring system and allow data, events and information to

be exchanged among different domains, a global private interconnection infrastructure is required.

The CoMiFin communication system shall support the creation of an overlay monitoring network on

top of the Internet and shall provide basic communication functionalities.

The organizations interested in taking part in CoMiFin should be supported by the system in the
creation of new SRs as well as in joining existing SRs. Moreover, all the operations related to
SR lifecycle management should be available, including the possibility to leave a previously joined SR
and the option to disband an existing SR.

As mentioned before, joining an SR is regulated by a contract that should define the services

offered and the rules to obey, together with some specification about the quality of the declared

facilities and other possible technical and organizational aspects that characterize the SR itself.
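
The lifecycle operations listed in this section (create, join, leave, disband) can be summarized in a small Python sketch. It is an illustration only: the class and method names are hypothetical, the contract is reduced to an opaque object, and the rules that the creator is also a member and is the only one allowed to disband the SR are simplifying assumptions.

```python
class SemanticRoom:
    """Hypothetical model of the SR lifecycle: create, join, leave, disband."""

    def __init__(self, name, administrator, contract):
        self.name = name
        self.administrator = administrator  # the partner that created the SR
        self.contract = contract            # opaque contract object (assumption)
        self.members = {administrator}
        self.active = True

    def join(self, partner, accepts_contract):
        if not self.active:
            raise RuntimeError("the SR has been disbanded")
        if not accepts_contract:
            raise PermissionError("joining requires accepting the SR contract")
        self.members.add(partner)

    def leave(self, partner):
        self.members.discard(partner)

    def disband(self, requested_by):
        if requested_by != self.administrator:
            raise PermissionError("only the administrator can disband the SR")
        self.members.clear()
        self.active = False
```

Making contract acceptance an explicit argument of `join` mirrors the requirement that membership is always contract-regulated.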

1.2.3 Privacy and Security Requirements

Being committed to improving the security of FIs and ISPs on a large scale, CoMiFin must itself
be resilient against security threats. This is consistent with the philosophy of
CoMiFin, which is based on the protection of all the critical infrastructures taking part in the chain that
supports the delivery of financial services. In fact, from this point of view, the CoMiFin system is itself
a critical infrastructure.

Another important class of security requirements concerns the communication of data. The
main properties to guarantee are the following four classic ones:

1. confidentiality

Message confidentiality means that the content of a message can be accessed only by its legit-

imate recipient(s). Possible recipients are a single CoMiFin participant, a subset of CoMiFin

participants or all CoMiFin participants. The best solution to handle secure group communications
depends on the purpose that induces CoMiFin participants to share information

with others. These processes shall be regulated by well defined policies defining data formats,

the minimum level of security to be guaranteed by the participants during the communica-

tions, information security rules, processing rules, QoS levels and other aspects.

2. integrity

Integrity means that the content of the transmitted message has not been changed during the

transmission operation. That is, the recipient of the message is sure that data is identically

maintained, so that the received message is the same originally sent.

3. authentication of the sender(s)

Message authentication allows all the recipients of a message to determine which of the other
CoMiFin participants is the sender. If authentication information is generated according to
appropriate cryptographic techniques, such as digital signatures, it is also possible to provide
non-repudiation properties.

4. non-repudiation of the message

Non-repudiation means that the recipient of a message is able to demonstrate to a third party

that a specific CoMiFin participant sent the message itself. This property represents the basis

on which cryptographic proofs of misbehavior can be built.
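
Two of the four properties, integrity and sender authentication, can be illustrated with a message authentication code from the Python standard library. This is a minimal sketch under the assumption that sender and recipient already share a secret key; the function names are hypothetical. It deliberately does not cover confidentiality (which requires encryption) or non-repudiation (which requires digital signatures, since with a shared key the recipient could have produced the tag as well).

```python
import hashlib
import hmac

def protect(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag to be sent along with the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check that the message is unchanged and was tagged by a holder of the key."""
    expected = protect(key, message)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, tag)
```

A recipient that calls `verify` with the shared key accepts the message only if it was neither modified in transit nor tagged with a different key.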

A critical point for secure communication is the interface between the CoMiFin system and the internal
network of participants. The CoMiFin system needs to access Internet resources and receive inputs
from appliances and devices deployed in the internal network of the CoMiFin participant. To

guarantee higher levels of security, CoMiFin resources should be deployed only in isolated network

compartments. Multiple layers of firewalls (packet filtering and application gateways) should be

used to regulate the traffic flow between the CoMiFin cloud and the participant’s internal network, as well
as the traffic flow between the CoMiFin cloud and Internet resources.

Moreover, there should be the possibility for involved organizations to anonymize the data they

provide. In fact, the CoMiFin system shall prevent its participants from gaining knowledge about
the identity and operations of other CoMiFin participants that are not willing to share such information.

At the same time, it shall collect and maintain sufficient information to perform meaningful data

analysis.

The interaction with the end users should also be properly secured. To this end, besides the
classic use of username-password authentication to access any CoMiFin service accessible
by browser, an additional step should be added to guarantee an acceptable level of security. It is worth
noting that all the SR Management functionalities described before will be available through a
web interface. Since the operations that can be executed through these functionalities have a
strong impact both on the contract and on the membership of SRs, they must be considered highly
critical. This means that a single malicious user could break the vital services provided by CoMiFin,
becoming a real single point of failure for the whole system. A viable solution for addressing this

issue can be the application of the principles of Segregation of Duties (SoD). The basic idea behind

the SoD is that to complete an operation that is considered critical, the intervention of at least two

different users is required. This way, the single-point-of-failure could be avoided.
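
The SoD principle can be expressed in a few lines of Python. The sketch below is illustrative only: the class name and the default of two approvals are assumptions, and authentication of the approving users is taken for granted.

```python
class CriticalOperation:
    """A critical operation that needs approval from several distinct users."""

    def __init__(self, description, required_approvals=2):
        self.description = description
        self.required_approvals = required_approvals
        self.approvers = set()  # a set, so repeated approvals by one user count once

    def approve(self, user):
        self.approvers.add(user)

    def can_execute(self):
        return len(self.approvers) >= self.required_approvals
```

Because approvers are stored in a set, a single user approving twice never satisfies the requirement, which is exactly the property SoD relies on.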

1.2.4 Event Processing Requirements

This section describes the requirements about the processing activities carried out by the system

on input data provided by participants and the dissemination of related outputs. The CoMiFin system
shall be able to collect, analyze and process large (and possibly disjoint) event streams in order to
produce derived events, extract complex threat detection patterns and detect mounting attacks,

imminent failures and other potentially dangerous activities. Event processing capabilities shall

include:

• event filtering and classification;

• event pre-processing and formatting;

• event aggregation and correlation.

Moreover, the system shall be able to identify and suggest countermeasures and security policies to

be applied for preventing or reacting to malicious activities and for mitigating the effects of attacks,

failures and threats both at a local domain level and at the overall interconnected infrastructure

level.
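
The processing capabilities required in this section (filtering, formatting, correlation, production of derived events) can be sketched as a small pipeline. The Python code below is a toy illustration, not the CoMiFin event processing engine: the event fields (`type`, `source`, `domain`), the function names and the correlation rule (the same source reported by at least a given number of domains) are all assumptions.

```python
from collections import defaultdict

def filter_events(events, wanted_types):
    """Filtering: keep only the event types relevant for the analysis."""
    return [e for e in events if e["type"] in wanted_types]

def normalize(event):
    """Pre-processing and formatting: bring fields to a common representation."""
    return {
        "type": event["type"],
        "source": event["source"].strip().lower(),
        "domain": event["domain"],
    }

def correlate(events, min_domains):
    """Correlation: derive an alert for each source seen in enough distinct domains."""
    domains_by_source = defaultdict(set)
    for e in events:
        domains_by_source[e["source"]].add(e["domain"])
    return [
        {"type": "derived_alert", "source": s, "domains": sorted(domains_by_source[s])}
        for s in sorted(domains_by_source)
        if len(domains_by_source[s]) >= min_domains
    ]
```

Chaining the three functions on the events contributed by several participants yields derived events that no single domain could have produced from its own stream.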

1.2.5 Performance Monitoring Requirements

The contract that regulates an SR includes a section for QoS specification, where performance

constraints are likely to be set. In order to check compliance with these constraints, monitoring
activities should be carried out to assess the actual performance offered by the system. For

example, interesting metrics for the processing performance could be the response time and the

throughput.
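
As an illustration of how the two metrics could be checked against contract constraints, the following Python sketch computes the mean response time and the throughput from a list of (arrival, completion) timestamps. The sample format, the function names and the two thresholds are hypothetical and are not taken from an actual SR contract.

```python
def performance_metrics(samples):
    """samples: list of (arrival_time, completion_time) pairs, in seconds.
    Returns (mean response time, throughput in completed events per second)."""
    if not samples:
        raise ValueError("at least one sample is required")
    response_times = [done - arrived for arrived, done in samples]
    mean_response = sum(response_times) / len(response_times)
    # Observation span: from the first arrival to the last completion.
    span = max(done for _, done in samples) - min(arrived for arrived, _ in samples)
    if span <= 0:
        raise ValueError("the observation span must be positive")
    return mean_response, len(samples) / span

def complies(samples, max_mean_response, min_throughput):
    """Check the measured metrics against hypothetical contract bounds."""
    mean_response, throughput = performance_metrics(samples)
    return mean_response <= max_mean_response and throughput >= min_throughput
```

Other choices, such as percentile response times instead of the mean, would follow the same pattern.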

Chapter 2

CoMiFin Architecture

This chapter describes the architecture of the CoMiFin system. The aim is to design a flexible
architecture able to address all the requirements identified in the previous chapter.

First, the CoMiFin Service model is introduced. It allows its clients to form business relationships,
called Semantic Rooms (SRs), which can be leveraged for information sharing and processing
purposes. Each SR is associated with a contract determining the set of services provided by that
SR along with the data protection, isolation, trust, security, availability, fault-tolerance, and performance
requirements. The CoMiFin middleware incorporates the necessary mechanisms to share
the underlying computational, communication and storage resources both across and within each
of the hosted SRs so as to satisfy the requirements prescribed by their contracts.

The processing within an SR is accomplished through a collection of software modules, called

CoMiFin Applications, hosted within the SR. The architecture for the application hosting support

is derived based on the types of the hosted applications, and their resource management require-

ments.

The design of the CoMiFin middleware faces many challenges stemming from the need to support

on-line processing of massive amounts of live data in a timely fashion, wide distribution, resource

heterogeneity, and diverse SLA constraints imposed by the SR contracts. In the architecture pro-

posal, these challenges are addressed by following a top-down approach wherein the system is first

broken down into a collection of high-level components whose functionality and interaction with

each other are clearly specified. The state-of-the-art is then further elaborated for each of the

proposed components and several design alternatives are presented to serve as a starting point for

the ensuing design and development effort.

2.1 The CoMiFin Service Model

The primary functionality supported by the CoMiFin middleware is to facilitate information ex-

change and processing among participating principals for the sake of identifying threats against

their IT infrastructure and business. The information sharing is facilitated through the SR ab-

straction, which provides a trusted environment for the participants to contribute and analyze

the input data. The input data can be real time security events, historical attack data, logs, etc.

This data can be generated by local monitoring subsystems (such as system management products,
Intrusion Detection Systems (IDSs), firewalls, etc.) installed within the IT infrastructure of the participating
principals, or can come from external sources.

The processing within the SR is accomplished through various types of applications, which can

support the following functionality:

1. data pre-processing, which may include filtering and anonymization;

2. on-line data analysis for real-time anomaly detection (such as complex event processing);

3. off-line data analysis;

4. long-term data storage for the future off-line analysis and/or ad-hoc user queries.

The results of the processing can be either disseminated within the SR, or exported to other SRs

and/or external clients. The output might include descriptions of the suspicious transactions,

users, network addresses, attack signatures, etc. The rules for information sharing and resource

provisioning within the SR are governed by the SR contracts.

2.1.1 CoMiFin Principals

CoMiFin principals are those organizations that can exploit the functionalities made available by

the system. Specifically, in the current design and implementation, the following types of entities can

be involved:

• FI stakeholders are the Financial Institutions primarily targeted by CoMiFin. Provided
services are aligned with the needs of these organizations: banks, insurance companies, investment
management organizations;

• FI Utilities forming the operational environment of financial institutions are the organizations
that can influence the IT of the FI stakeholders (ISP, Telco, Power Grid, SW services, HW
providers, external service providers);

• Additional FI players are organizations that play an important role in the interaction of
FI stakeholders (government agencies, regulatory agencies, FI associations, brokers, stock
exchange, rating agencies);

• CoMiFin Providers are organizations that offer various types of IT resources for the operation
of the system (ISP, ASP, HW/SW provider, hosting businesses, cloud computing platform
provider);

• CoMiFin Authority is an organization that is trusted by the principals. The CoMiFin Authority
(CA) is responsible for the organization and operation of the whole CoMiFin system.
It can host some centralized parts of the system. Typically, a CA is a trusted third party,

such as a central bank, a regulatory body or an association of banks. Such an authority does not
necessarily provide resources for CoMiFin but is needed for the management of contractual
processes;

• CoMiFin Partner is a business entity or organization interested in the CoMiFin services.
Unless otherwise stated, the term Partner will be used instead. A CoMiFin Partner subscribes
to the system by signing a basic contract and can belong to any of the following: FI stakeholders, FI Utilities,
Additional FI players, CoMiFin Providers.

2.1.2 Semantic Rooms

An SR is a federation formed by a subset of partners for the sake of information sharing and

processing. Partners can participate in a specific SR by performing a join operation.

Each SR is associated with a contract that defines the set of processing and data sharing services

provided by that SR along with the data protection, isolation, trust, security, dependability, and

performance requirements. Since SRs will typically encapsulate various types of data processing

services (such as on-line event processing, or long-running analytics and intelligence extraction),

the data processing scenarios will be used to illustrate the discussion of the SR functionality below.

Figure 2.1 depicts the SR data handling. As shown in this figure, SR participants provide raw data

that are then processed in order to produce processed data. Raw data may include real-time data,

inputs from human beings, stored data (e.g., historical data), queries, and other types of dynamic

and/or static content.

Processed data can be used for internal consumption within the SR (the lined arrow in Figure 2.1):

in this case, derived events, models, profiles, blacklists, alerts and query results can be fed back into

the SR so that the participants can take advantage of the intelligence provided by the processing.

In addition, a (possibly post-processed) subset of data can be offered for external consumption (the

diamond arrow in Figure 2.1). The processed data can be rendered by means of Graphical User

Interfaces (GUIs) or Dashboard applications.

Figure 2.1: SR data handling

Semantic Rooms Members and Clients

The organizations that participate in an SR are called SR Members. They have full access both to
all the raw data that the members agreed to contribute by contract, and to the data processed
and output by the SR. The Members can also be in charge of performing processing and

dissemination.

In addition to the SR Members, there might exist SR Clients. They cannot contribute raw data

directly to the SR but can be the consumers of the processed data the SR is willing to make

available for the external consumption. Figure 2.2 illustrates the SR Client and Member roles.

Figure 2.2: SR Members and Clients

The CoMiFin Partner that instantiates a new SR becomes the SR Administrator of that SR. It
is in charge of:

• defining the details of the SR contract;

• managing/supervising the ongoing SR operations;

• possibly cancelling the membership of Members;

• finally, disbanding the SR.

SRs can communicate with each other. In particular, processed data produced by an SR can

be injected into another SR (e.g., through the SR client role above). The ability for SRs to

communicate with one another may enable a composition of multiple services, provided by each

individual SR, into higher-level functionalities. For instance, processed data in the SR related to

“DDoS attacks in banks” can be used by a more specialized SR such as “DDoS attacks in banks
of a country” whose data can be in turn used by the SR related to “DDoS attacks of a bank in a

specific country” in order to provide CoMiFin partners with richer services.

Semantic Room Contract

An SR Contract is used to define the set of rules for accessing and using data processed within

the SR. There is a contract for each SR. Partners willing to participate in the SR must sign it.

According to the state of the art on contract definition, an SR contract includes four main parts:

1. details of the involved parties

2. contractual statements

3. Service Level Specifications (SLSs)

4. the signatures of the involved parties
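
The four parts of the contract map naturally onto a simple data structure. The Python sketch below is only an illustration: the field names, the representation of SLSs as a dictionary of bounds, and the rule that the contract is in force once every party has signed are assumptions, not the actual CoMiFin contract format.

```python
from dataclasses import dataclass, field

@dataclass
class SRContract:
    parties: list              # 1. details of the involved parties
    statements: list           # 2. contractual statements
    service_level_specs: dict  # 3. SLSs, e.g. metric name -> bound (assumed format)
    signatures: set = field(default_factory=set)  # 4. signatures of the parties

    def sign(self, party):
        if party not in self.parties:
            raise ValueError("only an involved party can sign the contract")
        self.signatures.add(party)

    def is_in_force(self):
        """Assumption: the contract is in force once all parties have signed."""
        return set(self.parties) == self.signatures
```

A real contract would carry cryptographic signatures rather than party names; the set of names stands in for them here.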

2.2 CoMiFin Architecture

The framework that supports the Semantic Room abstraction over a pool of (locally and geograph-

ically) distributed computational, storage, and network resources is shown in Figure 2.3. In the

framework we can clearly identify two principal layers: the SR management layer which is respon-

sible for the management of the SR, and the Complex Event Processing and Applications layer

which realizes the SR processing and sharing logic. In addition, all the architectural components

of both layers above can utilize various commodity services for

• exchanging control and monitoring information among them (such as load and availability),

• managing resource allocation to the complex event processing and applications both off-line
and at run time through the use of controllers and schedulers,

• storing processing state and data.

In the following subsections we describe in more detail the layers of our framework.

Figure 2.3: The framework

2.2.1 SR Management

This layer is responsible for supporting the SR abstraction on top of the individual processing and data sharing applications provided by the Complex Event Processing and Applications layer. It embodies a number of components fulfilling various SR management functions. Such functions include the management of the entire SR life cycle (i.e., creation of an SR, instantiation of an SR, disbanding of an SR, management of the SR membership), the registration and discovery of SRs and SR contracts, the configuration and planning of SRs, the management of the communications among different SRs, and the management of trust and reputation within SRs. In addition, each SR member interfaces with an SR through a component of this layer termed SR gateway. This component transforms raw data into events in the format specified by the SR contract. In general, this transformation depends on the specific SR objective to be met and comprises three distinct pre-processing steps, namely filtering, aggregation, and anonymization. The latter consists of applying different anonymization techniques in case privacy and confidentiality requirements are prescribed by the SR contract.
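
As an illustration only, the three pre-processing steps of the SR gateway could be sketched as follows. The record fields (`src_ip`, `kind`, `dst_port`), the function names, and the use of a salted SHA-256 hash as the anonymization technique are assumptions made for this sketch; the actual event format and techniques are whatever the SR contract prescribes.

```python
import hashlib
from collections import Counter


def filter_records(records, wanted_kinds):
    """Filtering: keep only the raw records relevant to the SR objective."""
    return [r for r in records if r["kind"] in wanted_kinds]


def aggregate(records):
    """Aggregation: count the records per (source address, kind) pair."""
    counts = Counter((r["src_ip"], r["kind"]) for r in records)
    return [{"src_ip": ip, "kind": kind, "count": n}
            for (ip, kind), n in counts.items()]


def anonymize(events, salt):
    """Anonymization: replace source addresses with a salted hash
    (one possible technique, chosen here only for illustration)."""
    result = []
    for event in events:
        digest = hashlib.sha256((salt + event["src_ip"]).encode()).hexdigest()[:16]
        result.append({**event, "src_ip": digest})
    return result


def gateway(records, wanted_kinds, salt, anonymization_required):
    """Turn raw records into events; anonymize only if the contract asks for it."""
    events = aggregate(filter_records(records, wanted_kinds))
    return anonymize(events, salt) if anonymization_required else events


# Hypothetical raw data produced by a member's local monitoring systems.
raw = [
    {"src_ip": "10.0.0.1", "kind": "syn", "dst_port": 80},
    {"src_ip": "10.0.0.1", "kind": "syn", "dst_port": 443},
    {"src_ip": "10.0.0.2", "kind": "dns", "dst_port": 53},
]
events = gateway(raw, {"syn"}, salt="sr-42", anonymization_required=True)
```

Keeping the three steps as separate functions mirrors the description above: a different SR objective would swap the filter or the aggregation key without touching the anonymization step.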

2.2.2 Commodity Services

A number of services can be used by both layers previously mentioned in order to provide functionalities such as communication, storage, resource and contract management, and monitoring. These services are transversal to the other layers and are described separately in the following, highlighting several state-of-the-art design alternatives that can be used for their implementation.

Storage Services

The Storage Service layer of the architecture consists of a collection of components providing various

kinds of storage services to the other layers. This layer can embody services for long term storage of

large data sets, such as monitoring logs and historical data, and for low latency storage of limited

amounts of real-time data.

Communication

The Communication layer consists of a collection of components providing various kinds of communication services to the other layers. This layer can include large-scale group communication services, reliable low-latency, high-throughput message streaming services that are useful for supporting real-time event streaming from external components into an SR, and publish/subscribe services.

Resource and Contract Management

The Resource and Contract management layer is responsible for allocating physical resources (such

as computational, storage, and communication) to the SRs so as to satisfy the business objectives

(such as performance goals, data and resource sharing constraints, etc.) prescribed by the SR

contracts. A capacity planning study might be completed in the preliminary phase of an SR startup. In this way, the Resource Management layer “can be aware” of the maximum capacity that each SR has to provide in terms of computational power, throughput, memory and storage. To this end, this layer can include a scheduler and placement controller for the initial allocation of the services to the physical resources, and a runtime load balancer for possible dynamic re-allocations. As each SR can count on a set of (locally and geographically) distributed data and computational resources, we find it convenient to consider the following three alternatives for SR deployment, all of which can be supported by our framework:

• SR-owned platform
the computational resources of each SR are owned by its members, although one member is deputed to act as the SR administrator. The computational platform of an SR is fully dedicated to the complex event processing and applications of that SR.

• Third party-owned platform
the computational resources are owned by a third party. The computational resources could be shared among the complex event processing and applications of the SRs. The collocation and data flows can be subject to some restrictions specified by the SR contract. This alternative corresponds to the so-called “SR as a service”.

• Mixed platform
the computational resources of each SR are owned by the members of that SR. This platform runs the logic of its SR, but it can occasionally offer hosting services for running the logic of other SRs. This case is allowed only upon explicit request from another SR whose complex event processing and applications exceed the capacity of its computational platform, and it implies some business (and trust) relationship among the involved SRs.

Metrics Monitoring

The Metrics Monitoring layer is responsible for monitoring the architecture in order to assess whether the requirements specified by the SR contract are effectively met. As Metrics Monitoring is a transversal layer of the framework, it operates at both the SR Management and Complex Event Processing and Applications layers. In particular, at the SR Management layer it is in charge of periodically collecting monitoring information related to the management of SRs (e.g., SR membership information) in order to detect whether that management violates the requirements included in SR contracts. The Metrics Monitoring keeps track of the dynamic behavior of the SRs and checks whether or not SRs and SR members themselves are honoring their respective contracts. In case the Monitoring detects that SR contracts are close to being violated, it interacts with SR Management components in order to trigger proper reconfiguration activities.
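
A minimal sketch of such a check is given below, assuming that each monitored metric has an upper-bound threshold taken from the contract and that “close to being violated” means exceeding a fixed fraction of that threshold. The class name `Sls`, the 0.9 warning ratio and the metric names are hypothetical and are not taken from the CoMiFin implementation.

```python
from dataclasses import dataclass


@dataclass
class Sls:
    name: str
    threshold: float            # assumption: maximum admissible value
    warning_ratio: float = 0.9  # assumption: "close to violation" margin


def assess(sls, measured):
    """Classify a measured value against the contract threshold."""
    if measured > sls.threshold:
        return "violated"
    if measured >= sls.warning_ratio * sls.threshold:
        return "at_risk"
    return "ok"


def check_all(slss, measurements):
    """Return the names of the SLSs that should trigger a reconfiguration."""
    return [s.name for s in slss if assess(s, measurements[s.name]) != "ok"]
```

In this sketch the names returned by `check_all` would be handed to the SR Management components; how the reconfiguration itself is carried out is outside its scope.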

At the Complex Event Processing and Applications layer (see below), the Metrics Monitoring is in

charge of periodically evaluating whether or not the resource management required by this layer is

effectively able to support the execution carried out within this layer. In addition, it is responsible for detecting whether or not the processing execution violates the requirements specified in the SR contracts. The Metrics Monitoring uses “sensors”, possibly located at the physical resource and container levels, in order to obtain the information required for enforcing the metrics of interest (in our current implementation we favored the use of the Nagios monitoring technology [12] for metrics monitoring purposes).

2.2.3 SR Complex Event Processing and Applications

This layer consists of applications implementing the data processing and sharing logic required to

support the SR functionality. A typical application hosted in an SR will need to fuse and analyze large volumes of incoming raw data produced by numerous heterogeneous and possibly widely distributed sources, such as sensors, intrusion and anomaly detection systems, firewalls, monitoring systems, etc. The incoming data will be either analyzed in real time, possibly with the assistance of analytical models, or stored for subsequent off-line analysis and intelligence extraction. This suggests a characterization of the applications supported by an SR, whose runtime instances can be hosted within various runtime container components. In particular, the application containers, which can be either standalone or clustered, are the following:

• Event Processing Container
This container is responsible for supporting event-processing applications in a distributed environment. The applications manipulate and/or extract patterns from streams of event data arriving in real time from possibly widely distributed sources, and need to be able to support stringent guarantees in terms of response time and/or throughput.

• Analytics Container
This container is responsible for supporting parallel processing and querying of massive data sets on a cluster of machines. It will be used for supporting the analytics and data warehousing applications hosted in the SR.

• Web Container
This container will provide basic web capabilities to support the runtime needs of the web applications hosted within an SR. These applications support the logic enabling the interaction between the client-side presentation-level artifacts (such as web browser based consoles, dashboards, widgets, rich web clients, etc.) and the processing applications.

Different implementations of the Complex Event Processing and Applications layer can be supported by our framework. In particular, it is possible to deploy an SR that uses a central server for the implementation of both event processing and analytics containers. In this case, the financial institutions of the SR send their own data to the central server. The central engine correlates and analyzes the data and sends the resulting processed data back to the financial institutions, so that each financial institution can adopt its own countermeasures in a timely fashion.

However, although this solution is fully supported by our framework, it suffers from the inherent drawbacks of a centralized system. The central server may become a single point of failure or a security vulnerability: if the server crashes or is compromised by a security attack, the complex event processing computation it carries out may become unavailable or be jeopardized. In addition, the volume of events the central server can process per unit of time is limited by the server’s processing and bandwidth capacities, thus limiting the system scalability. Therefore, in our current implementation of the architecture we favored the use of technologies that allow us to realize decentralized complex event processing.

Chapter 3

SR Life Cycle Management

Most of my contribution to the CoMiFin project has focused on the development of the SR Manager component. It is in charge of managing the life cycle of the SRs and all the interactions with the Partners and users of the CoMiFin system. A more detailed description of the SR model and of the involved users is provided so that the design choices become clearer. Since the SR Manager is the main element for the management of the SR life cycle, its interactions with the other CoMiFin components are also fundamental for better understanding its internal organization.

This chapter is organized according to the aforementioned topics.

• Firstly, the requirements specific to the SR Manager are identified and refined. All the matters concerning the SR abstraction are addressed, from the details of the SR life cycle to the structure of the contract. An exhaustive explanation is also given of the way users can be managed in order to apply SoD principles.

• Subsequently, a high-level design is presented, based on the interactions with the other components as well as with the end users, so that a functional overview of the SR Manager can be provided.

• Then, an architecture is presented, together with a formal description of the data model on which the SR Manager is based.

• Finally, a few words are spent on some implementation details, such as the employed technologies.

3.1 Requirements

This section lays the basis for the subsequent functional design of the component. Concepts that are very important for the SR model, such as Basic Contract and Contract Schema, are precisely defined. The life cycle of an SR is then completely specified, so that more formal approaches can be used to implement the related software modules. The way SoD actually applies to user management is also described.

3.1.1 SR Model

So far, the concepts about SRs have not been explored in enough depth to proceed with the design of the related management functionalities. In fact, the life cycle of a generic SR should be formally defined. Before an SR can be instantiated, a contract must be specified, so precise knowledge of the contract structure is needed. Moreover, an explanation of how a generic organization is expected to participate in CoMiFin is missing. In order to tell the whole story, this section is structured as follows:

• Basic Semantic Room and registration: explains the way an interested actor can take part in CoMiFin;

• Contract and schemas: defines the structure of a contract and introduces the important distinction between SR Schema and SR Instance;

• SR life cycle: specifies the various phases of the existence of an SR.

Basic Semantic Room and Registration

In the previous chapter, the CoMiFin Authority (CA) entity was introduced. The fact that it is trusted by every Partner is crucial and can be exploited to enforce some restrictive rules for participating in CoMiFin. With the aim of reusing existing concepts, the participation of an entity in CoMiFin has been modeled employing the notion of SR. The system comes with a fixed SR called Basic Semantic Room, whose Administrator is the CoMiFin Authority. An entity willing to become a Partner has to join the Basic SR, signing the related Basic Contract. At the moment, the exact content of this contract has not been decided; it will surely become clearer once the system goes into production. The key point here is that becoming a CoMiFin Partner corresponds to joining the Basic SR.

Obviously, the mechanisms used to let an organization join the Basic SR are different from those employed to make a Partner join a generic SR, because in the first case the organization must be registered in the system and given the required accounts. Moreover, to prevent just any company from participating in CoMiFin, there should be a point at which a decision is taken on whether or not to accept a new Partner. To address this issue, a registration phase has been included, in which an entity interested in joining CoMiFin specifies its general information and submits its request to participate to the CA, which in turn decides whether the entity has all the characteristics needed to take part in the system and applies its decision.

Contract and Schemas

This section is about SR contracts and, more generally, the information needed to create a brand new SR, with the aim of understanding the best way of designing this functionality. Before going on, it’s important to give a more formal specification of the SR abstraction. It is defined by the following three principal elements:

• the contract
each SR is regulated by a contract that defines the set of processing and data sharing services provided by the SR along with other QoS requirements. The contract also contains the hardware and software requirements a member has to provision in order to be admitted into the SR.

• the objective
each SR has a specific strategic objective to meet. For instance, there can exist SRs created for implementing large-scale stealthy scans detection, or SRs created for detecting Man-In-The-Middle attacks.

• the deployment
the event processing engine of each SR is implemented using a specific technology and a particular set of software.

A simple example can help to introduce the next choices. Let’s assume that there exists an SR with its own members aimed at tackling DDoS attacks and that its event processing engine is implemented with a certain technology. Afterwards, other CoMiFin Partners decide to form another SR for facing the same DDoS threat, but with an event processing engine using a different technology, maybe because a better technology has come out or because the license of the other one is not suitable for these Partners. It’s clear that the two SRs share many characteristics; in fact, they are likely to differ only in the technology employed and maybe in the value of some QoS parameters.

As another example, consider again an SR for DDoS but constrained to have only Italian members. This could happen, for example, because Italian banks and interested telco providers have found a convenient common agreement for this collaboration. Then, assume that a similar agreement is also reached in Spain, so that another SR for DDoS is created, this time constrained to accept only Spanish members. In this case too, the two SRs are almost identical, in that they differ only in their membership.

These examples show the potential need to instantiate the same SR in different ways. In order to address this requirement, a distinction has been made between the schema of an SR and the instance of an SR. The idea is that when an SR is created, an SR schema is defined, specifying a set of information that will be discussed later. The instantiation of an SR boils down to the creation of an SR instance starting from an SR schema. Such an instantiation consists in the definition of further information besides that derived from the SR schema. Another distinction that arises naturally is the one between a contract schema and a contract instance. An SR schema is associated with a contract schema that defines a set of information. An SR instance derived from that SR schema is associated with a contract instance that is based on that contract schema. Furthermore, an SR schema can define a set of possible deployments, each one corresponding to a different set of software that sets up an instance. When an SR is instantiated, one of those deployments is chosen.
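
The schema/instance distinction can be sketched as follows. This is only an illustration: the class names, the `extra` dictionary for instance-specific information, and the deployment names `engine-A` and `engine-B` are hypothetical and do not come from the SR Manager data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SRSchema:
    name: str
    deployments: tuple  # the possible sets of software for an instance


@dataclass
class SRInstance:
    schema: SRSchema
    deployment: str     # the deployment chosen at instantiation time
    extra: dict = field(default_factory=dict)  # instance-specific information


def instantiate(schema, deployment, **extra):
    """Create an SR instance from a schema, choosing one of its deployments."""
    if deployment not in schema.deployments:
        raise ValueError(f"deployment not defined by the schema: {deployment}")
    return SRInstance(schema, deployment, dict(extra))


# The two DDoS SRs of the example share one schema and differ in membership
# (and possibly in the chosen deployment).
ddos = SRSchema("DDoS", ("engine-A", "engine-B"))
italy = instantiate(ddos, "engine-A", membership="Italian members only")
spain = instantiate(ddos, "engine-B", membership="Spanish members only")
```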

Recalling what was said in the previous chapter about the content of a contract, and putting together the new concepts described so far, it’s possible to structure the information contained in a contract in a more precise way:

1. details of the involved parties
this section lists the Administrator, the Members and the Clients of the SR;

2. contractual statements
the following data are placed here:

• the objective

• a unique identifier (SRID)

• the list of provided services

• the list of rules to obey for obtaining the aforementioned services

3. Service Level Specifications
the SLSs are first grouped into several categories, for example “Data processing” or “Security and trust”; then each SLS is defined with its name, its unit, its threshold value and possibly other attributes of interest.

The schema of a contract defines or suggests some of this information. While the membership and the SRID are specific to the single instance, all the other information can be specified, or at least structured, in the schema. More precisely:

• the objective is defined in the schema but can be changed in the contract instance, to address situations like the one presented before about one DDoS SR for Italian members and another one for Spanish members; in fact, the contract schema may specify “SR for DDoS attacks” and the contract instance may change it to “SR for DDoS attacks in Italy/Spain”;

• the list of provided services as well as the list of rules are defined in the schema and cannot be modified in the instance;

• the schema defines the list of the SLSs that will be available for the instances; in the contract instance, the sublist of SLSs to include will be selected among the available ones, and for each of them a threshold value and a possible penalty will be specified.

A contract instance can be presented as an XML document, so the contract schema can be thought of as something that specifies how such an XML document has to be structured. The natural mapping that takes place is between a contract schema and an XML schema.
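
To make the idea concrete, the sketch below builds a contract instance as an XML document from a contract schema. It is not the CoMiFin format: the element and attribute names, the sample services, rules and SLSs, and the representation of the schema as a Python dictionary (rather than as an XML schema) are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical contract schema: services and rules are fixed by the schema,
# the SLSs are only listed as available (name -> unit).
CONTRACT_SCHEMA = {
    "objective": "SR for DDoS attacks",
    "services": ["alerting"],
    "rules": ["share anonymized logs"],
    "available_slss": {"throughput": "events/s", "availability": "%"},
}


def build_contract_instance(schema, srid, selected_slss, objective=None):
    """Build a contract instance; selected_slss maps an SLS name to a
    (threshold, penalty) pair and must be a subset of the available SLSs."""
    unknown = set(selected_slss) - set(schema["available_slss"])
    if unknown:
        raise ValueError(f"SLSs not available in the schema: {sorted(unknown)}")
    root = ET.Element("contract", {"srid": srid})
    # The objective may be refined in the instance; services and rules may not.
    ET.SubElement(root, "objective").text = objective or schema["objective"]
    services = ET.SubElement(root, "services")
    for service in schema["services"]:
        ET.SubElement(services, "service").text = service
    rules = ET.SubElement(root, "rules")
    for rule in schema["rules"]:
        ET.SubElement(rules, "rule").text = rule
    slss = ET.SubElement(root, "slss")
    for name, (threshold, penalty) in selected_slss.items():
        ET.SubElement(slss, "sls", {
            "name": name,
            "unit": schema["available_slss"][name],
            "threshold": str(threshold),
            "penalty": penalty,
        })
    return root


contract = build_contract_instance(
    CONTRACT_SCHEMA, "sr-001", {"throughput": (1000, "warning")},
    objective="SR for DDoS attacks in Italy",
)
xml_text = ET.tostring(contract, encoding="unicode")
```

With a real XML schema, the subset check on the SLSs would be delegated to a schema validator; it is done by hand here only because the Python standard library does not provide one.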

SR Life Cycle

This section describes all the operations about the life cycle of an SR that can be executed thanks

to the support of the SR Manager.

The first operation is the creation of an SR Schema suitable for the purpose. During this creation, all the information described in the previous section is specified by an end user. To instantiate an SR, an SR schema has to be chosen and the missing information (updated objective, SLSs to include, deployment to use) has to be specified. The Partner that has executed the instantiation operation becomes the Administrator of the SR.

Once instantiated, an SR can be joined by CoMiFin Partners that are interested in the services it provides. For a Partner to effectively become a member or a client of an SR, the SR’s Administrator has to accept the join. In this way, a sort of authority is introduced to control the membership of the SRs. A member/client of an SR can then leave the SR on its own or can be forced to leave by the Administrator, for example because several violations of SLSs due to its actions have been detected.

Finally, an instantiated SR can be disbanded by its Administrator to force the end of the SR’s work.
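
The operations available on an instantiated SR can be summarized by the following sketch. The class and method names are hypothetical, Partners are identified by plain strings, and the distinction between members and clients is left out for brevity.

```python
class SemanticRoom:
    """Life cycle of an instantiated SR (illustrative sketch only)."""

    def __init__(self, administrator):
        # The Partner that instantiates the SR becomes its Administrator.
        self.administrator = administrator
        self.pending = set()        # join requests awaiting acceptance
        self.participants = set()   # accepted members/clients
        self.disbanded = False

    def _require_active(self):
        if self.disbanded:
            raise RuntimeError("the SR has been disbanded")

    def _require_admin(self, actor):
        if actor != self.administrator:
            raise PermissionError("only the Administrator can do this")

    def request_join(self, partner):
        self._require_active()
        self.pending.add(partner)

    def accept_join(self, actor, partner):
        self._require_active()
        self._require_admin(actor)
        self.pending.remove(partner)    # KeyError if no request is pending
        self.participants.add(partner)

    def leave(self, partner):
        self._require_active()
        self.participants.remove(partner)

    def force_leave(self, actor, partner):
        self._require_admin(actor)
        self.leave(partner)

    def disband(self, actor):
        self._require_admin(actor)
        self.pending.clear()
        self.participants.clear()
        self.disbanded = True
```

For example, after `sr = SemanticRoom("BankA")` and `sr.request_join("BankB")`, the Partner `BankB` becomes a participant only once `sr.accept_join("BankA", "BankB")` has been called.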

3.1.2 Segregation of Duties

In order to meet the requirements of privacy and security illustrated in section 1.2.3, the concepts of Segregation of Duties (SoD) [15] have been taken into account for the design of each functionality. As already stated, the basic notion of SoD is that, for each critical operation, the intervention of at least two different users is required. This prevents the misbehavior of a single legitimate user, or the usage of a stolen identity by a malicious user, from disrupting the whole system. The first step for enforcing SoD is the identification of the operations that are to be considered critical. Each of them has to be split into at least two tasks, so that its completion actually requires the execution of several distinct and consecutive tasks. The next step is forcing these tasks to be fulfilled by different users. In this regard, Role Based Access Control (RBAC) can be effectively employed.

Before going on with how SoD is practically applied in the SR Manager, it’s necessary to define the profiling of SR Manager users, because it also concerns role assignment, which is the information still missing to completely understand the remainder of this section. The end users of the SR Manager access its functionalities using a web browser. A username-password authentication is required to begin the interaction with the SR Manager, so a legitimate user is given an account with the needed credentials. An account refers to a Partner, meaning that the user to whom the account has been provided works for that Partner, at least from the point of view of CoMiFin. A Partner can have many related accounts, while a single account refers to a single Partner only.

Similarly to a Partner playing one or more roles within a certain SR (Administrator, Member, Client), an account plays one or more roles within its Partner. Each role enables the account having it to access specific functionalities of the SR Manager. This is the list of the roles supported by the SR Manager:

• CoMiFin Authority
this role is given at installation time to a predefined account that is used to carry out the actions of the CoMiFin Authority Partner, for example accepting a new Partner;

• Creator
this role enables an account to manage software configurations and SR Schemas and can be given to any account;

• Business Manager
there is exactly one account per Partner with this role, which is given to the account that is defined when a new Partner is registered; this account is in charge of beginning all the operations related to the SR life cycle (instantiation, join, leave, disband);

• Technical Administrator
there is exactly one account per Partner with this role, which is given to an account defined by the Business Manager of the Partner itself; this account is in charge of managing all the technical aspects of joining the SRs (configuring the resources to employ in a join, downloading and installing SR software, etc.);

• Operator
there is no limit on the number of accounts in a Partner with this role; accounts that already play the CoMiFin Authority, Business Manager or Technical Administrator roles can’t have the Operator role; this role enables the account to access simple browsing functionalities, without any means for configuring/managing other aspects.
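
The constraints stated in the role list can be checked mechanically, as in the sketch below. The function name and the representation of a Partner as a dictionary from account names to sets of roles are assumptions; the check is meant for a fully registered Partner, that is, one whose Technical Administrator has already been defined.

```python
from collections import Counter

# Roles of which a Partner has exactly one account.
SINGLETON_ROLES = {"Business Manager", "Technical Administrator"}
# Roles that cannot be combined with the Operator role.
EXCLUSIVE_WITH_OPERATOR = {
    "CoMiFin Authority", "Business Manager", "Technical Administrator",
}


def validate_partner_accounts(accounts):
    """Return the list of role-assignment violations for one Partner.

    `accounts` maps an account name to the set of roles it plays.
    """
    problems = []
    counts = Counter(role for roles in accounts.values() for role in roles)
    for role in sorted(SINGLETON_ROLES):
        if counts[role] != 1:
            problems.append(f"expected exactly one {role}, found {counts[role]}")
    for name, roles in sorted(accounts.items()):
        if "Operator" in roles and roles & EXCLUSIVE_WITH_OPERATOR:
            problems.append(f"{name}: Operator cannot be combined with other duties")
    return problems
```

The Creator role is deliberately unconstrained here, since it can be given to any account.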

At the moment, the principles of SoD have been applied to the join operation. This operation has been chosen for two reasons. First of all, it should be considered a critical operation because it enables the participation of an organization in a running SR, and this could be risky both for the Partners that are already members/clients of the SR and for the Partner willing to join. In fact, the collaboration supported by the SR abstraction relies on the integration of the resources provided by each participant, exposing them to possibly untrusted companies. The other reason concerns the different skills required for carrying out the whole join. Business knowledge is needed to properly choose the right SR, understand its contract and assess the nature of its members. By contrast, a more technical profile is suitable for configuring the resources the Partner is willing to share with the cloud. So, SoD here is a choice driven by both security and area-of-expertise reasons.

Putting together both SoD and the fact that a join needs to be accepted by the Administrator of the SR (that is, by its Business Manager), the whole join operation can be split into the following tasks, each with the indication of the role required for its execution:

• begin the join
the Business Manager of the Partner that wants to join the SR can begin the join;

• configure the join
the Technical Administrator of the Partner that wants to join the SR can then configure the resources the Partner is willing to join the SR with;

• accept the join
the Business Manager of the Administrator of the SR can accept the join;

• complete the join
the Technical Administrator of the Partner that wants to join the SR can download the software required to join the SR and install it on the previously configured resources; at this point the join can be marked as completed.

The evidence of the correct application of SoD principles comes from noticing that three different

accounts are necessary for completing the join operation.
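
The four tasks and the roles they require can be sketched as an ordered workflow. The names below are hypothetical; in particular, the caller passes the Partner side and role of the acting account explicitly, whereas a real system would look them up from the authenticated account.

```python
# (task, which Partner the account belongs to, role required), in order.
JOIN_TASKS = (
    ("begin", "joining Partner", "Business Manager"),
    ("configure", "joining Partner", "Technical Administrator"),
    ("accept", "SR Administrator", "Business Manager"),
    ("complete", "joining Partner", "Technical Administrator"),
)


class JoinOperation:
    """Tracks one join and enforces task order and required roles."""

    def __init__(self):
        self.performed = []  # (task, account) pairs, in execution order

    @property
    def completed(self):
        return len(self.performed) == len(JOIN_TASKS)

    def perform(self, task, account, partner_side, role):
        if self.completed:
            raise RuntimeError("the join is already completed")
        expected = JOIN_TASKS[len(self.performed)]
        if (task, partner_side, role) != expected:
            raise PermissionError(f"expected {expected}")
        self.performed.append((task, account))

    def accounts_involved(self):
        return {account for _, account in self.performed}
```

Running the four tasks with the Business Manager and Technical Administrator accounts of the joining Partner and the Business Manager account of the SR Administrator leaves three distinct accounts in `accounts_involved()`, which matches the observation above.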

3.2 High-Level Design

The requirements pointed out in chapter 1 and refined so far, together with the general architecture introduced in chapter 2, drive the detailed design of the SR Manager. This section identifies the macro-functionalities that the SR Manager shall provide, describing each of them with respect to the interactions with the other components and with the users; UML sequence diagrams [17] are used to model such interactions. The final result of this section is the UML component diagram of the CoMiFin components from the point of view of the SR Management modules.

The following macro-functionalities have been distinguished, as also depicted in Figure 3.1:

• Partner and Account management

• Software management

• SR Schema management

• SR life cycle management

• SLA violation management

A specific subsection is dedicated to each of them. This section ends with another subsection

which describes the aforementioned component diagram.

Figure 3.1: SR Manager macro-functionalities

3.2.1 Partner and Account Management

This macro-functionality covers all the operations related to the registration of a new Partner and the configuration of its accounts. The considered functionalities are:

• Partner registration

• Operator registration

• Account profile editing

• Creator role granting

• Account enabling/disabling

• Business Manager enabling/disabling

Partner Registration

Figure 3.2: Partner registration

Figure 3.2 illustrates the interactions that occur among the SR Manager, the user that registers the Partner and the CoMiFin Authority, which is in charge of confirming the registration. The interaction takes place through the web GUI provided by the SR Manager. The creation (registration) process consists of three separate phases, carried out by two different users, so that SoD requirements can be met. Here we are interested in the high-level interactions between CoMiFin components, so we don’t go into details about the way the user actually specifies the information about the Partner and the Business Manager. This is the reason why we model all these operations with a single message

register(partner, business manager). This message represents all the interactions between the user and the SR Manager (through the web GUI) in this first phase. The store(business manager) message stores the Business Manager account as unconfirmed, which prevents it from logging into the system. Later, the CoMiFin Authority confirms the account so that it can log in. The ackRegistration(partner, business manager) message models the visual confirmation the user receives once the creation process has completed successfully. The notifyRegistration(partner, business manager) message represents the e-mail notification to the CoMiFin Authority that a new Partner has been registered and that its confirmation is required to complete the operation. Upon such confirmation, represented by the confirm(business manager) message, the SR Manager updates the Business Manager account through the update(business manager) message. At this point, the CoMiFin Authority receives from the SR Manager an ackConfirmation(business manager) message, meaning that the Business Manager account has just been confirmed. Moreover, the Business Manager is informed of this by the notifyConfirmation(business manager) message, sent by e-mail. Now the Business Manager can log in and create the Technical Administrator account. This interaction between the Business Manager and the SR Manager through the web GUI is modeled by the create(technical administrator) message. Afterwards, the Technical Administrator account is stored with the store(technical administrator) message and the Partner is finally enabled through the enable(partner) message. The last message, ackCreation(technical administrator), represents the visual confirmation that the whole operation has completed successfully. Note that the procedure can be considered complete only after the Technical Administrator has been defined: if a Partner had its Business Manager as its only account, it could not join any SR.
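The three phases described above can be read as a small state machine. The following Python sketch is only an illustration of that reading: the class, state and method names are hypothetical and are not taken from the SR Manager implementation (which targets Java and EJB).

```python
from enum import Enum


class State(Enum):
    UNCONFIRMED = "unconfirmed"  # after store(business manager)
    CONFIRMED = "confirmed"      # after update(business manager)
    ENABLED = "enabled"          # after enable(partner)


class PartnerRegistration:
    """Tracks one Partner through the three registration phases (hypothetical model)."""

    def __init__(self, partner, business_manager):
        # Phase 1: register(...) and store(business manager) as unconfirmed.
        self.partner = partner
        self.business_manager = business_manager
        self.technical_administrator = None
        self.state = State.UNCONFIRMED

    def business_manager_can_login(self):
        # An unconfirmed Business Manager account cannot log in.
        return self.state is not State.UNCONFIRMED

    def confirm(self):
        # Phase 2: confirm(business manager), issued by the CoMiFin Authority.
        if self.state is not State.UNCONFIRMED:
            raise ValueError("account is not awaiting confirmation")
        self.state = State.CONFIRMED

    def create_technical_administrator(self, name):
        # Phase 3: create/store(technical administrator), then enable(partner).
        if self.state is not State.CONFIRMED:
            raise ValueError("Business Manager must be confirmed first")
        self.technical_administrator = name
        self.state = State.ENABLED

    def can_join_sr(self):
        # The Partner can join an SR only once the whole procedure is complete.
        return self.state is State.ENABLED
```

In this sketch a Partner whose only account is the Business Manager never reaches the ENABLED state, which mirrors the remark that such a Partner could not join any SR.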

Operator Registration

Once the Technical Administrator has been registered, the Business Manager can configure any

number of Operators.

Account Profile Editing

Any registered account can edit its own profile, for example to update its personal data or change the password.

Creator Role Granting

The Creator role can be given to any account. The CoMiFin Authority is the only user enabled to

grant it.

Account Enabling/Disabling

The Business Manager of a Partner can enable or disable any of the accounts of the Partner itself; a disabled account cannot log into the system.

Business Manager Enabling/Disabling

The CoMiFin Authority can enable or disable the Business Manager of any CoMiFin Partner; a disabled Business Manager cannot log into the system.

3.2.2 Software Management

Software is a very important entity for SR Management. When an SR Schema is defined, the Creator has to configure all the software needed for each of the expected deployments. Each piece of software has to be described and uploaded to the system, so that the Technical Administrator can download it when its Partner is going to join an SR. The usual operations of addition, update and deletion are provided by the SR Manager to users having the Creator role.

3.2.3 SR Schema Management

An SR Schema is a sort of template that enables the instantiation of SRs that differ in only a few details. The SR Manager provides a wizard that guides the end user through the definition of a new SR Schema. Only users with the Creator role can access this wizard.

The wizard consists of 5 steps:

1. objective and description definition

this first step is for writing a textual description of what the derived instances are expected to do; in particular, the objective should describe the actual goal the future members want to achieve;

2. services definition

the next step is for defining the list of the services provided by the SR;

3. rules definition

then, the rules for obtaining the aforementioned services are listed;

4. SLSs definition

in this step, the Creator chooses the SLSs that will be included in the contract of derived instances; once an SLS has been selected, it can be flagged as mandatory and a penalty can be associated with run-time violations of that SLS;

5. deployments definition

this final step is for specifying the deployments, that is, the possible ways an SR can be deployed once instantiated; note that a deployment can be associated with any subset of the software already configured in the system, and that deployments can be added or removed using the buttons of the same name.
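The information collected by the five wizard steps can be pictured as a single record. The Python sketch below is a hypothetical rendering of such a record, not the actual SR Manager data model: all class and field names are assumptions, and the helper only checks the constraint stated in step 5, namely that a deployment refers to software already configured in the system.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class SLSChoice:
    name: str
    mandatory: bool = False          # step 4: "mandatory" flag
    penalty: Optional[str] = None    # step 4: optional penalty for violations


@dataclass
class Deployment:
    name: str
    software: Set[str] = field(default_factory=set)


@dataclass
class SRSchema:
    objective: str                   # step 1
    description: str                 # step 1
    services: List[str]              # step 2
    rules: List[str]                 # step 3
    slss: List[SLSChoice]            # step 4
    deployments: List[Deployment]    # step 5


def unknown_software(schema, configured_software):
    """Return, per deployment, the software names not configured in the system."""
    problems = {}
    for deployment in schema.deployments:
        missing = deployment.software - set(configured_software)
        if missing:
            problems[deployment.name] = sorted(missing)
    return problems
```

An empty result from unknown_software means that every deployment of the schema uses only software known to the system.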

3.2.4 SR Life Cycle Management

Although these operations have already been described, here the focus is on the coordinated interaction between the SR Manager and the other CoMiFin components. A sequence diagram is provided for each operation.

SR Instantiation

Figure 3.3 highlights that, once the Business Manager has completed the instantiation, the information about the brand new SR is stored in the SR Registry and a message is published to a specific topic to notify the interested components that another SR is available. This message is then delivered to the SLA Manager, which reads the related contract and derives the SLAs from it. These are in turn stored in the SR Registry, allowing the MeMo to retrieve them for its monitoring activities.

Figure 3.3: SR instantiation
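The instantiation flow can be sketched with a minimal in-process publish/subscribe topic. This Python fragment is only an illustration under stated assumptions: the real system relies on a topic provided by the middleware, and the Topic and SRRegistry classes, the message layout and the rule "one SLA per SLS in the contract" are all hypothetical.

```python
from collections import defaultdict


class Topic:
    """A minimal in-process publish/subscribe topic (stand-in for the real one)."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        for callback in self._subscribers:
            callback(message)


class SRRegistry:
    def __init__(self):
        self.contracts = {}              # SR name -> contract
        self.slas = defaultdict(list)    # SR name -> derived SLAs


def make_sla_manager(registry):
    """Build the callback playing the role of the SLA Manager."""
    def on_event(message):
        if message["type"] != "instantiation":
            return
        contract = registry.contracts[message["sr"]]
        # Hypothetical derivation: one SLA per SLS listed in the contract.
        for sls, threshold in contract["slss"].items():
            registry.slas[message["sr"]].append((sls, threshold))
    return on_event


def instantiate_sr(registry, topic, name, contract):
    registry.contracts[name] = contract                    # store SR data
    topic.publish({"type": "instantiation", "sr": name})   # notify components
```

After instantiate_sr returns, the derived SLAs are available in the registry, where a monitoring component could read them.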

SR Join

Figure 3.4 models what has already been stated about the join and gives an idea of its complexity. Besides the high number of human users involved, it is worth observing that

• proper e-mails are sent to the user who should act next;

• the information about the SR is updated in the SR Registry;

• the MeMo asks the SR Manager which resources to monitor for that SR, since they are likely to have changed as a consequence of this update.

Figure 3.4: SR join

SR Leave

Figure 3.5 presents the spontaneous leave performed by the Business Manager of the Partner that wants to exit the SR. This operation affects the membership section of the contract, so the changes have to be applied to the data stored in the SR Registry. A notification is also published, because the MeMo is likely to be interested in knowing that some resources no longer need to be monitored.

Figure 3.5: SR leave

SR Forced Leave

When the Administrator makes a Partner leave its SR, the only differences are the user allowed to execute the operation and its name: Forced Leave. All the other interactions remain the same.

SR Disband

An Administrator can make its SR stop working. Figure 3.6 shows this operation. Again, the related data are updated in the SR Registry.

Figure 3.6: SR disband

3.2.5 SLA Violation Management

When the SLA Manager detects a violation of an SLS defined in a contract, it notifies the SR Manager so that any configured penalty can be applied to the responsible Partner. The action undertaken by the SR Manager depends on what has been configured in the SR Schema from which the SR has been instantiated.
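As a minimal illustration of this lookup, the Python function below maps a violated SLS to the action configured for it. The function name, the dictionary layout and the action labels used in the usage example are hypothetical; the text does not list the penalties the SR Manager actually supports.

```python
def penalty_for_violation(schema_penalties, violated_sls, partner):
    """Return the (action, partner) pair to apply, or None if no penalty is configured.

    schema_penalties maps an SLS name to the action configured in the SR Schema.
    """
    action = schema_penalties.get(violated_sls)
    if action is None:
        return None
    return (action, partner)
```

For example, with schema_penalties = {"availability": "warning"}, a violation of "availability" by Partner P yields ("warning", P), while a violation of an SLS without a configured penalty yields None.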

3.2.6 Component Diagram

The interactions pointed out so far give a more detailed picture of how the SR Manager fits within the whole CoMiFin system. Figure 3.7 expresses this idea in a more formal way using a UML component diagram [17].

Figure 3.7: Component diagram

The interfaces exposed by the SR Manager are

• UserProfilePort

The Dashboard is in charge of displaying visual information about monitored metrics. It also enforces some authorization rules based on the related configuration set up in the SR Manager. This interface lets the Dashboard retrieve that configuration.

• SLAViolationPort

When the SLA Manager detects a violation, it uses this interface to notify the SR Manager.

• ResourceMapPort

Once a new member has joined an SR, the resources the Partner has declared it will participate with have to be monitored by the MeMo. This interface lets the MeMo read the details about such resources.

• SREvents

This is actually a topic where the SR Manager publishes all the news about SR life cycle events: instantiation, join, leave and disband.

The only interface the SR Manager uses is SRPort, exposed by the SR Registry to let SR data be inserted, updated and removed.

3.3 Low-Level Design

This section presents the low-level design of the SR Manager. An architecture is first proposed, together with a description of its layers and sub-components; then its data model is formally defined.

3.3.1 Architecture

In order to obtain a better separation of concerns and introduce useful levels of indirection, the proposed architecture arranges the sub-components in several layers. Figure 3.8 shows this architecture.

Figure 3.8: SR Manager architecture

The way the layers have been conceived has been heavily affected by the technology chosen for the implementation. The whole component has been designed with JBoss Application Server [9] in mind as the target middleware. This choice is motivated by the desire to use a cross-platform language such as Java and to exploit as much as possible the services usually provided by a middleware solution. One of the most valuable services offered by JBoss is an implementation of the EJB 3.0 specification [5]. In particular, an EJB container

is available that takes care of enabling the use of Entity Beans and Session Beans. The Data and Data Management layers at the bottom of the architecture have been designed to fit the EJB support. The boxes in the Data layer group the entities managed by the SR Manager. More details are provided in the next section. The Data Management layer contains several Stateless Session Beans that give access to the persistent data managed by the layer below.

Two layers are placed at the top of the architecture. Both exploit other technologies supported by JBoss AS, namely

• web services [18]

• publish/subscribe topics [8]

• servlets and JSP [16]

In the Communication & Presentation layer there is a sub-component for each of the exposed interfaces, which have already been described in the previous section. Moreover, this layer includes all the software modules related to dynamic web page creation. The operations with side effects that the web GUI provides are mediated by several servlets, grouped on the basis of the macro-functionality they refer to.

The middle layer, Business Logic, serves as the glue between the core functionalities supported by the EJB container and the external world, which interacts with the SR Manager by means of web services, topics and web pages.

Besides the horizontal division into layers, a vertical one is also proposed, which highlights the mapping between modules and functionalities.

3.3.2 Data Model

The data managed by the SR Manager are placed in the bottommost layer. Figure 3.9 gives a more formal description of these data. An Account may be granted several Roles, and a Role may be granted to several different Accounts. There are also some higher-level constraints in this respect that cannot be easily captured in UML [17]. For example, the same account cannot be given both the Business Manager and the Technical Administrator roles, otherwise SoD would become unfeasible.

It is worth noticing that a Partner can have a twofold relationship with an SR: a Partner can be the Administrator of an SR and/or join it as a member or a client. The latter relationship is actually a class relationship, since it relates to a set of Resources that specify the physical hosts the Partner wants to employ in the join. In turn, each Resource is linked to the set of Software installed on it. Only the Software included in the SR can be downloaded and installed on these Resources. This constraint is not captured in the diagram either, but is enforced by the business logic that works on top of the Entity Bean layer. The same reasoning applies to the constraint that the Software of an SR has to be the same as the Software linked to the Deployment chosen for that SR.
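The constraints that the diagram cannot express can be written as simple predicates in the business logic. The Python sketch below is only an illustration of the three rules mentioned in the text; the function names and the plain string representation of roles and software are assumptions, not the actual implementation.

```python
# Roles that must not be held by the same Account (SoD constraint from the text).
EXCLUSIVE_ROLES = {"Business Manager", "Technical Administrator"}


def can_grant(current_roles, new_role):
    """Reject a grant that would give one Account both exclusive roles."""
    roles = set(current_roles) | {new_role}
    return not EXCLUSIVE_ROLES.issubset(roles)


def can_install(software, sr_software):
    """Only Software included in the SR may be installed on a joining Resource."""
    return software in sr_software


def sr_software_matches_deployment(sr_software, deployment_software):
    """The Software of an SR must equal the Software of its chosen Deployment."""
    return set(sr_software) == set(deployment_software)
```

Keeping such checks in one place, above the persistence layer, matches the remark that they are enforced by the business logic rather than by the data model.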

Figure 3.9: SR Data Model

SR and SR Schema both have relationships with PenaltyTemplate and SLSTemplate. The latter represents a metric that the system is capable of monitoring. The former denotes an action the SR Manager is expected to execute. The SLSSchema is another class relationship, which models that a certain SLS can be included in an SR and whether it is mandatory. The SLS, instead, expresses that a particular SLS has actually been put into the contract of an instantiated SR with a specific threshold to check.

3.4 Implementation

This section gives some information about the technologies used and includes a short how-to on using the web interface.

3.4.1 Technologies

The use of JBoss Application Server [9] has already been motivated in section 3.3.1. Precisely, version 5.1 has been employed. For data persistence, the MySQL RDBMS, version 5.1 [11], has been chosen. The target OS where an instance of the SR Manager has been deployed is CentOS 5.4 [2], a community-supported, mainly free software operating system based on Red Hat Enterprise Linux [14]. As development environment, Eclipse Galileo (version 3.5) [4] has been extensively used.

The choice of all the aforementioned tools has been driven by the need to use open source software.

3.4.2 HowTo

Far from being a complete user guide, this subsection reports some notes and screenshots about the following functionalities accessible through the web interface

• registration

• login

• SR Schema creation

• SR instantiation

• SR join

Registration

Figure 3.10 shows the registration form. You have to fill in all the required fields, read and acknowledge the Terms of Service and Privacy Policy, and then press the Confirm registration button. Then you have to wait for the CoMiFin Authority to check and accept your registration, after which you can log into the system using the credentials you specified. The next step is the definition of the Technical Administrator, as already explained in section 3.2.

Login

Figure 3.11 displays the login form. Simply type the username and password you provided at registration time.

SR Schema Creation

Only a user with the Creator role can access this functionality, as explained in section 3.2.3. Figure 3.12 shows the step of the wizard for defining the SLSs, that is, choosing which SLSs to include in future derived SR instances. Figure 3.13 shows the step for specifying the deployments that will be available to the instances.

SR Instantiation

In order to instantiate a new SR, a user with the Business Manager role is needed, as stated in section 3.2.4. You have to browse the available SR Schemas and choose the one you want to instantiate. A wizard is then started, with the following steps

• refine the objective and description previously defined in the Schema

• choose which SLSs to include and their values

• finally, decide the deployment to adopt

Figure 3.10: Registration

SR Join

This is a complex functionality, already described in section 3.2.4. The required steps are detailed below. Let X be the SR to join, P the Partner that wants to join X, and A the Administrator of X.

• begin the join

Log in as Business Manager of P. Click on Semantic Rooms in the left menu. On the Instantiated Semantic Rooms page, click on details in the row of X. On the Semantic Room details page, press the Begin Join button and confirm your choice.

• configure resources

Log in as Technical Administrator of P. Click on Semantic Rooms in the left menu. On the Joined Semantic Rooms page, click on details in the row of X. On the Semantic Room details page, press the Configure Join button. On the Join configuration page, insert the requested data, then press the Ok button and confirm your choice.

Figure 3.11: Login

• accept join

Log in as Business Manager of A. Click on Semantic Rooms in the left menu. On the Instantiated Semantic Rooms page, click on details in the row of X. On the Semantic Room details page, press the View membership button. Press the Accept Join button in the row of P and confirm your choice.

• download software and complete the join

Log in as Technical Administrator of P. Click on Semantic Rooms in the left menu. On the Joined Semantic Rooms page, click on details in the row of X. On the Semantic Room details page, press the Download Software button. On the Downloadable Softwares page, click download in the rows of interest and save the files. After installing the software, return to the Semantic Room details page, press the Complete Join button and confirm your choice.

Figure 3.12: SR Schema creation - SLSs definition

Figure 3.13: SR Schema creation - Deployments definition

Chapter 4

An SR for Intrusion Detection

One of the SRs implemented so far carries out Intrusion Detection (ID) [20], based on the analysis of stealthy port scanning activities. After a brief explanation of how ID could be enhanced with a collaborative approach, the Agilis system is introduced. The processing steps performed by a dedicated SR are then presented and the main innovations brought by Agilis are described. The chapter ends with a description of the work on performance monitoring.

4.1 Collaborative Intrusion Detection

The targets of this kind of attack are the web servers handling the external web connectivity of the participating FIs. Those web servers typically run outside the corporate firewall (in the DMZ), and are therefore frequently targeted by attackers. The goal of the attack is to identify TCP ports that might have been left open on the attacked hosts. The attack is carried out by initiating a series of TCP connections to ranges of ports at each of the targeted DMZ servers. The ports that are detected as open can be used as intrusion vectors at a later time.

The attack detection is based on identifying patterns of an unusually high number of TCP SYN requests, possibly targeting an unusually high number of ports, and originating from the same external IP address. Such detection relies on thresholds that set the maximum number of requests a source can issue before being classified as malicious. The problem is that the attacker spreads these requests over several target hosts, so that those thresholds are never exceeded. This is where collaboration comes into play: the statistics are collected and analyzed across the entire set of ID-SR participants, thus improving the chances of identifying low-volume activities that would have gone undetected if the individual participants were relying exclusively on their local protection systems. In addition, to minimize the number of false positives, the real-time suspicions are periodically calibrated through a reputation system, which maintains a site ranking based on the past history of malicious activities originating from those sites.
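A small numeric example makes the benefit of collaboration concrete. In the Python sketch below, the threshold value, the IP addresses and the request counts are all made up, and applying the same threshold to the aggregated counts is an assumption made for the sake of illustration.

```python
from collections import Counter

LOCAL_THRESHOLD = 100  # hypothetical: max SYN requests per source before flagging


def flagged(counts, threshold=LOCAL_THRESHOLD):
    """Return the source IPs whose request count exceeds the threshold."""
    return {ip for ip, n in counts.items() if n > threshold}


# SYN requests observed per source IP at three participants (made-up numbers).
site_a = Counter({"198.51.100.7": 60, "203.0.113.5": 10})
site_b = Counter({"198.51.100.7": 55})
site_c = Counter({"198.51.100.7": 70, "203.0.113.5": 5})

# Each participant, looking only at its own traffic, flags nothing.
local_views = [flagged(site) for site in (site_a, site_b, site_c)]

# Summing the counters across participants reveals the low-volume scanner.
combined = site_a + site_b + site_c
collaborative_view = flagged(combined)
```

Here the source 198.51.100.7 stays below the threshold at every single site, yet its 185 requests overall exceed it once the statistics are merged.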

4.2 The Agilis System

To support the required data analysis, a distributed event processing system called Agilis has been implemented. It consists of a distributed network of processing and storage elements hosted on a cluster of machines allocated from the ID-SR hardware pool. The processing is based on Hadoop's MapReduce framework [1] [10]. The processing logic is specified in a high-level language called Jaql [7], which compiles into a series of MapReduce jobs. To improve detection latency, the input data is gathered through buffers stored in an in-memory distributed storage system called WebSphere eXtreme Scale (WXS) [19]. The individual components of the Agilis framework are illustrated in Figures 4.1 and 4.2, and described in detail below.

Figure 4.1: MapReduce-based Semantic Room

4.2.1 WebSphere eXtreme Scale

WebSphere eXtreme Scale (WXS) [19] is a distributed in-memory database implemented in Java.

It allows the user data to be organized into a collection of maps consisting of either relational

records, or key-value pairs. At runtime, the data are stored in Data Servers or containers hosted

on a cluster of machines. The clients can query the stored data using either a simple get/set API,

or full-blown SQL queries. The queries can be executed either on the client, or within a container

using an embedded SQL engine.

For scalability, the map's data can be broken into a fixed number of partitions, which are then evenly distributed among the WXS containers by the WXS runtime. In addition, for fault tolerance, each map partition can be replicated on a configured number of containers. The information about the operational containers, as well as the layout of hosted map partitions and their replicas, is maintained at runtime in the WXS Catalog service, which is typically replicated for high availability.

Figure 4.2: The Components of the Agilis Runtime

4.2.2 Hadoop and MapReduce

MapReduce is a programming model introduced by Google for processing and generating large data

sets. Users specify a map function that processes a key/value pair to generate a set of intermediate

key/value pairs, and a reduce function that merges all intermediate values associated with the same

intermediate key. Many real world tasks are expressible in this model.
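The model can be illustrated with a single-process Python sketch. It is not Hadoop code and performs no distribution: the map_reduce helper is a hypothetical stand-in that only shows the map, group-by-key and reduce phases, applied here to counting requests per source IP.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Run map, group intermediate values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def count_map(record):
    # Emit one intermediate pair per request: (source IP, 1).
    source_ip, _destination_port = record
    yield source_ip, 1


def count_reduce(_key, values):
    # Merge all intermediate values that share the same key.
    return sum(values)
```

Calling map_reduce(records, count_map, count_reduce) on a list of (source IP, destination port) pairs returns a dictionary with the number of requests per source IP.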

Programs written in this functional style are automatically parallelized and executed on a large

cluster of commodity machines. The run-time system takes care of the details of partitioning the

input data, scheduling the program’s execution across a set of machines, handling machine failures,

and managing the required inter-machine communication. This allows programmers without any

experience with parallel and distributed systems to easily utilize the resources of a large distributed

system.

Apache Hadoop is a software framework that supports data-intensive distributed applications under

a free license. It enables applications to work with thousands of nodes and petabytes of data.

Hadoop was inspired by Google’s MapReduce and Google File System (GFS) papers. Hadoop is

a top-level Apache project being built and used by a global community of contributors, using the

Java programming language. Yahoo! has been the largest contributor to the project, and uses

Hadoop extensively across its businesses.

The usual interaction with the external world consists of the user submitting a MapReduce job that defines both the map and reduce steps. The system is then responsible for splitting the job into a set of concurrent map tasks and reduce tasks, which are allocated to the available processing nodes.

4.2.3 Processing Framework

The processing is carried out on the machines within the ID-SR cluster, and orchestrated through

the optimized Hadoop scheduling framework. The latter consists of a centralized Job Tracker (JT)

which coordinates the local execution of mappers and reducers on each of the ID-SR nodes through

a collection of Task Trackers (TT) (one per machine).

Most of the scheduling optimizations were targeted at improving locality of processing by scheduling the map tasks close to the WXS partitions holding their respective input splits. To match the input splits with the WXS partitions, a new implementation of Hadoop's InputFormat interface was provided and packaged with every Agilis MapReduce job submitted to the JobTracker. The getSplits method of this interface was then used by the JT to determine the split locations at runtime (obtained by querying the WXS Catalog service), and the createRecordReader method was used to create an instance of RecordReader to read the data from the corresponding WXS partition. To further improve locality, the implementation of RecordReader recognized SQL select, project, and aggregate queries (by interacting with the Jaql interpreter) and delegated their execution to the SQL engine embedded in the WXS container.

In many cases, this approach resulted in a substantial reduction in the volume of intermediate data reaching the reducers, thus improving latency and bandwidth utilization and reducing processing costs. It also made it possible to further enforce the privacy of the input data submitted by the individual ID-SR members, by scheduling the initial map processing on the machines residing within their administrative boundaries.

4.2.4 Long-Term Data Storage

The Hadoop Distributed File System (HDFS) [6] is used to provide storage services for massive amounts of data that should be preserved over time (e.g., historical data keeping track of past attacks). The data stored in HDFS can be injected into Hadoop through the provided HDFS InputFormat implementation, and combined with the WXS data using the Jaql I/O constructs. HDFS is managed and kept consistent by Hadoop's Chunk Manager (CM) and Zookeeper (ZK) services.

4.2.5 The Jaql Language

The processing logic is expressed in a high-level language called Jaql [7]. It supports SQL-like query constructs that can be combined into flows. It can also interact with a large variety of data sources thanks to its use of the standardized JSON data model. As shown in Figure 4.2, in Agilis the locally compiled Jaql flows are first augmented with the input and output formats needed to interact with WXS, and then submitted to the modified Hadoop scheduler, which orchestrates their execution on the ID-SR machines as explained above.

4.3 The ID-SR Processing Steps

The processing steps followed by the ID-SR implementation are depicted in Figure 4.3. In the first step, the raw data capturing the current networking activity at each of the participating machines is collected using the tcpdump utility and forwarded to the local ID-SR gateway. Each gateway then normalizes the incoming raw data, producing a stream of LogEvent records of the form <SOURCEIP, DESTINATIONIP, SOURCEPORT, DESTINATIONPORT, BYTESSENT, BYTESRECEIVED, RETURNSTATUS>. The LogEvent records are stored in a WXS partition hosted on a locally deployed WXS container.

Figure 4.3: Data flow for Port Scan Detection in ID-SR

The incoming LogEvent records are then processed by a collection of MapReduce jobs handled by Agilis. First, the input records are subjected to the Summarization flow, which consists of two processing steps surrounded by two I/O steps (for reading the input and writing the results). The outcome of the two processing steps is a collection of summary records of the form <SOURCEIP, PORTSNUM, REQNUM>, representing for each source IP address (SOURCEIP) the number of distinct ports (PORTSNUM) accessed from SOURCEIP along with the total number of requests (REQNUM) originating from SOURCEIP.
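The effect of the Summarization flow can be sketched in a few lines of Python. This is not the Jaql flow used by Agilis: the lower-case field names are a hypothetical rendering of the LogEvent record, and counting distinct destination ports (regardless of the destination host) is an assumption about how PORTSNUM is computed.

```python
from collections import namedtuple

# Hypothetical Python rendering of the LogEvent record described in the text.
LogEvent = namedtuple(
    "LogEvent",
    "source_ip destination_ip source_port destination_port "
    "bytes_sent bytes_received return_status",
)


def summarize(events):
    """Return {SOURCEIP: (PORTSNUM, REQNUM)} for a collection of LogEvents."""
    ports = {}      # source IP -> set of distinct destination ports
    requests = {}   # source IP -> total number of requests
    for event in events:
        ports.setdefault(event.source_ip, set()).add(event.destination_port)
        requests[event.source_ip] = requests.get(event.source_ip, 0) + 1
    return {ip: (len(ports[ip]), requests[ip]) for ip in requests}
```

Each entry of the returned dictionary corresponds to one summary record of the form <SOURCEIP, PORTSNUM, REQNUM>.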

The summary records are then fed into the Blacklisting flow (see Figure 4.4), which blacklists a source IP address if the number of requests and the number of distinct ports accessed from that IP address exceed fixed limits. In addition, the summary records are joined with the historical records of the form <SOURCEIP, RANK>, using SOURCEIP as the key, to adjust the long-term rank representing the threat level of the IP address. The historical records are used to periodically calibrate the blacklist by excluding the IP addresses whose ranks fall below a fixed threshold.

Figure 4.4: Jaql query fragments used for Port Scan Detection in ID-SR
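The logic of the Blacklisting flow can also be sketched in Python; this is not a transcription of the Jaql fragments of Figure 4.4. The parameter names are hypothetical, and the rank update rule (adding one each time a source exceeds both limits) is an assumption, since the text does not specify how the rank is adjusted.

```python
def blacklist(summaries, history, max_ports, max_requests, min_rank):
    """Return the blacklisted source IPs.

    summaries: {ip: (ports_num, req_num)}, as produced by the Summarization flow.
    history:   {ip: rank}, the long-term ranks, updated in place.
    """
    # A source is suspicious if it exceeds both fixed limits.
    suspects = {
        ip for ip, (ports_num, req_num) in summaries.items()
        if ports_num > max_ports and req_num > max_requests
    }
    # Join with the historical records on the source IP and adjust the rank
    # (hypothetical rule: increase the rank by one per offence).
    for ip in suspects:
        history[ip] = history.get(ip, 0) + 1
    # Calibration: exclude sources whose rank falls below the threshold.
    return {ip for ip in suspects if history[ip] >= min_rank}
```

With this rule, a first-time offender is recorded in the history but enters the blacklist only once its rank reaches min_rank, which is one way to limit false positives.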

4.4 Main Innovations

Agilis brings some innovations with respect to the usual use of the frameworks it includes. Firstly, Hadoop has been designed simply to support distributed computation: in its usual scenario, a user submits a MapReduce job specifying where the input data are placed. Agilis employs Hadoop as the computing engine for distributed event processing, where the computation is triggered by the arrival of new events instead of by the explicit action of a human user.

The other aspect, which depends on the previous one, concerns the attempt to change the timeliness properties provided by Hadoop. Hadoop has been developed for batch processing. Here the goal is to make it suitable for near-real-time processing, so that possible attack patterns can be recognized early enough to allow proper countermeasures to be undertaken.

4.5 Performance Monitoring

Part of my contribution to the processing engine has consisted in designing and implementing

a simple system, called HadoopMeter, for monitoring the performance of an Agilis deployment.

PlanetLab [13] nodes have been used to set up the cluster. PlanetLab is a global research network

that supports the development of new network services.

The main need is to flexibly provide the following information about the performance of Agilis:

• response time: the average time it takes to complete a job

• throughput: the number of jobs completed in an hour

This information can be inferred from the log files produced by the JobTracker, since these logs include both the submission time and the completion time of each executed job.

The computation has been moved away from the node where the JobTracker is running, in order to avoid further load on a node that is already a bottleneck, since the JobTracker is the centralized scheduler for all the map and reduce tasks. For this reason, the logs are first quickly scanned to extract the submission and completion times of newly completed jobs; this information is then sent to another node, where the rest of the processing is carried out. An end user can access these statistics through a web site that, given a time window, displays the average response time and the throughput of the jobs completed within that window.

Figure 4.5 shows a sketch of the HadoopMeter architecture. The HadoopMeterSender component is in charge of reading fresh logs to extract the required information about newly completed jobs and of sending it to the HadoopMeterReceiver component, which in turn saves it to a storage module called HadoopMeterDB. HadoopMeterGUI enables end users to access and query the stored information through a web interface.

Figure 4.5: HadoopMeter architecture
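The two statistics offered by HadoopMeter can be computed as in the following Python sketch. It assumes that the submission and completion times have already been extracted from the JobTracker logs; the `JobRecord` layout and the `window_stats` function are hypothetical names, and no particular log format is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRecord:
    """Times extracted for one completed job (assumed layout)."""
    job_id: str
    submitted: float  # seconds since the epoch
    completed: float  # seconds since the epoch


def window_stats(records, start, end):
    """Return (average response time in seconds, throughput in jobs per hour)
    for the jobs completed in the time window [start, end)."""
    if end <= start:
        raise ValueError("the time window must have a positive length")
    done = [r for r in records if start <= r.completed < end]
    if not done:
        return None, 0.0
    avg_response = sum(r.completed - r.submitted for r in done) / len(done)
    throughput = len(done) / ((end - start) / 3600.0)
    return avg_response, throughput
```

For example, two jobs lasting 60 and 120 seconds that both complete within a one-hour window give an average response time of 90 seconds and a throughput of 2 jobs per hour.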

Chapter 5

Conclusions

This chapter concludes the thesis by summarizing the whole work and giving some hints about possible future directions of research and application of what has been described so far.

5.1 Summary

This work can be seen as a top-down survey that starts from the advantages and opportunities of collaborative environments, then goes through a practical application to the financial context within the CoMiFin European project, and finally forks in two different directions:

• design of the SR Manager component

• analysis of the Agilis system

In this section, this survey is briefly retraced, highlighting the main aspects that have been touched.

More and more operations are nowadays carried out over the Internet, with a huge amount of critical data and high-value information moving across the network. As a consequence, the Internet is increasingly becoming a Critical Infrastructure (CI). Unlike other CIs, however, it is openly accessible to everyone all over the world, which is a serious risk. In the context of critical infrastructure protection, the need arises to properly protect the Internet from attacks that are becoming increasingly frequent, complex and hard to counter. The CoMiFin European project addresses this need by focusing on the protection of Financial Institutions (FIs) and of all the critical infrastructures that make FIs work.

Collaboration among financial actors can help in this direction. The correlation of traffic data provided by several organizations is a very appealing way of coping with Internet attacks. CoMiFin mainly tackles two threats:

• Distributed Denial of Service (DDoS)

• Man in the Middle (MitM)

The Semantic Room abstraction is an interesting means of making FIs cooperate by sharing data and resources, with the aim of getting back useful information and advice for their own protection. CoMiFin aims at implementing this abstraction, providing all the required management functionalities and facing all the issues related to privacy and trustworthiness in a financial context. In fact, strong guarantees are needed to convince an FI to move its sensitive data through the Internet and to other FIs. Moreover, in order to enhance the security of the CoMiFin system itself, the principles of Segregation of Duties (SoD) are applied. A very important role in this project is played by the processing engine. Due to the huge amount of data to process, the complexity of the detection algorithms to implement and the timeliness requirements to meet, a flexible distributed event-driven computing paradigm is required.

These requirements have driven the design of an adaptable architecture that supports the Semantic Room abstraction. In this architecture, two principal layers can be identified: the SR management layer, which is responsible for the management of the SRs, and the Complex Event Processing and Applications layer, which realizes the SR processing and sharing logic. In addition, all the architectural components of both layers can use various commodity services for monitoring activities, resource management and storage.

Most of my work has concerned the SR Manager component, which is chiefly in charge of supporting the life cycle of SRs. It lets users define the contract of an SR in order to regulate the behavior of its participants, and makes it possible to create SR Schemas, a sort of template for creating similar SRs. In order to improve the security of this component, users and their roles have been designed with the basics of SoD in mind.

I have also devoted some effort to the analysis of an SR for Intrusion Detection based on the recognition of stealthy port scanning, which employs IBM Agilis as its event processing engine. A simple monitoring system has been implemented to maintain useful statistics about Agilis performance.

5.2 Future Directions

Although the scope of the work has been precisely defined, the previous survey has left room for other topics that could turn out to be quite interesting to focus on. This section tries to give some hints about such topics.

5.2.1 Other Scenarios

The Semantic Room abstraction supported by the proposed architecture is general and could be applied to scenarios other than the financial one. In fact, it is a means of enabling collaborative environments, so whenever cooperation among networked entities is appropriate, an SR-like approach can be feasible.

An interesting example scenario is mobility. It concerns the calculation of the fastest or the shortest path to reach a certain destination, which is becoming an everyday action. This computation is simply based on road topology and does not consider the current situation of the roads, which can significantly influence the time needed to travel along them. But if we put together sensors for detecting traffic conditions, other sources able to provide real-time information about the state of the roads (accidents, for example), and the existing topology, we probably provide a better input for computing the best path to the desired destination. Connecting these sources and correlating the data they provide are operations that fully fit the functionalities offered by an SR.
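The mobility scenario can be made concrete with a small Python sketch that combines the static road topology with real-time delays. It uses Dijkstra's algorithm, chosen here only as a familiar shortest-path method; the graph layout, the `delays` dictionary and the function name `fastest_path` are assumptions made for illustration and are not part of any SR implementation.

```python
import heapq


def fastest_path(graph, delays, source, target):
    """Dijkstra's algorithm on travel times adjusted by real-time delays.

    graph:  {node: {neighbor: base_travel_time}}   (static topology)
    delays: {(node, neighbor): extra_travel_time}  (e.g. traffic, accidents)
    Returns (path, total_time), or (None, inf) if the target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == target:
            break
        for v, base in graph.get(u, {}).items():
            # Correlate the topology with the current road conditions.
            nd = d + base + delays.get((u, v), 0.0)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, float("inf")
    # Walk the predecessor links back from the target to the source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[target]
```

With an empty `delays` dictionary the function reduces to a purely topological computation; adding a delay on a road segment can change the selected path, which is the benefit the text attributes to correlating several data sources.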

5.2.2 Botnet Detection

This area concerns an important improvement to the Intrusion Detection algorithm executed by Agilis. Currently, it produces a list of IP addresses suspected of belonging to intruders. It would be very interesting and useful to take a step further and try to recognize a whole botnet instead of single IP addresses.

A botnet [22] is a number of Internet computers that, although their owners are unaware of it,

have been set up to forward transmissions (including spam or viruses) to other computers on the

Internet. Any such computer is referred to as a zombie. Most computers compromised in this

way are home-based. Often, botnets comprising hundreds of thousands of computers are rented by

criminal organizations to execute a DDoS attack aimed at undermining the reputation of the target

or at extorting money. This example gives an idea about the practical value of identifying entire

botnets.

5.2.3 Resource Allocation and Scheduling Optimization

The use of cloud computing as a dynamic resource provider is very appealing for collaborative environments. The ability to allocate resources to users so as to maximize hardware utilization and react to sudden changes in the load, while at the same time fulfilling requirements about fairness among served users and response time constraints, is an ongoing research area that can be further investigated in the new scenarios enabled by collaborative environments.

Moreover, the way the various tasks that form a distributed computation are scheduled onto the available processing nodes is a key element for the performance of the whole parallel computing framework. Within the CoMiFin context, we have seen the JobTracker component (Section 4.5), which allocates map and reduce tasks to TaskTrackers with available slots. We have already pointed out that it constitutes a bottleneck for Hadoop performance and a single point of failure for the entire system. Redesigning the JobTracker as a distributed component can overcome this limitation and generally improve the quality of the entire architecture.

5.2.4 Privacy and Trustworthiness

A major obstacle to the practical use of collaboration within the financial context is the lack of proper guarantees about data privacy. The information managed by financial actors is very sensitive and of high value, and any unexpected disclosure could cause huge economic losses and serious damage to reputation. This risk prevents financial actors from sharing their data, cutting off what would actually make the collaboration work well.

There are mainly two directions for addressing this issue. On the one hand, better technologies for anonymizing data could be investigated, in order to encourage potential participants to share their data by assuring them that the confidentiality of their information can be completely preserved. On the other hand, the level of trustworthiness among all the participants could be increased, so that they become more confident in each other and more inclined to provide their data.

Bibliography

[1] Apache Hadoop. http://hadoop.apache.org.

[2] CentOS – Community ENTerprise Operating System. http://www.centos.org.

[3] CoMiFin (Communication Middleware for Monitoring Financial Critical Infrastructure). http://www.comifin.eu/.

[4] Eclipse Galileo. http://www.eclipse.org/galileo.

[5] Enterprise Java Bean 3.0. http://java.sun.com/products/ejb/docs.html.

[6] Hadoop Distributed File System. http://hadoop.apache.org/hdfs.

[7] Jaql - a query language designed for Javascript Object Notation (JSON). http://code.google.com/p/jaql.

[8] Java Message Service. http://java.sun.com/products/jms.

[9] JBoss Application Server. http://www.jboss.org/jbossas.

[10] MapReduce: Simplified Data Processing on Large Clusters. http://labs.google.com/papers/mapreduce.html.

[11] MySQL - The world’s most popular open source database. http://www.mysql.com.

[12] Nagios - The Industry Standard In Open Source Monitoring. http://www.nagios.org.

[13] PlanetLab - an open platform for developing, deploying and accessing planetary-scale services. http://www.planet-lab.org.

[14] Red Hat Enterprise Linux - The world’s leading open source application platform. http://www.redhat.com/rhel.

[15] Segregation of Duties. http://en.wikipedia.org/wiki/Separation_of_duties.

[16] Servlet and Java Server Pages. http://java.sun.com/products/servlet.

[17] Unified Modeling Language. http://www.uml.org.

[18] Web Services. http://www.w3.org/TR/ws-arch/.

[19] WebSphere eXtreme Scale - An essential for elastic scalability and next-generation cloud environments. http://www.ibm.com/software/webservers/appserv/extremescale.

[20] Chenfeng Vincent Zhou, Christopher Leckie, Shanika Karunasekera. A survey of coordinated attacks

and collaborative intrusion detection. Computers & Security, 29:124–140, 2010.

[21] Donald Smith. D-Link router based worm? available online at http://isc.sans.org/diary.html?storyid=4175.

[22] Evan Cooke, Farnam Jahanian, Danny McPherson. The Zombie Roundup: Understanding, Detecting, and Disrupting Botnets. Proc. of USENIX Workshop on Steps to Reducing Unwanted Traffic on the Internet (SRUTI’05), Boston, 2005.

[23] Federal Trade Commission. Consumer Fraud and Identity Theft Complaint Data January - December

2007. available online at http://www.ftc.gov/opa/2008/02/fraud.pdf, February 2008.

[24] Ilett Dan. US to force firms to ’fess up on data loss. available online at http://software.silicon.com/security/0,39024655,39157787,00.htm, April 2006. Security Strategy.

[25] Kim Davies. 2008 DNS cache poisoning vulnerability. available online at http://www.iana.org/about/presentations/davies-cairo-vulnerability-081103.pdf.

[26] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski,

Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica and Matei Zaharia. A View of Cloud Computing.

Communications of the ACM, 53:50–58, April 2010.

[27] S. Cuganesan, D. Lacey. Identity fraud in Australia: An evaluation of its nature, cost and extent.

Standards Australia International Ltd. Sydney, 2003.

[28] Susan Orr. DDoS Threatens Financial Institutions - Get Prepared! available online at http://www.orrandorrconsulting.com/articles/DDoS-Threatens-Financial-Institutions.pdf, 2005.

[29] Tim Wilson. For Sale: Phishing Kit. available online at http://www.darkreading.com/security/management/showArticle.jhtml?articleID=208804288.

[30] UK Home Office. Updated estimate of the cost of identity fraud to the UK economy. available online

at http://www.identity-theft.org.uk/IDfraudtable.pdf, February 2006.

List of Figures

1.1 Structure of business entity

1.2 CoMiFin in a Financial Scenario

2.1 SR data handling

2.2 SR Members and Clients

2.3 The framework

3.1 SR Manager macro-functionalities

3.2 Partner registration

3.3 SR instantiation

3.4 SR join

3.5 SR leave

3.6 SR disband

3.7 Component diagram

3.8 SR Manager architecture

3.9 SR Data Model

3.10 Registration

3.11 Login

3.12 SR Schema creation - SLSs definition

3.13 SR Schema creation - Deployments definition

4.1 MapReduce-based Semantic Room

4.2 The Components of the Agilis Runtime

4.3 Data flow for Port Scan Detection in ID-SR

4.4 Jaql query fragments used for Port Scan Detection in ID-SR

4.5 HadoopMeter architecture
