Service and Resource Discovery Supports over P2P Overlays
Emanuele Carlini, Massimo Coppola, Patrizio Dazzi, Domenico Laforenza, Laura Ricci
SERVICE AND RESOURCE DISCOVERY SUPPORTS OVER P2P OVERLAYS
EMANUELE CARLINI, MASSIMO COPPOLA, DOMENICO LAFORENZA,
PATRIZIO DAZZI, LAURA RICCI
International Conference on Ultra Modern Telecommunications, ICUMT
Saint Petersburg, October 12-14th, 2009
Università degli Studi di Pisa Dipartimento di Informatica
INTRODUCTION
• Grid environments exploit a huge amount of geographically scattered computing resources
• Main features of large computational grids
– Dynamic environment
– Huge amount of heterogeneous resources
– Complex middleware for accessing the resources
• XtreemOS: a research project funded by the European Commission
– main goal: definition of an open-source, Grid-enabled operating system
– scalable and transparent management of large computational platforms
– federation of several virtual organizations
– users exploit the distributed system through a standard operating system interface
SRDS: SERVICE AND RESOURCE DISCOVERY
• SRDS: a basic service of XtreemOS providing a highly distributed directory service
• SRDS main features
– enables resource look-up and exploitation in a multi-VO environment
– hides the effect of scale when exploiting individual systems
– may be exploited by different clients
• other modules of XtreemOS
• applications
– supports different kinds of queries
• key-based
• multi-attribute
• range queries over dynamic attributes
SRDS ARCHITECTURE
• SRDS exploits a set of P2P overlays where each overlay includes nodes from different virtual organizations
• The choice of the P2P model enables
– scalability
– low overhead
– fault tolerance
– management of information in a dynamic environment
• SRDS services are exploited by different clients, each one with different requirements.
– to cope with the diversity of these requirements, several P2P overlays characterized by different features have been defined (Distributed Hash Tables, structured overlays,...)
SRDS: THE ARCHITECTURE
Facade: easy-to-extend multiple interface protocols
Query Provider (QP): set of modules for client query translation
Information Management Layer (IML): common interface to DHT-like overlays
ADS (Application Directory Service) = Facade + QP + IML
RSS (Resource Selection Service): a P2P overlay allowing scalable resource location in large overlays
Scalaris, Overlay Weaver: DHTs with different characteristics
SRDS MAIN MODULES: ADS AND RSS
RSS (Resource Selection Service) supports resource discovery through queries on constant value attributes
CPU = IA32, MEM ∈ [4GB, ∞), BANDWIDTH ∈ [512Kb/s, ∞), DISK ∈ [128GB, ∞), OS ∈ {Linux 2.6.19-1.2895, ..., Linux 2.6.20-1.2944}
ADS (Application Directory Service) supports complex queries over dynamic attributes
Example:
the RSS selects a set of resources whose static attributes match the query constraints.
the descriptors of these resources are stored in the ADS.
the dynamic state of the resources (for instance, current free memory) is monitored through the ADS
RSS acts like a machete, while ADS acts like a scalpel
RSS: RESOURCE SELECTION SERVICE
• Supports resource discovery through multi-attribute range queries over a set of static attributes, i.e. constant-valued attributes known at initialization time.
• RSS main features
– each node represents its own attributes in the overlay
– unlike DHT-based approaches, resource information is not delegated to other nodes
– speeds up resource location
ADS: THE QUERY PROVIDER (QP)
• Query Provider Layer: provides a set of modules devoted to query translation
• Implements a set of algorithms for the interpretation of the queries of different SRDS clients
• For instance, a job directory service is required to monitor the state of the jobs of an application/VO
– when a new job is created, the client submits an AddJob to the SRDS
– the AddJob operation is interpreted by a QP module, which translates it into a sequence of operations on the underlying DHT
• Check that a proper job directory service exists; if it does not, request its creation
• Insertion of the job ID into the DHT
• Insertion of further information about the job under proper keys to support inverse queries
– The QP makes all these steps transparent for the user
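The AddJob translation steps listed above can be sketched as follows. This is only an illustration: the `InMemoryDHT` class, the key layout (`jobdir:`, `job:`, `inv:`) and the function names are assumptions made for the example, not the SRDS API.

```python
class InMemoryDHT:
    """Stand-in for a real DHT: a plain dict with put/get (assumption)."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)


def add_job(dht, directory, job_id, info):
    """Hypothetical QP module: translate AddJob into DHT operations."""
    dir_key = f"jobdir:{directory}"
    # Step 1: check that the job directory exists, create it otherwise.
    jobs = dht.get(dir_key)
    if jobs is None:
        jobs = []
    # Step 2: insert the job ID into the directory.
    if job_id not in jobs:
        jobs = jobs + [job_id]
    dht.put(dir_key, jobs)
    # Step 3: store job information, plus inverse keys (attribute -> job IDs).
    dht.put(f"job:{directory}:{job_id}", dict(info))
    for attr, value in info.items():
        inv_key = f"inv:{directory}:{attr}={value}"
        owners = dht.get(inv_key) or []
        if job_id not in owners:
            dht.put(inv_key, owners + [job_id])


def request_job(dht, directory, job_id):
    """Retrieve a job's information with a single DHT get."""
    return dht.get(f"job:{directory}:{job_id}")
```

The client only calls `add_job`; the sequence of put/get operations stays hidden behind it, which is the transparency the QP is meant to provide.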
ADS: THE INFORMATION MANAGEMENT LAYER (IML)
A namespace defines the context in which a key is used. For instance, different job directories use different namespaces.
The IML of the ADS (Application Directory Service)
– provides an implementation of namespaces over DHTs
– receives from a QP module an abstract operation:
OP_QP = { op, key_M, value_M, NSpace, ClientType, ClientID }
– generates an operation for the underlying DHT in the proper namespace:
OP_DHT = { op, key_D, value_D, auxinfo }
where
value_D: generally equals value_M
key_D: may differ from key_M because of the namespace implementation
auxinfo: data expiration timeouts, user-defined secrets, ...
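A minimal sketch of how an abstract QP operation might be turned into a DHT operation. The field names mirror the two tuples above; the namespace encoding (a `namespace/key` prefix) and the 60-second default timeout are assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class OpQP:
    """Abstract operation as received from a QP module."""
    op: str
    key_m: str
    value_m: object
    nspace: str
    client_type: str
    client_id: str


@dataclass
class OpDHT:
    """Concrete operation handed to the underlying DHT."""
    op: str
    key_d: str
    value_d: object
    auxinfo: dict = field(default_factory=dict)


def translate(op_qp, ttl_seconds=60):
    """Map an abstract QP operation onto a DHT operation (illustrative)."""
    # Assumed namespace encoding: prefix the key with the namespace name.
    key_d = f"{op_qp.nspace}/{op_qp.key_m}"
    # auxinfo carries an expiration timeout; the 60 s default is made up.
    return OpDHT(op=op_qp.op, key_d=key_d, value_d=op_qp.value_m,
                 auxinfo={"ttl": ttl_seconds})
```

Here the value passes through unchanged while the key is rewritten, matching the remark that the DHT key may differ from the QP key because of the namespace implementation.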
EXPLOITING NAMESPACES: AN EXAMPLE
• Network coordinate (NC) embedding systems embed latencies, such as round-trip times among nodes, into a geometric space
• Each node is assigned network coordinates in the geometric space
• Unmeasured round-trip times are estimated by computing the distance between two nodes in the geometric space
To support
• direct queries, i.e. given the IP of a node, return its network coordinates
• inverse queries, i.e. given the X/Y coordinates of a node, find the IPs of its 'nearest' neighbours
the ADS
• exploits three different namespaces: IP, X, Y
• each namespace may be mapped on a different DHT or on the same DHT and may have different characteristics
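The three-namespace layout can be illustrated with a small in-memory sketch. Everything here is an assumption made for the example: plain dicts stand in for the DHTs, the geometric space is taken to be a 2-D Euclidean plane, and the inverse query is approximated by intersecting a window on X with a window on Y.

```python
import math

# One dict per namespace; each could live on its own DHT (here: in memory).
namespaces = {"IP": {}, "X": {}, "Y": {}}


def publish(ip, x, y):
    namespaces["IP"][ip] = (x, y)                 # direct query support
    namespaces["X"].setdefault(x, set()).add(ip)  # inverse query on X
    namespaces["Y"].setdefault(y, set()).add(ip)  # inverse query on Y


def coordinates(ip):
    """Direct query: IP -> network coordinates."""
    return namespaces["IP"][ip]


def estimated_rtt(ip_a, ip_b):
    """Estimate an unmeasured RTT as the Euclidean distance (assumption)."""
    (xa, ya), (xb, yb) = coordinates(ip_a), coordinates(ip_b)
    return math.hypot(xa - xb, ya - yb)


def nearby(x, y, radius):
    """Inverse query: IPs whose X and Y both lie within `radius` of (x, y)."""
    in_x = {ip for k, ips in namespaces["X"].items()
            if abs(k - x) <= radius for ip in ips}
    in_y = {ip for k, ips in namespaces["Y"].items()
            if abs(k - y) <= radius for ip in ips}
    return in_x & in_y
```

On a real DHT the two window scans in `nearby` would be range queries over the X and Y namespaces rather than full scans of a dict.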
NAMESPACE IMPLEMENTATION
Different choices for the implementation of the namespaces:
• a different DHT for each namespace
• a set of namespaces on the same DHT
NAMESPACE IMPLEMENTATION
Single Ring Approach:
• The DHT key is prefixed by an identifier of the namespace
• main drawback: DHT features, such as the replication strategy or the fault repair strategy, cannot be tuned according to the namespace
Multiple Ring Approach:
• On demand ring creation
• Parameters and policies of the DHT ring are customized at ring set-up time
• Some rings may always remain active
– they include essential key spaces, for instance resource directories
• Smaller rings may have a shorter lifespan
– application rings, for instance the job directory for a given application, ...
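The multiple ring approach can be sketched as a small registry that creates rings on demand and fixes their policies at set-up time. The `Ring` and `RingManager` classes and the `replicas` / `permanent` parameters are hypothetical names introduced for this example.

```python
class Ring:
    """Hypothetical DHT ring with policies fixed at set-up time."""

    def __init__(self, name, replicas, permanent):
        self.name = name
        self.replicas = replicas    # replication policy for this ring
        self.permanent = permanent  # permanent rings are never torn down
        self.store = {}


class RingManager:
    def __init__(self):
        self.rings = {}

    def ring_for(self, namespace, replicas=1, permanent=False):
        """Create the ring on demand; parameters apply only at creation."""
        if namespace not in self.rings:
            self.rings[namespace] = Ring(namespace, replicas, permanent)
        return self.rings[namespace]

    def release(self, namespace):
        """Tear down a short-lived ring; permanent rings are kept."""
        ring = self.rings.get(namespace)
        if ring is not None and not ring.permanent:
            del self.rings[namespace]
```

A resource directory would be requested with `permanent=True`, while an application's job directory would be created when the application starts and released when it ends.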
NAMESPACE IMPLEMENTATION
• The current version of the ADS exploits two different rings, based on two different DHTs: Scalaris and Overlay Weaver
• Scalaris
– A transaction-based DHT
– Provides consistent replication of data
• Overlay Weaver
– implements different DHTs (Chord, Pastry, CAN, ...)
– defines a routing layer common to all the DHTs
[Figure: the Overlay Weaver architecture]
COMPLEX QUERIES ON DHT
• DHTs support only basic key-value queries
• More complex queries may be submitted by the SRDS clients
– multidimensional range queries on dynamic attributes
• Examples
– exact match query: Arch = 'x86' and CPU-Speed = '3 GHz' and RAM = '256 MB'
– partial match query: CPU-Speed = '3 GHz' and RAM = '256 MB' (and Arch = *)
– range query: 1 GHz < CPU-Speed < 3 GHz and 512 MB < RAM < 1 GB
– similarity queries (or nearest-neighbour queries)
• require the definition of a metric in the attribute space
• the user submits an exact match query, which defines a point P in the attribute space; P may not correspond to any resource
• output: the k resources nearest to P, according to the defined metric
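The four query types can be made concrete with a toy example over a local list of resource descriptors. The descriptors, attribute names and the unnormalized Euclidean metric are all assumptions for illustration; a real deployment would evaluate these queries over the DHT rather than over a list.

```python
import math

resources = [
    {"id": "r1", "arch": "x86", "cpu_ghz": 3.0, "ram_mb": 256},
    {"id": "r2", "arch": "x86", "cpu_ghz": 2.0, "ram_mb": 768},
    {"id": "r3", "arch": "ppc", "cpu_ghz": 3.0, "ram_mb": 256},
]


def match(constraints):
    """Exact and partial match: attributes left out act as wildcards."""
    return [r["id"] for r in resources
            if all(r[a] == v for a, v in constraints.items())]


def in_range(bounds):
    """Range query with strict bounds: {attr: (low, high)}."""
    return [r["id"] for r in resources
            if all(lo < r[a] < hi for a, (lo, hi) in bounds.items())]


def nearest(point, k):
    """Similarity query: the k resources closest to `point`."""
    # Assumed metric: Euclidean distance over the numeric attributes given.
    def dist(r):
        return math.sqrt(sum((r[a] - v) ** 2 for a, v in point.items()))
    return [r["id"] for r in sorted(resources, key=dist)[:k]]
```

For instance, `match({"cpu_ghz": 3.0, "ram_mb": 256})` is the partial match query from the examples, with the architecture left as a wildcard. In practice the metric would usually normalize each attribute so that RAM in MB does not dominate CPU speed in GHz.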
RANGE QUERY SUPPORT
SRDS supports multi-attribute range queries
– an approach based on the MAAN proposal
– exploits the Chord DHT
Resource publication
– Each resource is described by k pairs (a_i, v_i)
– A locality-preserving hashing function maps the value of each attribute onto the DHT
H(v_i) = (v_i - v_{i,min}) × (2^m - 1) / (v_{i,max} - v_{i,min})
where 2^m is the size of the key space
– The descriptor of each resource is published onto k DHT nodes
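A short sketch of the locality-preserving hash and of the k publication keys it yields for one resource. The function names, the attribute domains and the choice to round down to an integer key are assumptions of this example.

```python
def lp_hash(v, v_min, v_max, m):
    """Locality-preserving hash: map v in [v_min, v_max] onto [0, 2^m - 1]."""
    if not v_min <= v <= v_max:
        raise ValueError("value outside the attribute domain")
    # Rounding down to an integer key is an assumption of this sketch.
    return int((v - v_min) * (2 ** m - 1) / (v_max - v_min))


def publication_keys(descriptor, domains, m):
    """One DHT key per attribute: the descriptor is published k times."""
    return {a: lp_hash(v, *domains[a], m) for a, v in descriptor.items()}
```

With m = 8 and a domain of [0, 100], the minimum value maps to key 0 and the maximum to key 255. Because the mapping is monotonic, a contiguous range of attribute values becomes a contiguous arc of keys, which is what makes range queries possible on the ring.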
RANGE QUERY SUPPORT
• Consider a multi-attribute range query a_1 ∈ [v_{1,l}, v_{1,u}], ..., a_k ∈ [v_{k,l}, v_{k,u}]
• The hashing function maps the range of each attribute onto a DHT range
• Selectivity of an attribute
S_i = 2^m / (H(v_{i,u}) - H(v_{i,l}))
• The dominant attribute a_i ∈ [v_{i,l}, v_{i,u}], i.e. the one with the highest selectivity, is chosen
• The query is sent to H(v_{i,l}) and is propagated along a DHT arc A until it reaches H(v_{i,u})
• Each node on A checks whether its resources satisfy all the query constraints
• The results are collected along A and sent by H(v_{i,u}) to the querying node
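The query resolution steps can be sketched as below. This is a toy model, not the SRDS implementation: the `ring` argument is a dict from DHT key to the descriptors stored at that key and stands in for the nodes along the arc, and the zero-width guard and inclusive bounds are assumptions.

```python
def lp_hash(v, v_min, v_max, m):
    """Locality-preserving hash onto [0, 2^m - 1] (rounded down)."""
    return int((v - v_min) * (2 ** m - 1) / (v_max - v_min))


def dominant_attribute(query, domains, m):
    """Pick the attribute with the highest selectivity S_i."""
    best, best_s = None, -1.0
    for attr, (lo, hi) in query.items():
        width = lp_hash(hi, *domains[attr], m) - lp_hash(lo, *domains[attr], m)
        s = (2 ** m) / max(width, 1)  # guard against a zero-width arc
        if s > best_s:
            best, best_s = attr, s
    return best


def resolve(query, domains, m, ring):
    """Walk the arc of the dominant attribute, filtering on all constraints."""
    attr = dominant_attribute(query, domains, m)
    start = lp_hash(query[attr][0], *domains[attr], m)
    end = lp_hash(query[attr][1], *domains[attr], m)
    results = []
    for key in range(start, end + 1):       # propagate along the arc A
        for desc in ring.get(key, []):      # descriptors held at this key
            ok = all(lo <= desc[a] <= hi for a, (lo, hi) in query.items())
            if ok and desc["id"] not in results:
                results.append(desc["id"])
    return results
```

Choosing the attribute with the narrowest hashed range keeps the arc short, so fewer nodes are visited before the results are returned.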
PUBLICATION OPTIMIZATION
• SRDS optimizes the publication process of the resources defined by MAAN
• Publication optimization: exploits a soft-state cache to store the routing results obtained during the publication process
• Routing on the DHT is avoided if the routing path to a node is stored in the cache
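A soft-state routing cache of this kind might look like the sketch below. The class name, the time-to-live policy and the `route` callback (standing in for a DHT lookup) are assumptions made for illustration.

```python
import time


class SoftStateRouteCache:
    """Cache of routing results that silently expire after `ttl` seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries = {}  # DHT key -> (node address, expiry time)

    def lookup(self, key, route):
        """Return the node for `key`, calling `route` only on a miss."""
        entry = self._entries.get(key)
        now = self.clock()
        if entry is not None and entry[1] > now:
            return entry[0]                  # hit: no DHT routing needed
        node = route(key)                    # miss or stale: route on the DHT
        self._entries[key] = (node, now + self.ttl)
        return node
```

Since resources are republished periodically under the same keys, most publications after the first one hit the cache. Entries are never explicitly invalidated; they simply age out, which tolerates nodes joining and leaving.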
PUBLICATION OPTIMIZATION
• A second optimization is defined to avoid the publication of 'unpopular' attributes
• Popularity of an attribute A = number of times A is chosen as dominant in a query
– depends on the query distribution
• Descriptors associated with low popularity attributes are updated with lower frequency
• Popularity is
– dynamically refined in a distributed fashion by the nodes receiving the queries
– estimated at target nodes receiving the query and sent back to publishing nodes by put-reply messages
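One possible way to turn popularity into a publication frequency is sketched below. The slides do not give the actual policy, so the normalization to a fraction of queries, the linear stretch of the interval and the `max_factor` parameter are all assumptions.

```python
from collections import Counter


class PopularityTracker:
    """Counts how often each attribute is chosen as dominant (target side)."""

    def __init__(self):
        self.counts = Counter()

    def record_query(self, dominant_attr):
        self.counts[dominant_attr] += 1

    def popularity(self, attr):
        """Fraction of observed queries where `attr` was dominant."""
        total = sum(self.counts.values())
        return self.counts[attr] / total if total else 0.0


def publish_interval(base_interval, popularity, max_factor=8):
    """Assumed policy: stretch the refresh interval as popularity drops."""
    factor = 1 + (max_factor - 1) * (1 - popularity)
    return base_interval * factor
```

In this sketch a target node would send `popularity(attr)` back in its put-reply, and the publishing node would use `publish_interval` to decide when to refresh the descriptor stored under that attribute.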
SRDS EVALUATION
• testing environment: Grid 5000 platform, with nodes belonging to different Grid 5000 clusters
• all nodes publish information every 30 s
• a large fraction of nodes run queries every 100 ms
JOB DIRECTORY SERVICE EVALUATION
• 20-120 nodes belonging to two clusters of the Grid 5000 platform
• each node performs publications over the DHT at a fixed rate of one every 30 seconds
• time interval between different requests: 200 milliseconds
• The latency of different operations is measured
– AddJob: requires a set of put/get operations
– RequestJob: a single DHT get
CONCLUSIONS
• SRDS: a service and resource discovery support developed for the XtreemOS distributed operating system
• Provides scalable and customisable information query support over large platforms
• Future work:
– testing SRDS on a large computing platform
– dynamic definition of namespaces on different DHTs
– definition of hierarchical name spaces
– investigation of further strategies for range queries (multi-attribute range and nearest-neighbour queries)