Management and usage of large scale infrastructures
2
Grid Computing and clouds
● Ian Foster on Grids : “Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”.
● Clouds: sharing of resources to achieve coherence and economies of scale, similar to a utility (like the electricity grid) over a network
Should be easy to use and could be easy to manage
3
User point of view
4
Several goals
● Low latency
– How long do I have to wait for my job to complete?
● High throughput
– How many jobs can I finish in a given timeframe?
● Low cost
– How much does it cost me?
● Low complexity
– How much time do I have to spend managing my jobs?
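The first two goals can be measured directly from job records. The sketch below is only an illustration: the `JobRecord` type and its fields are hypothetical, and times are assumed to be in seconds from a common origin.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    # Hypothetical record: both times in seconds since a common origin.
    submit: float
    finish: float


def mean_latency(jobs):
    """Average time between submission and completion of a job."""
    return sum(j.finish - j.submit for j in jobs) / len(jobs)


def throughput(jobs, start, end):
    """Jobs completed per second within the window [start, end)."""
    done = sum(1 for j in jobs if start <= j.finish < end)
    return done / (end - start)


jobs = [JobRecord(0, 10), JobRecord(0, 30), JobRecord(5, 25)]
print(mean_latency(jobs))       # mean wait in seconds
print(throughput(jobs, 0, 60))  # jobs per second over one minute
```

Note that the two goals can pull in opposite directions: packing a machine to maximize throughput usually lengthens the wait of individual jobs.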
5
Reduce the complexity
● Standardization bodies
● Use of open protocols
● High level QoS
● Abstraction levels
– Grid
  ● Everything is a resource
– Cloud
  ● PaaS, IaaS, SaaS
Services !!!
6
Frontends
● First contact
– Web Site
● Dedicated
– Eclipse Plugin
  ● Developer only
– Command line
  ● Expert only
7
Why make it simple?
● Framework for building submission websites
8
Cloud version : same complexity
9
Simple Job Submission
● Submit job to a GRAM service
– default factory EPR
– generate job RSL to default localhost
● Command example:
% globusrun-ws -submit -c /bin/touch touched_it
Submitting job...Done.
Job ID: uuid:002a6ab8-6036-11d9-bae6-0002a5ad41e5
Termination time: 01/07/2005 22:55 GMT
Current job state: Active
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
10
Simple Job Submission : Cloud version
11
Security is complex
● In clouds
– Isolation and no sharing
– Delegated to other layers
● In grids
– Virtual organization
– Cooperation between sites
– Trust mechanisms
12
Grid Security Infrastructure (GSI)
● Based on certificates
● Several CAs (Certificate Authorities)
● Trust relations are inherited from the CA
● Communications are based on SSL
● Coarse grained
– Not suited to reading a few bytes from a file
13
Grid Security Infrastructure (GSI)
14
Timing and methodology
● Clouds
– Everything by hand, you get what you pay for
  ● PaaS / SaaS / IaaS
– Deployment/Development depends on what you buy
● Grids
– Standardized (everything is a resource)
– Can do everything, so everything is a pain
15
Example of Grid data communication
● Globus WSRF : Web Service Resource Framework
● Data access is a service
16
Provider point of view
17
Job flow in grids. Question: how many decisions?
18
Basic useful services
● VO Management Service: resource allocation to each Virtual Organization.
● Resource Discovery and Management Service
● Job Management Service
● And much more: security (authentication, authorisation), data management…
● All services interact. Example: the Job Management Service needs Resource Discovery
● Need standardized interfaces to services. Example: JobSubmissionService has a submitJob() method
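To make the idea of standardized, interacting service interfaces concrete, here is a minimal sketch. It is not the OGSA or Globus API: the class names, the `submit_job` and `find_resources` signatures, and the host table are all assumptions made for illustration. It only shows a job service that depends on a discovery service through an agreed interface.

```python
import uuid
from abc import ABC, abstractmethod


class ResourceDiscoveryService(ABC):
    """Hypothetical standardized discovery interface."""

    @abstractmethod
    def find_resources(self, cpus):
        """Return names of hosts offering at least `cpus` free CPUs."""


class JobSubmissionService(ABC):
    """Hypothetical standardized submission interface."""

    @abstractmethod
    def submit_job(self, executable, arguments, cpus=1):
        """Submit a job and return an opaque job identifier."""


class StaticDiscovery(ResourceDiscoveryService):
    def __init__(self, free_cpus):
        self.free_cpus = free_cpus  # host name -> number of free CPUs

    def find_resources(self, cpus):
        return sorted(h for h, n in self.free_cpus.items() if n >= cpus)


class SimpleJobService(JobSubmissionService):
    def __init__(self, discovery):
        self.discovery = discovery  # any ResourceDiscoveryService will do
        self.placements = {}        # job id -> chosen host

    def submit_job(self, executable, arguments, cpus=1):
        hosts = self.discovery.find_resources(cpus)
        if not hosts:
            raise RuntimeError("no host with enough free CPUs")
        job_id = str(uuid.uuid4())
        self.placements[job_id] = hosts[0]
        return job_id


service = SimpleJobService(StaticDiscovery({"node-a": 2, "node-b": 8}))
job = service.submit_job("/bin/touch", ["touched_it"], cpus=4)
print(service.placements[job])
```

Because `SimpleJobService` only relies on the abstract interface, any site-specific discovery implementation can be swapped in without changing the submission code.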
19
Base infrastructure to implement the architecture OGSA?
OGSA: Open Grid Services Architecture
● The method invocation should also be standardized. CORBA? RMI? RPC? No: Web Services!
● But need Stateful Web Services!
● WSRF: Web Services Resource Framework
20
The Web services WSDL/SOAP/HTTP pancake
In theory extensible and generic.
In reality complex and monolithic
21
Going more inside Web services invocations
You don’t have to program the stubs or the SOAP requests/responses
Just like CORBA and RMI
22
From stateless to stateful WS
Using the concept of resources
23
WS-Resources
Web Service + Resource = WS-Resource
To address these, we need an endpoint reference to specify the resource
Think of how simple DNS or RmiRegistry are... Nope
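The idea of pairing a stateless service with addressable resources can be sketched in a few lines. This is a conceptual illustration only, not the WSRF API: `EndpointReference`, `CounterService`, and the example address are hypothetical names. The service itself keeps no per-client state; each call names the resource it applies to through an endpoint reference (service address plus resource key).

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointReference:
    address: str       # where the service lives
    resource_key: int  # which resource behind that service


class CounterService:
    """Stateless operations applied to the resource named by the caller."""

    def __init__(self, address):
        self.address = address
        self._resources = {}             # resource key -> stored value
        self._keys = itertools.count(1)  # source of fresh resource keys

    def create_resource(self):
        key = next(self._keys)
        self._resources[key] = 0
        return EndpointReference(self.address, key)

    def add(self, epr, amount):
        self._resources[epr.resource_key] += amount

    def get_value(self, epr):
        return self._resources[epr.resource_key]


# Hypothetical address, used only as an identifier here.
service = CounterService("http://example.org/wsrf/services/CounterService")
first = service.create_resource()
second = service.create_resource()
service.add(first, 5)
service.add(second, 2)
service.add(first, 1)
print(service.get_value(first), service.get_value(second))
```

Two clients holding different endpoint references see independent state even though they call the same service.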
24
Specification, WSRF and more
● WS-ResourceProperties: defined in the WSDL interface
● WS-ResourceLifetime: manages the lifecycle of the WS-Resources
● WS-ServiceGroup: groups services or WS-Resources together
– allows finding, within the group, services that have a particular property
– also allows addressing all services of the group through one entry point
● WS-BaseFaults: for fault reporting
● WS-Notification: producer/consumer mode
● WS-Addressing: to address the WS-Resources
25
Grid middleware provides WS-R
Grid middleware is WS-R
26
Writing a WSRF Web/Grid Service
Five Steps, only !
1. Define the service’s interface. This is done with WSDL
2. Implement the service. This is done with Java.
3. Define the deployment parameters. This is done with WSDD and JNDI
4. Compile everything and generate a GAR file. This is done with Ant
5. Deploy service. This is also done with a GT4 tool
27
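An example service interface
public interface Math
{
    // Add a to the value held by the service
    public void add(int a);
    // Subtract a from the value held by the service
    public void subtract(int a);
    // Return the current value (exposed as a resource property)
    public int getValueRP();
}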
In Java or IDL, the description is simple…
28
WSDL service description
<?xml version="1.0" encoding="UTF-8"?>
<definitions name="MathService"
targetNamespace="http://www.globus.org/namespaces/examples/core/MathService_instance"
xmlns="http://schemas.xmlsoap.org/wsdl/"
xmlns:tns="http://www.globus.org/namespaces/examples/core/MathService_instance"
xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
xmlns:wsrp="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.xsd"
xmlns:wsrpw="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.wsdl"
xmlns:wsdlpp="http://www.globus.org/namespaces/2004/10/WSDLPreprocessor"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<wsdl:import
namespace="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.wsdl"
location="../../wsrf/properties/WS-ResourceProperties.wsdl" />
29
<!-- ============ P O R T T Y P E ============ -->
<portType name="MathPortType"
wsdlpp:extends="wsrpw:GetResourceProperty"
wsrp:ResourceProperties="tns:MathResourceProperties">
<operation name="add">
<input message="tns:AddInputMessage"/>
<output message="tns:AddOutputMessage"/>
</operation>
<operation name="subtract">
<input message="tns:SubtractInputMessage"/>
<output message="tns:SubtractOutputMessage"/>
</operation>
<operation name="getValueRP">
<input message="tns:GetValueRPInputMessage"/>
<output message="tns:GetValueRPOutputMessage"/>
</operation>
</portType>
</definitions>
30
<!-- ============ M E S S A G E S ============ -->
<message name="AddInputMessage">
<part name="parameters" element="tns:add"/>
</message>
<message name="AddOutputMessage">
<part name="parameters" element="tns:addResponse"/>
</message>
<message name="SubtractInputMessage">
<part name="parameters" element="tns:subtract"/>
</message>
<message name="SubtractOutputMessage">
<part name="parameters" element="tns:subtractResponse"/>
</message>
<message name="GetValueRPInputMessage">
<part name="parameters" element="tns:getValueRP"/>
</message>
<message name="GetValueRPOutputMessage">
<part name="parameters" element="tns:getValueRPResponse"/>
</message>
31
<!-- ============ T Y P E S ============ -->
<types>
<xsd:schema targetNamespace="http://www.globus.org/namespaces/examples/core/MathService_instance" xmlns:tns="http://www.globus.org/namespaces/examples/core/MathService_instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<!-- REQUESTS AND RESPONSES -->
<xsd:element name="add" type="xsd:int"/>
<xsd:element name="addResponse">
<xsd:complexType/>
</xsd:element>
<xsd:element name="subtract" type="xsd:int"/>
<xsd:element name="subtractResponse">
<xsd:complexType/>
</xsd:element>
<xsd:element name="getValueRP">
<xsd:complexType/>
</xsd:element>
<xsd:element name="getValueRPResponse" type="xsd:int"/>
32
<!-- RESOURCE PROPERTIES -->
<xsd:element name="Value" type="xsd:int"/>
<xsd:element name="LastOp" type="xsd:string"/>
<xsd:element name="MathResourceProperties">
<xsd:complexType>
<xsd:sequence>
<xsd:element ref="tns:Value" minOccurs="1" maxOccurs="1"/>
<xsd:element ref="tns:LastOp" minOccurs="1" maxOccurs="1"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>
</types>
33
From stateless to stateful WS
Using the concept of resources
34
If you are still alive, you still have to
● Actually write the code
● Configure the deployment
– With WSDD and JNDI
● Compile everything with the right libraries
● Generate a GAR file: Grid Archive
● Deploy into a container
● And it was a simple stateless service!
Most people just run code and forget about services
35
Behind the scene how does it work ?
36
Grids : Globus GRAM 4, everything is specified
[Figure: GRAM 4 architecture. Labels: client, GRAM services, GRAM adapter, sudo, Delegation, RFT, GridFTP, local sched., user job, compute element, compute element and service host(s), remote storage element(s). Flows: job submit, delegate, xfer request, local job control, FTP control, FTP data.]
37
Clouds : OpenStack : somewhat specified
OpenStack : Communication and meta-data
38
Structure
● Monitoring● Analyze● Decision● Implementation
MAPE-K loop
Concept view: actually several cooperative decisions
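The four phases above, sharing a common knowledge base, can be sketched as one loop iteration. This is a toy example under stated assumptions: the knowledge dictionary, the load thresholds, and the "add or remove one instance" action are all invented for illustration and do not come from any particular middleware.

```python
def monitor(knowledge, sample):
    """Monitoring: record one raw load sample (0.0 to 1.0)."""
    knowledge["load_history"].append(sample)


def analyze(knowledge, window=3):
    """Analyze: turn raw samples into a metric, here a moving average."""
    recent = knowledge["load_history"][-window:]
    return sum(recent) / len(recent)


def decide(knowledge, mean_load):
    """Decision: +1 instance if overloaded, -1 if idle, else no change."""
    if mean_load > knowledge["high"]:
        return 1
    if mean_load < knowledge["low"] and knowledge["instances"] > 1:
        return -1
    return 0


def execute(knowledge, delta):
    """Implementation: apply the decision to the managed system."""
    knowledge["instances"] += delta


def mape_k_step(knowledge, sample):
    monitor(knowledge, sample)
    delta = decide(knowledge, analyze(knowledge))
    execute(knowledge, delta)
    return delta


knowledge = {"load_history": [], "instances": 1, "high": 0.8, "low": 0.2}
for load in [0.9, 0.95, 0.9, 0.1, 0.1, 0.1]:
    mape_k_step(knowledge, load)
print(knowledge["instances"])
```

In a real infrastructure each phase is typically a separate, distributed component, and several such loops (user side and provider side) run concurrently.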
39
Monitoring
● Grid: integrated monitoring
– Ganglia
– NWS, Network Weather Service (adds prediction)
– Nagios
● Cloud
– Provider: integrated
– User: no access to provider data
  ● If you want something, deploy it
40
Monitoring example : Ganglia
● Goal: High performance
– Small messages to reduce network impact
– Hierarchical structure with aggregation nodes
– Scalability (a few thousand nodes)
● Several components
– XDR for portable non-intrusive communication
– RRDtool for data storage and manipulation
– XML for data format
● Open Source
41
Analyze
Metrics
Computed using raw data from monitoring
ex: Energy consumption
● Grid: usually performance
– How many jobs are running
– How many are waiting
– How far are the deadlines
– Everything is at 100%
– Energy does (not) matter
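As an example of a metric computed from raw monitoring data, energy consumption can be derived from periodic power readings. The sketch below assumes hypothetical samples of the form (time in seconds, power in watts) and integrates them with the trapezoidal rule; the sampling source and format are assumptions.

```python
def energy_joules(samples):
    """Integrate (time_s, power_w) samples, sorted by time, into joules."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        # Area of the trapezoid between two consecutive readings.
        total += (p0 + p1) / 2.0 * (t1 - t0)
    return total


def joules_to_kwh(joules):
    """1 kWh = 3.6e6 J."""
    return joules / 3.6e6


# Hypothetical readings: 100 W for 10 s, then a ramp from 100 W to 200 W.
readings = [(0, 100), (10, 100), (20, 200)]
print(energy_joules(readings))
```

The accuracy depends on the sampling period: short power spikes between two readings are invisible to this computation.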
42
Analyze
Metrics
● Cloud
– Abstract « performance » does not exist: only users (QoS)
– Provider has an infrastructure point of view
  ● Unused resources
  ● Cost (electricity & management)
– Some classical metrics (Question: for whom?)
  ● Performance
  ● Energy
  ● Reliability
  ● Dynamism
43
Decision
● Grids: already covered
– Most important: where and when to run tasks
● Clouds
– User: optimize QoS
  ● Start new instances
  ● Modify resource allocation of current instances
– Provider: save money (and electricity)
  ● Consolidation
  ● Switching servers on/off
44
Grid example: backfilling
Question: If 5 is longer, can we move 4?
What could be the negative impact?
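The schedule drawn on the slide is not part of this transcript, so the following is only a generic sketch of EASY-style backfilling, not the slide's exact scenario. It assumes each queued job declares a node count and a walltime, and that a later job may jump ahead only if it does not delay the reservation of the first blocked job. All names and the tuple layouts are illustrative.

```python
def easy_backfill(now, total_nodes, running, queue):
    """
    running: list of (end_time, nodes) for jobs already executing.
    queue:   list of (name, nodes, walltime) in arrival order.
    Returns the names of the jobs allowed to start at time `now`.
    Assumes every job requests at most `total_nodes` nodes.
    """
    running = list(running)
    queue = list(queue)
    free = total_nodes - sum(nodes for _, nodes in running)
    started = []

    # Plain FIFO: start jobs from the head of the queue while they fit.
    while queue and queue[0][1] <= free:
        name, nodes, wall = queue.pop(0)
        free -= nodes
        running.append((now + wall, nodes))
        started.append(name)
    if not queue:
        return started

    # Reservation for the blocked head job: earliest time ("shadow time")
    # at which enough nodes will have been released.
    head_nodes = queue[0][1]
    avail, shadow = free, now
    for end, nodes in sorted(running):
        avail += nodes
        shadow = end
        if avail >= head_nodes:
            break
    extra = avail - head_nodes  # nodes left over once the head job starts

    # Backfill: a later job may start now if it ends before the shadow
    # time, or if it only uses nodes the head job will not need.
    for name, nodes, wall in queue[1:]:
        if nodes > free:
            continue
        if now + wall <= shadow:
            free -= nodes
            started.append(name)
        elif nodes <= extra:
            free -= nodes
            extra -= nodes
            started.append(name)
    return started


# 10 nodes, 6 busy until t=10; job A (8 nodes) is blocked until then.
print(easy_backfill(0, 10, [(10, 6)],
                    [("A", 8, 5), ("B", 4, 10), ("C", 2, 20), ("D", 2, 5)]))
```

The decision relies on the declared walltimes. If a backfilled job actually runs longer than declared, the reserved job is delayed unless the scheduler kills the overrunning job, which is one possible negative impact.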
45
Cloud example: steps for consolidation
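The steps shown on the slide are not in this transcript. One common way to frame consolidation is as bin packing: place virtual machines on as few hosts as possible so that the remaining hosts can be switched off. The sketch below uses the first-fit decreasing heuristic, which is an assumption here and not necessarily the method on the slide; VM names, loads, and the single capacity dimension are illustrative.

```python
def consolidate(vm_loads, host_capacity):
    """
    First-fit decreasing packing.
    vm_loads: dict mapping VM name -> load (same unit as host_capacity).
    Returns a list of hosts, each given as the list of VM names it runs.
    """
    hosts = []  # each entry: [remaining capacity, [VM names]]
    by_load = sorted(vm_loads.items(), key=lambda kv: kv[1], reverse=True)
    for vm, load in by_load:
        if load > host_capacity:
            raise ValueError(f"{vm} does not fit on any host")
        for host in hosts:
            if host[0] >= load:
                host[0] -= load
                host[1].append(vm)
                break
        else:
            # No existing host has room: keep one more host powered on.
            hosts.append([host_capacity - load, [vm]])
    return [names for _, names in hosts]


placement = consolidate({"a": 50, "b": 30, "c": 20, "d": 40, "e": 10}, 100)
print(len(placement), placement)
```

A real consolidation manager would also account for memory, network, migration cost, and the service interruption caused by each migration.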
46
Limits
● Consolidation
– Real servers don't switch off
– Service interruption (even if only a few ms)
– Isolation
● Scheduling in general
– Fairness
– QoS evaluation
– Multi-metrics for antagonistic objectives
  ● « Performance », Energy, Resilience, Dynamism
Question: How to manage reliability?
47
Execute
● User: depends on the application
– Reconfiguration
– Data migration (web server, database)
– Scalability of the application
● Provider
– Latency problems
  ● Switching a node on/off: ~ 1 min
– Scale problem
  ● Switching 1000 nodes on/off: power peaks
48
What about Peer to Peer ?
49
Control ?
● Several types of peer-to-peer systems
– Corporate
  ● Distributed file system
  ● Work stealing
– Cooperative
  ● Protein folding
  ● BitCoins
50
Distributed Hash Table
● Main point of contact: DHT
● Manages meta-data
– File systems
● Manages all data
– Work sharing
● Several libraries
– Kademlia
– Chord
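The core job of a DHT is to map a key to the node responsible for it without a central directory. The sketch below shows only that mapping, using a hash ring where a key belongs to the first node whose identifier follows it. It is a simplification: Chord adds finger tables for logarithmic lookups and Kademlia uses an XOR distance instead of a ring, and neither is implemented here. `ring_id`, `Ring`, and the node names are illustrative.

```python
import hashlib
from bisect import bisect_left


def ring_id(name, bits=32):
    """Hash a node name or key onto a ring of 2**bits positions."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class Ring:
    def __init__(self, nodes):
        # Sorted (position, node name) pairs around the ring.
        self.ids = sorted((ring_id(n), n) for n in nodes)

    def lookup(self, key):
        """Node responsible for key: first node at or after the key's
        position, wrapping around to the first node of the ring."""
        position = ring_id(key)
        index = bisect_left(self.ids, (position, ""))
        return self.ids[index % len(self.ids)][1]


ring = Ring(["node-1", "node-2", "node-3"])
print(ring.lookup("movie.avi"), ring.lookup("results.dat"))
```

A useful property of this scheme is that when a node leaves, only the keys it owned move to another node; all other keys keep their owner.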
51
Comparison with Grids and Clouds
● More specific
– Toward simple data management
  ● Distributed file sharing
– Toward computation on simple data
  ● Protein folding
  ● BitCoins
  ● Work stealing
● Some good properties
– Limited capabilities but simple to implement
– Decentralized
Question: Decisions?
52
Hype Cycle for Emerging Technologies, Gartner 2014
53
Bibliography
● The Grid 2: Blueprint for a New Computing Architecture. Ian Foster, Carl Kesselman
● The Globus Toolkit 4 Programmer’s Tutorial. Borja Sotomayor
● A view of cloud computing. Armbrust, M., Fox, A., Griffith, R., Joseph, A. D., Katz, R., Konwinski, A., ... & Zaharia, M.
● OpenStack: toward an open-source solution for cloud computing. Sefraoui, Omar, Mohammed Aissaoui, and Mohsine Eleuldj
● Peer-to-peer computing. Milojicic, Dejan S., Vana Kalogeraki, Rajan Lukose, Kiran Nagaraja, Jim Pruyne, Bruno Richard, Sami Rollins, and Zhichen Xu