CloudLightning - Project and Architecture Overview
TRANSCRIPT
Prof. John Morrison (UCC)
The Consortium
Partners
CloudLightning comprises eight partners from academia and industry and is coordinated by University College Cork.
Industrial partners:
• Intel Ireland (IE)
• Maxeler (UK)
Academic partners:
• University College Cork (IE)
• Norwegian University of Science and Technology (NO)
• Institute e-Austria Timisoara (RO)
• Democritus University of Thrace (GR)
• The Centre for Research & Technology, Hellas (GR)
• Dublin City University (IE)
PROJECT OVERVIEW
Specific Challenge
CloudLightning was funded under Call H2020-ICT-2014-1 Advanced Cloud Infrastructures and Services.
The aim is to develop infrastructures, methods and tools for high performance, adaptive cloud applications and services that go beyond the current capabilities.
• Cloud computing is being transformed by new requirements such as:
- heterogeneity of resources and devices,
- software-defined data centres,
- cloud networking, security, and
- the rising demands for better quality of user experience.
• Cloud computing research will be oriented towards:
- new computational and data management models (at both infrastructure and services levels) that respond to the advent of faster and more efficient machines,
- rising heterogeneity of access modes and devices,
- demand for low energy solutions,
- widespread use of big data,
- federated clouds and
- secure multi-actor environments including public administrations.
EU Use Case Motivations
CloudLightning’s use cases support the European Union HPC strategy and specific industries identified by IDC in their recent report on the progress of the EU HPC Strategy (IDC, 2015).
1. The health sector represents 10% of EU GDP and 8% of the EU workforce (EC, 2014). HPC is increasingly central to genome processing and thus advanced medicine and bioscience research.
2. The oil and gas industry is responsible for 170,000 European jobs and €440 billion of Europe's GDP (IDC, 2015). HPC improves discovery performance and exploitation.
3. Ray tracing is a fundamental technology in many industries and specifically in CAD/CAE, digital content and mechanical design, sectors dominated by SMEs.
4. European ROI in HPC is very attractive: each euro invested in HPC on average returned €867 in increased revenue/income (IDC, 2015).
The HPC Market
Although the EU has the largest GDP in the world (€13.2 trillion), the U.S. has substantially outspent the EU region in high performance computing which has a knock-on effect in scientific discovery, innovation and competitiveness.
IDC estimate the HPC market at €21bn.
IDC forecasts that European HPC ecosystem spending will increase by 37.8% (6.6% CAGR) to reach about €5.2 billion in 2018, or 24.9% of worldwide HPC ecosystem spending (€21.3 billion).
HPC Challenges
“The challenge is less about educating users about cloud computing and more about the ability of clouds to handle more types of HPC jobs over time.”
IDC, 2015
Traditional High Performance Computing is…
1 Hard to use without deep IT knowledge
2 Expensive
3 Inaccessible to individuals and SMEs
4 Inflexible
Most HPC workloads are not ready to run on today’s cloud architectures.
The Market for HPC in the Cloud
The cloud segment is one of the smallest but fastest-growing segments in the HPC market.
Spending on HPC in the cloud and Hybrid-custom HPC clouds is forecast to grow from US$1.7bn in 2015 to US$5.2bn in 2017 (IDC, 2015).
The proportion of HPC sites employing cloud computing has grown from 13.8% in 2011, to 23.5% in 2013, to 34.1% in 2015 (IDC, 2015).
CloudLightning primary research suggests 48% of sites are using cloud computing although for relatively less complex workloads.
Forecast spending by segment (2017):
• Hybrid-Custom HPC Clouds: $1.5 billion
• HPC Public Clouds: $3.7 billion
• Traditional HPC Servers and Private Clouds: $15.4 billion
Drivers and Barriers to HPC in the Cloud Adoption
Our primary research (n=92) confirms our desk research, which suggests that there are significant economic and capacity-related drivers for adopting HPC in the cloud, but also both general cloud and HPC-specific barriers.
Drivers
1 Access to extra capacity for overflow or surge workloads
2 Reduced capital costs
3 Access to a datacentre or specialised software
Barriers
1 Data protection and control
2 Communication speed concerns
3 Complexity and difficulties migrating and integrating existing systems with the Cloud
CloudLightning Objectives
CloudLightning seeks to address the challenges in the HPC market through 9 technical, commercial and societal objectives.
• Build Prototype Management System and Delivery Model (WP4, WP5, WP6)
• Competitive Advantage through Infrastructure Efficiencies (WP4, WP8)
• Energy Efficiency (WP3, WP7)
• Validate Approach with Use Cases (WP5, WP6)
• Competitive Advantage through Improved Accessibility (WP5, WP6, WP8)
• Improved Accessibility to Cloud Resources (WP2, WP5, WP6)
• Demonstrate Scalability (WP7)
• Opportunities in Use Case Domains (WP2, WP8)
• Scientific Advancement (WP8)
Legend: Technical Objectives, Commercial Objectives, Societal Objectives
CloudLightning Approach
CloudLightning proposes a novel architecture for provisioning heterogeneous cloud resources to deliver services, specified by the user, using a bespoke service description language.
01 Complexity
CloudLightning uses self-organisation and self-management to manage complexity effectively.
02 Heterogeneous Resources
CloudLightning was specifically designed for heterogeneous hardware.
03 IaaS Access
Clear service interface through separation of concerns between consumer and provider.
04 Energy Efficiency
Achieved through heterogeneous resources, reducing overprovisioning, maximising VM/server density and turning off idle servers.
05 Resource Utilisation
CloudLightning uses dynamic workload and resource management to increase the efficiency of resource utilisation.
06 Service Deployment
The CloudLightning deployment mechanism reduces the operational overhead for non-technical users.
[Figure: Service User Perspective. Actors: End User, Blueprint Creator, Enterprise Cloud Operator. Components: Gateway Service (UI), Services Catalogue, Blueprint Catalogue, Self-Organizing Self-Management System, Plug & Play Service, Heterogeneous Resources, New Hardware. Interactions: Get Services, Create Blueprints, Extract / Modify Blueprints, Extract Blueprint, Deploy Blueprint, Request Resource, Discover Resource, CL-Resources, Resource Handler, Deploy Service, Running Service, Get Status, Monitor, Request to join.]
Progress Beyond the State of the Art
CloudLightning is contributing, and will continue to contribute, to progress beyond the state of the art across all technical work packages and primary use cases.
We are contributing, and will continue to contribute, to:
1. The expected impacts listed in the call topic
2. The innovative capacity of the consortium members
3. The innovative capacity of European industry
4. Other European environmental and societal priorities
Areas of progress:
• Cloud Architecture
• Service Description Languages
• Local Decision Strategy Framework
• Resource Coalitions
• Ray Tracing
• Oil & Gas
• Genome Processing
• Large Scale Simulation
JOHN MORRISON | [email protected]
THANK YOU
ARCHITECTURE OVERVIEW
Design Requirements
Create a Heterogeneous Service-Oriented Cloud Architecture to Support HPC Workloads
1 Ease of Use
2 Improve Resource Utilization compared to current Cloud deployments
3 Support Heterogeneity
4 Improve Service Delivery
Blueprints, Service Catalogue and Implementation Library
[Figure: Gateway Service, Self-Organizing Self-Management Framework, Blueprint, Physical Resources and Services Catalogue, with the Blueprint Creator and End User.]
• A Blueprint is a composition of services.
• A service describes, for a single task, the features of the many different hardware types and executable codes that can carry it out.
• An implementation is the executable code for a task on one hardware type.
[Figure: Enterprise Cloud Operator, Gateway Service, Blueprint Catalogue, Plug & Play Service, Coalitions and a Deployed Blueprint; a Service Catalogue holding Service 1 to Service 3 and an Implementation Library holding Implementation 1 to Implementation 3.]
Implementation
  id: unique identifier
  definition: concrete SW/HW
  (...)
Service
  id: unique identifier
  definition: service specification
  constraints: logical expressions
  metrics: atomic values
  parameters: atomic values
Blueprint
  id: unique identifier
  constraints: logical expressions
  metrics: atomic values
  parameters: atomic values
  No implementation
[Figure: the Blueprint Catalogue holds Blueprint 1, Blueprint 2 and Blueprint 3; each Blueprint is a composition of services.]
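The Implementation, Service and Blueprint records can be sketched as plain data classes. This is a minimal illustration rather than the project's actual schema: the Python field types, the `services` list on `Blueprint` (modelling "a composition of services") and all example values are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Implementation:
    id: str          # unique identifier
    definition: str  # concrete SW/HW, e.g. a binary built for one hardware type


@dataclass
class Service:
    id: str
    definition: str                                           # service specification
    constraints: List[str] = field(default_factory=list)      # logical expressions
    metrics: Dict[str, float] = field(default_factory=dict)   # atomic values
    parameters: Dict[str, str] = field(default_factory=dict)  # atomic values


@dataclass
class Blueprint:
    id: str
    services: List[Service] = field(default_factory=list)     # assumed field
    constraints: List[str] = field(default_factory=list)
    metrics: Dict[str, float] = field(default_factory=dict)
    parameters: Dict[str, str] = field(default_factory=dict)


# Hypothetical example: a blueprint composed of a single service.
ray_tracer = Service(id="svc-1", definition="ray tracing",
                     constraints=["memory_gb >= 16"])
bp = Blueprint(id="bp-1", services=[ray_tracer])
```

Note that the blueprint refers only to services, never to implementations, which matches the "no implementation" annotation on the slide.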
CloudLightning API Flow
The main CL system components, APIs, communication protocols and a sequence of documents that maintains the state of each and every interaction have been defined.
CloudLightning Message Relationships
CloudLightning Protocol Specification
Default request content-types: application/json
Default response content-types: application/json
Schemes: http, https
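A request that follows these defaults could be built as in the sketch below. Only the JSON content types and the http/https schemes come from the slide; the gateway URL, the path and the payload fields are hypothetical, and the request is constructed but not sent.

```python
import json
import urllib.request

# Hypothetical endpoint; the real CloudLightning paths are not given here.
GATEWAY_URL = "https://gateway.example.org/blueprints"


def build_request(payload: dict) -> urllib.request.Request:
    """Build a JSON POST request using the default content types."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        GATEWAY_URL,
        data=body,
        headers={
            "Content-Type": "application/json",  # default request content type
            "Accept": "application/json",        # default response content type
        },
        method="POST",
    )


# Hypothetical payload asking the gateway to deploy a blueprint.
req = build_request({"blueprint_id": "bp-1"})
```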
Creating a Resourced Blueprint
[Figure: Gateway Service, Blueprint Catalogue, Services Catalogue, Plug & Play Service, Self-Organizing Self-Management Framework and Physical Resources; Coalitions of resources back each Deployed Blueprint.]
• Use service characteristics to determine the best implementation hardware type.
• Locate resources of the appropriate type.
• Return resource handlers to the Gateway via the Blueprint.
• Invoke the deployment mechanism.
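The first three steps of creating a resourced blueprint can be sketched as follows. This is an illustration under stated assumptions: the per-hardware scores, the inventory of free resources and the fallback to the next-best hardware type are all hypothetical, and the deployment step is left out.

```python
from typing import Dict, List

# Hypothetical inventory: hardware type -> free resource handlers.
INVENTORY: Dict[str, List[str]] = {
    "cpu": ["cpu-0", "cpu-1"],
    "gpu": ["gpu-0"],
    "mic": [],
}


def resource_blueprint(blueprint: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Map each service in the blueprint to a resource handler.

    `blueprint` maps a service name to assumed per-hardware-type scores
    derived from the service characteristics.
    """
    handlers: Dict[str, str] = {}
    for service, scores in blueprint.items():
        # Try hardware types from best to worst score (assumed fallback rule).
        for hw in sorted(scores, key=scores.get, reverse=True):
            if INVENTORY.get(hw):
                handlers[service] = INVENTORY[hw].pop(0)
                break
        else:
            raise LookupError(f"no resource available for {service}")
    return handlers


resourced = resource_blueprint({
    "ray-tracing": {"gpu": 0.9, "cpu": 0.4},
    "genome": {"mic": 0.8, "cpu": 0.6},
})
```

In this toy run the genome service prefers a MIC, finds none free and falls back to a CPU; the returned mapping stands in for the resource handlers handed back to the Gateway.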
We assume a Cloud with a Resource Fabric far greater than that currently available.
Adding structure to the Cloud Fabric by creating virtual partitions and grouping them together.
Management of physical resources
• The resource fabric is partitioned into vRacks.
• Each vRack is managed by a vRack Manager.
• A vRack Manager can form Coalitions of its resources to support services.
• vRack Managers self organize to optimize service delivery.
vRacks and vRack Managers
• A vRack is a homogeneous partition of the resource fabric.
• Each vRack is managed by a dedicated vRack Manager.
• vRack Managers of different types exist based on the resource types being managed.
[Figure: the Resources Fabric of heterogeneous physical resources partitioned into vRacks, each with its own vRack Manager: vRacks of servers (Svr), a vRack of servers with a dedicated high-speed interconnection, and a vRack of specialized hardware.]
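Partitioning a resource fabric into homogeneous vRacks can be pictured with the short sketch below. The server names, the hardware labels and the `max_size` cap on vRack size are assumptions made for illustration; the slides do not say how vRack boundaries are chosen.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def partition_into_vracks(
    servers: List[Tuple[str, str]], max_size: int = 4
) -> Dict[str, List[List[str]]]:
    """Group (server, hardware_type) pairs into homogeneous vRacks.

    Every vRack holds servers of a single hardware type and at most
    `max_size` of them (the size cap is an assumption).
    """
    by_type: Dict[str, List[str]] = defaultdict(list)
    for name, hardware_type in servers:
        by_type[hardware_type].append(name)
    return {
        hardware_type: [names[i:i + max_size]
                        for i in range(0, len(names), max_size)]
        for hardware_type, names in by_type.items()
    }


# Hypothetical fabric of five servers with two hardware types.
fabric = [("s1", "cpu"), ("s2", "gpu"), ("s3", "cpu"),
          ("s4", "cpu"), ("s5", "gpu")]
vracks = partition_into_vracks(fabric, max_size=2)
```

Each inner list would then be handed to a vRack Manager of the matching type.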
vRack Manager Groups
• Groups of vRack Managers can be formed to simplify access to resources and to enable self-organization.
• There are three types of vRack Manager Groups.
[Figure: vRack Manager Groups of Type A, Type B and Type C, built from vRack Managers of specialized-hardware vRacks, of server vRacks with a dedicated high-speed interconnection, and of plain server vRacks.]
To generically manipulate resources of different types, the SOSM system introduces the concept of a CL-Resource.
CL-Resources refer to different hardware types and to different configurations of those types.
Thus heterogeneity can be introduced dynamically.
CL-Resources
[Figure: Resource Partitioning Possibilities under a Local Resource Manager: Svr, Svr with MIC, MIC, MIC-World, Cluster of Servers, Container/VM.]
Advanced architecture support
• Dynamic VPN creation for Blueprint Service Execution
• Autoscaling
• High availability
• Data locality
[Figure: a Blueprint with services S1, S2 and S3 deployed on servers in two vRacks joined by a Virtual Network Connection.]
A Framework for Hosting and Executing SOSM Strategies
A framework for hosting and executing SOSM strategies that can be associated with any hierarchical architecture. As the components work to achieve their local goals, the whole system eventually evolves to the ideal global goal state.
[Figure: the conceptual architecture, showing the components and their relationships: Perception, Metrics, Assessment Functions, Impetus, Weights, Suitability Index, Directed Evolution.]
Augmented CloudLightning Architecture
The CL architecture is expressed as a hierarchical architecture, introducing pRouters and pSwitches.
Customizing the self-organisation self-management framework with CL strategies
The Assessment Functions and Directed Evolution are related to the CL specific objectives of:
• Maximizing task throughput
• Maximizing energy efficiency
• Maximizing computational efficiency
• Maximizing resource management efficiency
Local goal: maximize its Suitability Index
[Figure: visualisation of the self-organisation self-management framework: Metrics, Weights, Perception, Impetus, Suitability Index.]
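One way to read the relationship between Metrics, Assessment Functions, Weights and the Suitability Index is as a normalised weighted sum, sketched below. The slides do not give the formula, so the weighted-sum form, the four assessment functions and every metric name are assumptions chosen to mirror the four CL objectives.

```python
from typing import Callable, Dict

Metrics = Dict[str, float]

# Hypothetical assessment functions, each mapping metrics to a score in [0, 1].
ASSESSMENT: Dict[str, Callable[[Metrics], float]] = {
    "task_throughput": lambda m: m["tasks_done"] / m["tasks_submitted"],
    "energy_efficiency": lambda m: 1.0 - m["power_w"] / m["max_power_w"],
    "computational_efficiency": lambda m: m["cpu_used"] / m["cpu_total"],
    "management_efficiency": lambda m: 1.0 - m["mgmt_overhead"],
}


def suitability_index(metrics: Metrics, weights: Dict[str, float]) -> float:
    """Weighted average of the assessment function values (assumed form)."""
    total_weight = sum(weights.values())
    weighted = sum(weights[name] * fn(metrics) for name, fn in ASSESSMENT.items())
    return weighted / total_weight


# Hypothetical metrics for one component and equal weights.
metrics = {
    "tasks_done": 8, "tasks_submitted": 10,
    "power_w": 150, "max_power_w": 300,
    "cpu_used": 6, "cpu_total": 8,
    "mgmt_overhead": 0.25,
}
weights = {name: 1.0 for name in ASSESSMENT}
index = suitability_index(metrics, weights)
```

Under this reading, changing the weights shifts which objective a component favours when it tries to maximise its own index.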
Self-organisation framework augmentations in support of virtualization
Goals:
• Support for virtualization
• Increase resource utilization
• Decrease job rejection rate
Augmentations:
• A new assessment function reflecting memory consumption is added.
• A two-stage self-organisation strategy is introduced: CPU and vCPU.
• Resource over-commitment is addressed.
Coalitions (WP 3)
• Coalitions are used to support the process parallelism within a service.
• Coalitions exist entirely inside a vRack.
• The CL-Resources of a Coalition may span multiple servers within the same vRack.
[Figure: a Coalition spanning several Servers inside one vRack.]
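The two structural rules for Coalitions (confined to one vRack, free to span servers inside it) can be captured in a few lines. The `CLResource` record, the first-come selection and the example pool are hypothetical; real coalition formation follows the strategies named on the next slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CLResource:
    id: str
    server: str
    vrack: str


def form_coalition(resources: List[CLResource], vrack: str, size: int) -> List[CLResource]:
    """Pick `size` CL-Resources that all belong to the given vRack.

    Members may sit on different servers, but never in different vRacks.
    Selection is first-come (an assumption made for brevity).
    """
    candidates = [r for r in resources if r.vrack == vrack]
    if len(candidates) < size:
        raise ValueError(f"vRack {vrack} has only {len(candidates)} CL-Resources")
    return candidates[:size]


# Hypothetical pool: three resources in vr-1 on two servers, one in vr-2.
pool = [
    CLResource("r1", "srv-a", "vr-1"),
    CLResource("r2", "srv-b", "vr-1"),
    CLResource("r3", "srv-c", "vr-2"),
    CLResource("r4", "srv-b", "vr-1"),
]
coalition = form_coalition(pool, "vr-1", 3)
```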
Coalition Formation Strategies
Machine-based coalition formation strategies:
• Task Compaction
• Isotropy Preservation
• Dependency Minimization
Workload-based coalition formation strategies:
• Coalition Size Frequency
• Workload Execution Constraints
Determining the local state
The Telemetry system provides updates to the SOSM system on the status of the resource fabric.
It is implemented by using InfluxDB and SNAP.
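As an illustration of what a telemetry update might look like on its way into InfluxDB, the sketch below formats one data point in InfluxDB line protocol (`measurement,tags fields timestamp`). The measurement, tag and field names are hypothetical, and how SNAP is wired to InfluxDB in CloudLightning is not shown in the slides.

```python
from typing import Dict


def escape_tag(value: str) -> str:
    """Escape commas, equals signs and spaces, which line protocol reserves in tags."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line(measurement: str, tags: Dict[str, str],
            fields: Dict[str, float], timestamp_ns: int) -> str:
    """Format one point as an InfluxDB line protocol string with float fields."""
    tag_part = "".join(
        f",{escape_tag(key)}={escape_tag(value)}"
        for key, value in sorted(tags.items())
    )
    field_part = ",".join(f"{key}={float(value)}" for key, value in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


# Hypothetical status update for one server in one vRack.
line = to_line(
    "server_status",
    {"vrack": "vr-1", "server": "srv a"},
    {"cpu_util": 0.42},
    1500000000000000000,
)
```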
Support for new hardware
[Figure: the Enterprise Cloud Operator adds Physical Resources through the Plug & Play Service and the Self-Organizing Self-Management Framework, alongside the Gateway Service, Blueprint Catalogue, Services Catalogue, Blueprint Creator and End User.]
• The SOSM system supports the addition of new hardware by using a plug and play mechanism.
• New hardware can register with SOSM and it is automatically added and managed.
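The plug and play registration of new hardware can be pictured with a toy registry. Everything here is hypothetical: the class, the `register` method, the manager naming scheme and the example identifiers only illustrate the idea that new hardware announces itself and is then managed automatically.

```python
from typing import Dict, List


class PlugAndPlayService:
    """Toy registry that files new hardware under its hardware type."""

    def __init__(self) -> None:
        # hardware type -> identifiers of registered resources
        self.managed: Dict[str, List[str]] = {}

    def register(self, resource_id: str, hardware_type: str) -> str:
        """Add new hardware and return the (hypothetical) name of its vRack Manager."""
        self.managed.setdefault(hardware_type, []).append(resource_id)
        return f"vrack-manager-{hardware_type}"


pnp = PlugAndPlayService()
manager = pnp.register("dfe-07", "dfe")
```

Assigning the resource by hardware type reflects the earlier statement that vRack Managers of different types exist for the resource types being managed.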
[Figure: new hardware (New HW) joins through the Plug & Play Service; inside the Self-Organizing Self-Management System, the SOSM Framework contains a Cell Manager and vRack Managers above a Resource Abstraction Layer over the Physical Resources.]