CloudLightning - Project and Architecture Overview

Prof. John Morrison (UCC)


Posted on 24-Jan-2018


Page 1: CloudLightning - Project and Architecture Overview

Prof. John Morrison (UCC)

Page 2: CloudLightning - Project and Architecture Overview

The Consortium

Page 3: CloudLightning - Project and Architecture Overview

Partners

CloudLightning comprises eight partners from academia and industry and is coordinated by University College Cork.

Industrial partners:
• Intel Ireland (IE)
• Maxeler (UK)

Academic partners:
• University College Cork (IE)
• Norwegian University of Science and Technology (NO)
• Institute e-Austria Timisoara (RO)
• Democritus University of Thrace (GR)
• The Centre for Research & Technology, Hellas (GR)
• Dublin City University (IE)

Page 4: CloudLightning - Project and Architecture Overview

PROJECT OVERVIEW

Page 5: CloudLightning - Project and Architecture Overview

Specific Challenge

CloudLightning was funded under Call H2020-ICT-2014-1 Advanced Cloud Infrastructures and Services.

The aim is to develop infrastructures, methods and tools for high-performance, adaptive cloud applications and services that go beyond current capabilities.

• Cloud computing is being transformed by new requirements such as:
  - heterogeneity of resources and devices
  - software-defined data centres
  - cloud networking and security
  - rising demands for a better quality of user experience.

• Cloud computing research will be oriented towards:
  - new computational and data management models (at both infrastructure and services levels) that respond to the advent of faster and more efficient machines
  - rising heterogeneity of access modes and devices
  - demand for low-energy solutions
  - widespread use of big data
  - federated clouds
  - secure multi-actor environments, including public administrations.

Page 6: CloudLightning - Project and Architecture Overview

EU Use Case Motivations

CloudLightning’s use cases support the European Union HPC strategy and specific industries identified by IDC in their recent report on the progress of the EU HPC Strategy (IDC, 2015).

1. The health sector represents 10% of EU GDP and 8% of the EU workforce (EC, 2014). HPC is increasingly central to genome processing and thus to advanced medicine and bioscience research.

2. The oil and gas industry is responsible for 170,000 European jobs and €440 billion of Europe's GDP (IDC, 2015). HPC improves performance in oil and gas discovery and exploitation.

3. Ray tracing is a fundamental technology in many industries, specifically in CAD/CAE, digital content and mechanical design, sectors dominated by SMEs.

4. European ROI in HPC is very attractive: each euro invested in HPC returned, on average, €867 in increased revenue/income (IDC, 2015).

Page 7: CloudLightning - Project and Architecture Overview

The HPC Market

Although the EU has the largest GDP in the world (€13.2 trillion), the U.S. has substantially outspent the EU in high-performance computing, which has a knock-on effect on scientific discovery, innovation and competitiveness.

IDC estimate the HPC market at €21bn.

IDC forecasts that European HPC ecosystem spending will increase by 37.8% (6.6% CAGR) to reach about €5.2 billion in 2018, or 24.9% of worldwide HPC ecosystem spending (€21.3 billion).
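As a quick consistency check on the IDC figures, compounding 6.6% a year should reproduce the quoted 37.8% total increase. The five-year horizon (2013 to 2018) is an assumption; the slide gives only the rates and the 2018 end point. A minimal Python sketch:

```python
def cumulative_growth(cagr: float, years: int) -> float:
    """Total fractional growth after compounding `cagr` for `years` years."""
    return (1 + cagr) ** years - 1


def implied_cagr(total_growth: float, years: int) -> float:
    """Constant annual rate that compounds to `total_growth` over `years` years."""
    return (1 + total_growth) ** (1 / years) - 1


# Assumption: a five-year horizon (2013 to 2018); the slide states only the rates.
growth = cumulative_growth(0.066, 5)
rate = implied_cagr(0.378, 5)
print(f"6.6% CAGR over 5 years -> {growth:.1%} total growth")
print(f"37.8% total growth over 5 years -> {rate:.2%} CAGR")
```

Compounding 6.6% over five years gives about 37.7%, and inverting 37.8% gives about 6.6% a year, so the two quoted figures agree to within rounding.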

Page 8: CloudLightning - Project and Architecture Overview

HPC Challenges

“The challenge is less about educating users about cloud computing and more about the ability of clouds to handle more types of HPC jobs over time.”

IDC, 2015

Traditional High Performance Computing is…

1 Hard to use without deep IT knowledge

2 Expensive

3 Inaccessible to individuals and SMEs

4 Inflexible

Most HPC workloads are not ready to run on today’s cloud architectures.

Page 9: CloudLightning - Project and Architecture Overview

The Market for HPC in the Cloud

The cloud segment is one of the smallest but fastest-growing segments of the HPC market.

Spending on HPC in the cloud and Hybrid-custom HPC clouds is forecast to grow from US$1.7bn in 2015 to US$5.2bn in 2017 (IDC, 2015).

The proportion of HPC sites employing cloud computing has grown from 13.8% in 2011, to 23.5% in 2013, to 34.1% in 2015 (IDC, 2015).

CloudLightning's primary research suggests that 48% of sites are using cloud computing, although for relatively less complex workloads.

Forecast 2017 spending by segment:

• Hybrid-custom HPC clouds: $1.5 billion

• HPC public clouds: $3.7 billion

• Traditional HPC servers and private clouds: $15.4 billion
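The 2017 segment figures tie back to the spending forecast above: hybrid-custom ($1.5bn) plus public ($3.7bn) HPC clouds gives the forecast US$5.2bn. The sketch below recomputes that total, the implied annual growth from the 2015 figure of US$1.7bn, and the cloud share of the three segments, assuming the three segments together make up the whole 2017 total (the chart does not say so explicitly).

```python
# 2017 segment forecasts in US$ billion, as read from the slide's chart.
segments_2017 = {
    "hybrid_custom_hpc_clouds": 1.5,
    "hpc_public_clouds": 3.7,
    "traditional_hpc_and_private_clouds": 15.4,
}
cloud_2015 = 1.7  # US$ bn: HPC in the cloud plus hybrid-custom HPC clouds, 2015

cloud_2017 = segments_2017["hybrid_custom_hpc_clouds"] + segments_2017["hpc_public_clouds"]
annual_growth = (cloud_2017 / cloud_2015) ** (1 / 2) - 1  # two years, 2015 -> 2017

# Assumption: the three segments together form the whole 2017 total in the chart.
cloud_share = cloud_2017 / sum(segments_2017.values())

print(f"cloud spending 2017: US${cloud_2017:.1f}bn")
print(f"implied annual growth 2015-2017: {annual_growth:.0%}")
print(f"cloud share of the 2017 total: {cloud_share:.0%}")
```

Under these assumptions cloud spending grows by roughly 75% a year between 2015 and 2017 and makes up about a quarter of the 2017 total.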

Page 10: CloudLightning - Project and Architecture Overview

Drivers of and Barriers to HPC-in-the-Cloud Adoption

Our primary research (n=92) confirms our desk research: there are significant economic and capacity-related drivers of HPC-in-the-cloud adoption, but also barriers, both general cloud barriers and HPC-specific ones.

Drivers

1 Access to extra capacity for overflow or surge workloads

2 Reduced capital costs

3 Access to a datacentre or specialised software

Barriers

1 Data protection and control

2 Communication speed concerns

3 Complexity and difficulties migrating and integrating existing systems with the Cloud

Page 11: CloudLightning - Project and Architecture Overview

CloudLightning Objectives

CloudLightning seeks to address the challenges in the HPC market through 9 technical, commercial and societal objectives.

Technical Objectives
• Build Prototype Management System and Delivery Model (WP4, WP5, WP6)
• Validate Approach with Use Cases (WP5, WP6)
• Demonstrate Scalability (WP7)

Commercial Objectives
• Competitive Advantage through Infrastructure Efficiencies (WP4, WP8)
• Competitive Advantage through Improved Accessibility (WP5, WP6, WP8)
• Opportunities in Use Case Domains (WP2, WP8)

Societal Objectives
• Energy Efficiency (WP3, WP7)
• Improved Accessibility to Cloud Resources (WP2, WP5, WP6)
• Scientific Advancement (WP8)

Page 12: CloudLightning - Project and Architecture Overview

CloudLightning Approach

CloudLightning proposes a novel architecture for provisioning heterogeneous cloud resources to deliver services, specified by the user, using a bespoke service description language.

01 Complexity

CloudLightning uses self-organisation and self-management to manage complexity effectively.

02 Heterogeneous Resources

CloudLightning was designed specifically for heterogeneous hardware.

03 IaaS Access

Clear service interface through separation of concerns between consumer and provider.

04 Energy Efficiency

Achieved through heterogeneous resources, reducing overprovisioning, maximising VM/server density and turning off idle servers.

05 Resource Utilisation

CloudLightning uses dynamic workload and resource management to increase the efficiency of resource utilisation.

06 Service Deployment

The CloudLightning deployment mechanism reduces the operational overhead for non-technical users.

Page 13: CloudLightning - Project and Architecture Overview

[Figure: Service User Perspective. Actors: Blueprint Creator, End User and Enterprise Cloud Operator. Components: Gateway Service (UI), Services Catalogue, Blueprint Catalogue, Self-Organizing Self-Management System, Plug & Play Service, and heterogeneous resources (CL-Resources), including new hardware. Interactions: Get Services, Create Blueprints, Extract / Modify Blueprints, Extract Blueprint, Deploy Blueprint, Request Resource, Discover Resource, Resource Handler, Deploy Service, Running Service, Get Status, Monitor, Request to join.]

Page 14: CloudLightning - Project and Architecture Overview

Progress Beyond the State of the Art

CloudLightning is contributing, and will continue to contribute, to progress beyond the state of the art across all technical work packages and primary use cases.

We are contributing, and will continue to contribute, to:

1. The expected impacts listed in the call topic

2. The innovative capacity of the consortium members

3. The innovative capacity of European industry

4. Other European environmental and societal priorities

Areas of progress beyond the state of the art:

• Technical: Cloud Architecture; Service Description Languages; Local Decision Strategy Framework; Resource Coalitions

• Use cases: Ray Tracing; Oil & Gas; Genome Processing; Large Scale Simulation

Page 15: CloudLightning - Project and Architecture Overview

JOHN MORRISON

THANK YOU

Page 16: CloudLightning - Project and Architecture Overview

ARCHITECTURE OVERVIEW

Page 17: CloudLightning - Project and Architecture Overview

Design Requirements

Create a Heterogeneous Service-Oriented Cloud Architecture to Support HPC Workloads

1 Ease of Use

2 Improve Resource Utilization compared to current Cloud deployments

3 Support Heterogeneity

4 Improve Service Delivery

Page 18: CloudLightning - Project and Architecture Overview

Blueprints, Service Catalogue and Implementation Library

[Figure: Blueprint Creator, End User and Enterprise Cloud Operator with the Gateway Service, Services Catalogue, Blueprint Catalogue, Self-Organizing Self-Management Framework, Plug & Play Service and physical resources; a deployed blueprint runs on three coalitions.]

• A Blueprint is a composition of services.

• A service describes the features of the many different hardware types and executable code available for the same task.

• An implementation is the executable code for a task on a particular hardware type.

Page 19: CloudLightning - Project and Architecture Overview

Service Catalogue: Service 1, Service 2, Service 3

Implementation Library: Implementation 1, Implementation 2, Implementation 3

Blueprint Catalogue: Blueprint 1, Blueprint 2, Blueprint 3

Implementation
• id: unique identifier
• definition: concrete SW/HW
• (...)

Service
• id: unique identifier
• definition: service specification
• constraints: logical expressions
• metrics: atomic values
• parameters: atomic values

Blueprint (a composition of services; no implementation)
• id: unique identifier
• constraints: logical expressions
• metrics: atomic values
• parameters: atomic values

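The fields listed for implementations, services and blueprints map naturally onto simple record types. The following is a minimal Python sketch, not the CloudLightning schema: representing constraints as predicates over a dictionary of metrics/parameters, the `service_id` and `hardware_type` fields on `Implementation`, and the example names are all assumptions made for illustration. A `Blueprint` refers only to services, never to implementations; choosing an implementation is left to resource allocation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Values = Dict[str, float]              # "atomic values" for metrics and parameters
Constraint = Callable[[Values], bool]  # a logical expression over those values


@dataclass
class Implementation:
    id: str
    service_id: str     # hypothetical link to the service this implements
    hardware_type: str  # illustrative labels, e.g. "cpu", "gpu", "mic", "dfe"
    definition: str     # concrete SW/HW, e.g. an image or bitstream reference


@dataclass
class Service:
    id: str
    definition: str     # service specification
    constraints: List[Constraint] = field(default_factory=list)
    metrics: Values = field(default_factory=dict)
    parameters: Values = field(default_factory=dict)

    def admits(self, values: Values) -> bool:
        """True if every constraint holds for the given values."""
        return all(c(values) for c in self.constraints)


@dataclass
class Blueprint:
    id: str
    services: List[str]  # composition of services by id; no implementation
    constraints: List[Constraint] = field(default_factory=list)
    metrics: Values = field(default_factory=dict)
    parameters: Values = field(default_factory=dict)


# Catalogues keyed by id (hypothetical in-memory stand-ins).
service_catalogue: Dict[str, Service] = {}
implementation_library: Dict[str, List[Implementation]] = {}  # service id -> impls
blueprint_catalogue: Dict[str, Blueprint] = {}

# Usage: one service with two hardware-specific implementations, one blueprint.
svc = Service("raytrace", "render a scene",
              constraints=[lambda v: v["max_latency_s"] <= 10])
service_catalogue[svc.id] = svc
implementation_library[svc.id] = [
    Implementation("raytrace-cpu", "raytrace", "cpu", "raytrace:cpu-image"),
    Implementation("raytrace-gpu", "raytrace", "gpu", "raytrace:gpu-image"),
]
bp = Blueprint("render-pipeline", services=["raytrace"])
blueprint_catalogue[bp.id] = bp
print(svc.admits({"max_latency_s": 5}), [i.hardware_type for i in implementation_library["raytrace"]])
```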

Page 20: CloudLightning - Project and Architecture Overview

CloudLightning API Flow

The main CL system components, APIs and communication protocols have been defined, together with a sequence of documents that maintains the state of each and every interaction.

Page 21: CloudLightning - Project and Architecture Overview

CloudLightning Message Relationships

Page 22: CloudLightning - Project and Architecture Overview

CloudLightning Protocol Specification

Default request content type: application/json

Default response content type: application/json

Schemes: http, https
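To illustrate these protocol defaults (JSON request and response bodies over http or https), here is a sketch that builds, but does not send, a request with Python's standard `urllib.request`. The gateway URL, the `/blueprints/deploy` path and the payload field `blueprint_id` are hypothetical; the slides do not give the actual endpoints.

```python
import json
import urllib.request


def build_request(base_url: str, path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST following the protocol defaults:
    application/json in both directions, scheme http or https."""
    if not base_url.startswith(("http://", "https://")):
        raise ValueError("scheme must be http or https")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


# Hypothetical gateway URL, path and payload, for illustration only.
req = build_request("https://gateway.example.org", "/blueprints/deploy",
                    {"blueprint_id": "render-pipeline"})
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req), then json.load() on the response.
```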

Page 23: CloudLightning - Project and Architecture Overview

Creating a Resourced Blueprint

[Figure: Gateway Service, Services Catalogue, Blueprint Catalogue, Self-Organizing Self-Management Framework, Plug & Play Service and physical resources; two deployed blueprints, each running on three coalitions.]

• Use service characteristics to determine the best implementation hardware type.

• Locate resources of the appropriate type.

• Return resource handlers to the Gateway via the Blueprint.

• Invoke the deployment mechanism.
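The four steps for creating a resourced blueprint can be sketched as a small allocation routine. This is an illustration under stated assumptions, not the CloudLightning implementation: "service characteristics" are modelled as a hypothetical expected speed-up per hardware type, free resources as per-type lists of ids, and `ResourceHandle` stands in for the resource handlers returned to the Gateway. If the best hardware type has no free resource, the sketch falls back to the next-ranked type.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ResourceHandle:
    """Hypothetical stand-in for a resource handler returned to the Gateway."""
    resource_id: str
    hardware_type: str


# Per service: (expected speed-up per hardware type, types it has implementations for).
ServiceSpec = Tuple[Dict[str, float], List[str]]


def rank_hardware(speedups: Dict[str, float], candidates: List[str]) -> List[str]:
    """Step 1: order candidate hardware types, best first (illustrative scoring)."""
    return sorted(candidates, key=lambda hw: speedups.get(hw, 0.0), reverse=True)


def locate(free: Dict[str, List[str]], ranked: List[str]) -> Optional[ResourceHandle]:
    """Step 2: take a free resource of the best-ranked type that still has one."""
    for hw in ranked:
        pool = free.get(hw, [])
        if pool:
            return ResourceHandle(pool.pop(0), hw)  # removes it from the free pool
    return None


def resource_blueprint(services: Dict[str, ServiceSpec],
                       free: Dict[str, List[str]]) -> Dict[str, ResourceHandle]:
    """Step 3: one handle per service, returned to the Gateway with the blueprint.
    Step 4 (invoking the deployment mechanism) is outside this sketch."""
    handles: Dict[str, ResourceHandle] = {}
    for name, (speedups, candidates) in services.items():
        handle = locate(free, rank_hardware(speedups, candidates))
        if handle is None:
            raise RuntimeError(f"no free resource for service {name!r}")
        handles[name] = handle
    return handles


free = {"gpu": ["gpu-07"], "cpu": ["cpu-01", "cpu-02"]}
services = {
    "raytrace": ({"gpu": 8.0, "cpu": 1.0}, ["gpu", "cpu"]),
    "postprocess": ({"gpu": 2.0, "cpu": 1.0}, ["gpu", "cpu"]),
}
handles = resource_blueprint(services, free)
for name, h in handles.items():
    print(name, "->", h.resource_id, f"({h.hardware_type})")
```

Here "raytrace" takes the only GPU, so "postprocess", which also prefers a GPU, falls back to a CPU.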

Page 24: CloudLightning - Project and Architecture Overview

We assume a Cloud with a Resource Fabric far greater than that currently available.

Structure is added to the Cloud Fabric by creating virtual partitions and grouping them together.

Management of physical resources

• The resource fabric is partitioned into vRacks.

• Each vRack is managed by a vRack Manager.

• A vRack Manager can form Coalitions of its resources to support services.

• vRack Managers self-organize to optimize service delivery.

[Figure: heterogeneous physical resources.]

Page 25: CloudLightning - Project and Architecture Overview

• A vRack is a homogeneous partition of the resource fabric.

• Each vRack is managed by a dedicated vRack Manager.

• vRack Managers of different types exist based on the resource types being managed.

vRacks and vRack Managers

[Figure: the resources fabric partitioned into vRacks of servers (Svr) and of specialized hardware, each vRack managed by its own vRack Manager; the servers of one vRack share a dedicated high-speed interconnection.]
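A toy model of this partitioning: the fabric is split into homogeneous vRacks, one vRack Manager each, and a manager forms a Coalition only from servers in its own vRack. The class names, the fixed `rack_size` and the greedy largest-first coalition policy are assumptions for illustration; the slides do not specify how vRacks are sized or how coalitions are chosen.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Server:
    name: str
    hw_type: str      # illustrative labels, e.g. "cpu", "gpu", "mic"
    free_cores: int


@dataclass
class VRackManager:
    """Manages one homogeneous vRack (hypothetical class for illustration)."""
    hw_type: str
    servers: List[Server] = field(default_factory=list)

    def add(self, server: Server) -> None:
        # A vRack is homogeneous, so servers of another type are rejected.
        if server.hw_type != self.hw_type:
            raise ValueError(f"{server.name} is {server.hw_type}; vRack is {self.hw_type}")
        self.servers.append(server)

    def form_coalition(self, cores_needed: int) -> List[str]:
        """Greedily gather servers from this vRack only, largest first, until the
        requested cores are covered; return [] if the vRack cannot host it."""
        chosen: List[str] = []
        total = 0
        for srv in sorted(self.servers, key=lambda x: x.free_cores, reverse=True):
            if total >= cores_needed:
                break
            if srv.free_cores > 0:
                chosen.append(srv.name)
                total += srv.free_cores
        return chosen if total >= cores_needed else []


def partition(fabric: List[Server], rack_size: int) -> List[VRackManager]:
    """Split the fabric into homogeneous vRacks of at most `rack_size` servers,
    with one vRack Manager per vRack."""
    by_type: Dict[str, List[Server]] = defaultdict(list)
    for srv in fabric:
        by_type[srv.hw_type].append(srv)
    managers: List[VRackManager] = []
    for hw, servers in by_type.items():
        for i in range(0, len(servers), rack_size):
            manager = VRackManager(hw)
            for srv in servers[i:i + rack_size]:
                manager.add(srv)
            managers.append(manager)
    return managers


fabric = [Server("s1", "cpu", 16), Server("s2", "cpu", 16),
          Server("s3", "cpu", 16), Server("g1", "gpu", 8)]
racks = partition(fabric, rack_size=2)
print([(r.hw_type, [s.name for s in r.servers]) for r in racks])
print(racks[0].form_coalition(20))
```

A coalition needing 20 cores spans two servers of the first vRack; one needing 40 cores is refused, because a coalition never crosses a vRack boundary.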

Page 26: CloudLightning - Project and Architecture Overview

• Groups of vRack Managers can be formed to simplify access to resources and to enable self-organization.

• There are three types of vRack Manager Groups.

vRack Manager Groups

[Figure: three vRack Manager Groups (Type A, Type B, Type C), each grouping several vRack Managers with their vRacks of servers, specialized hardware, or servers with a dedicated high-speed interconnection.]

Page 27: CloudLightning - Project and Architecture Overview

To manipulate resources of different types generically, the Self-Organizing Self-Management (SOSM) system introduces the concept of a CL-Resource.

CL-Resources refer to different hardware types and to different configurations of those types.

Thus heterogeneity can be introduced dynamically.

[Figure: Resource Partitioning Possibilities. CL-Resources managed by a Local Resource Manager; examples shown: MIC, MIC-World, cluster of servers, container/VM.]

Page 28: CloudLightning - Project and Architecture Overview

Advanced architecture support

• Dynamic VPN creation for Blueprint Service Execution

• Autoscaling

• High availability

• Data locality

[Figure: a blueprint of services S1, S2 and S3 deployed across the servers of two vRacks, linked by a virtual network connection.]


Page 30: CloudLightning - Project and Architecture Overview

A Framework for Hosting and Executing SOSM Strategies

A framework for hosting and executing SOSM strategies associated with any hierarchical architecture. As each component acts to achieve its local goals, the whole system eventually evolves towards the ideal global goal state.

Framework elements: Perception, Metrics, Assessment Functions, Impetus, Weights, Suitability Index, Directed Evolution.

Page 31: CloudLightning - Project and Architecture Overview

Architecture showing the components and their relationships.

The conceptual architecture

Page 32: CloudLightning - Project and Architecture Overview

Augmented CloudLightning Architecture

The CL architecture is expressed as a hierarchical architecture, introducing pRouters and pSwitches.

Page 33: CloudLightning - Project and Architecture Overview

Customizing the self-organisation self-management framework with CL strategies

The Assessment Functions and Directed Evolution are related to the CL-specific objectives of:

• Maximizing task throughput

• Maximizing energy efficiency

• Maximizing computational efficiency

• Maximizing resource management efficiency

[Figure: Metrics, Weights, Perception, Impetus and Suitability Index.]

Local goal: each component maximizes its Suitability Index.
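One way to read the Suitability Index is as a weighted combination of assessment-function scores computed from metrics, with the weights supplied from above (our reading of Impetus). The Python sketch below assumes exactly that: the weighted-sum form, the four assessment functions (one per CL objective listed above) and the metric names are all hypothetical. Each component pursues its local goal by choosing the child with the highest index.

```python
from typing import Callable, Dict

Metrics = Dict[str, float]
AssessmentFn = Callable[[Metrics], float]

# Hypothetical assessment functions, one per CL objective; each scores in [0, 1].
ASSESSMENTS: Dict[str, AssessmentFn] = {
    "throughput": lambda m: m["tasks_per_s"] / m["max_tasks_per_s"],
    "energy": lambda m: 1.0 - m["power_w"] / m["max_power_w"],
    "computational": lambda m: m["cpu_util"],
    "management": lambda m: 1.0 - m["mgmt_overhead"],
}


def suitability_index(metrics: Metrics, weights: Dict[str, float]) -> float:
    """Weighted sum of assessment scores (assumed form; weights sum to 1)."""
    return sum(weights.get(name, 0.0) * fn(metrics) for name, fn in ASSESSMENTS.items())


def best_child(children: Dict[str, Metrics], weights: Dict[str, float]) -> str:
    """Local goal: pick the child component with the highest suitability index."""
    return max(children, key=lambda c: suitability_index(children[c], weights))


weights = {"throughput": 0.4, "energy": 0.3, "computational": 0.2, "management": 0.1}
children = {
    "vrack-a": {"tasks_per_s": 50, "max_tasks_per_s": 100, "power_w": 300,
                "max_power_w": 500, "cpu_util": 0.9, "mgmt_overhead": 0.05},
    "vrack-b": {"tasks_per_s": 80, "max_tasks_per_s": 100, "power_w": 450,
                "max_power_w": 500, "cpu_util": 0.6, "mgmt_overhead": 0.1},
}
for name, m in children.items():
    print(name, round(suitability_index(m, weights), 3))
print("chosen:", best_child(children, weights))
```

Changing the weights, for example putting all of the weight on throughput, changes which child wins; this is the kind of lever directed evolution could pull, although the slides do not say how weights are updated.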

Page 34: CloudLightning - Project and Architecture Overview

Visualisation of Self-organisation self-management framework

Page 35: CloudLightning - Project and Architecture Overview

Self-organisation framework augmentations in support of virtualization

Goals:
• Support for virtualization
• Increase resource utilization
• Decrease job rejection rate

A new assessment function reflecting memory consumption is added.

A two-stage self-organisation strategy is introduced: CPU and vCPU.

Resource over-commitment is addressed.
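As an illustration of over-commitment combined with a memory-aware assessment, the sketch below admits a VM only if its vCPUs stay within a fixed over-commitment ratio of the physical cores and its memory fits without over-commitment. The ratio of 4.0, the admission rule and the `memory_score` function are hypothetical, and the sketch does not model the two-stage CPU/vCPU strategy itself.

```python
from dataclasses import dataclass


@dataclass
class Host:
    physical_cores: int
    memory_gb: float
    vcpus_allocated: int = 0
    memory_allocated_gb: float = 0.0


def can_admit(host: Host, vcpus: int, memory_gb: float, vcpu_ratio: float = 4.0) -> bool:
    """vCPUs may be over-committed up to `vcpu_ratio` x physical cores;
    memory is never over-committed. The 4.0 default is hypothetical."""
    cpu_ok = host.vcpus_allocated + vcpus <= host.physical_cores * vcpu_ratio
    mem_ok = host.memory_allocated_gb + memory_gb <= host.memory_gb
    return cpu_ok and mem_ok


def admit(host: Host, vcpus: int, memory_gb: float, vcpu_ratio: float = 4.0) -> bool:
    """Record the allocation if it passes the check; otherwise leave the host unchanged."""
    if not can_admit(host, vcpus, memory_gb, vcpu_ratio):
        return False
    host.vcpus_allocated += vcpus
    host.memory_allocated_gb += memory_gb
    return True


def memory_score(host: Host) -> float:
    """Hypothetical memory-consumption assessment: fraction of memory still free."""
    return 1.0 - host.memory_allocated_gb / host.memory_gb


host = Host(physical_cores=8, memory_gb=32.0)
print(admit(host, vcpus=24, memory_gb=8.0))  # 24 vCPUs on 8 cores: 3x over-committed
print(admit(host, vcpus=16, memory_gb=8.0))  # would need 40 vCPUs, over the 32 allowed
print(memory_score(host))
```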

Page 36: CloudLightning - Project and Architecture Overview

• Coalitions are used to support the process parallelism within a service.

• Coalitions exist entirely inside a vRack.

• The CL-Resources of a Coalition may span multiple servers within the same vRack.

Coalitions (WP3)

[Figure: a Coalition spanning several servers within one vRack.]

Page 37: CloudLightning - Project and Architecture Overview

Coalition Formation Strategies

Machine-based coalition formation strategies:

• Task Compaction

• Isotropy Preservation

• Dependency Minimization
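The slides name these strategies but do not define them. Taking "task compaction" to mean packing a coalition's tasks onto as few servers as possible (an assumption), a first-fit-decreasing variant is one simple way to do it: place the largest tasks first and prefer servers that are already in use. Task sizes in cores and the server names are hypothetical.

```python
from typing import Dict, List


def compact(task_cores: List[int], server_free: Dict[str, int]) -> Dict[str, List[int]]:
    """First-fit-decreasing variant: largest tasks first, servers already in use
    are tried before opening a new one; new servers are opened largest first."""
    free = dict(server_free)
    order = sorted(free, key=free.get, reverse=True)
    placement: Dict[str, List[int]] = {}
    for task in sorted(task_cores, reverse=True):
        in_use = [s for s in order if s in placement]
        unused = [s for s in order if s not in placement]
        for s in in_use + unused:
            if free[s] >= task:
                placement.setdefault(s, []).append(task)
                free[s] -= task
                break
        else:
            raise ValueError(f"no server has {task} free cores")
    return placement


placement = compact([4, 4, 8, 2], {"a": 16, "b": 16, "c": 8})
print(placement)
```

The four tasks end up on two servers, leaving server "c" entirely free (and, for example, eligible to be switched off).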

Page 38: CloudLightning - Project and Architecture Overview

Coalition Formation Strategies

Workload-based coalition formation strategies:

• Coalition Size Frequency

• Workload Execution Constraints
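Similarly, one plausible reading of "coalition size frequency" (an assumption; the slides give only the name) is to pre-form coalitions whose sizes have been requested most often. A minimal sketch using `collections.Counter`, with a hypothetical request history and a capacity measured in servers:

```python
from collections import Counter
from typing import List


def ranked_sizes(history: List[int]) -> List[int]:
    """Coalition sizes ordered by request frequency (ties: smaller size first)."""
    counts = Counter(history)
    return sorted(counts, key=lambda size: (-counts[size], size))


def preform_plan(history: List[int], capacity: int) -> List[int]:
    """Choose sizes to pre-form, most frequent first, while they fit in `capacity`."""
    plan: List[int] = []
    used = 0
    for size in ranked_sizes(history):
        if used + size <= capacity:
            plan.append(size)
            used += size
    return plan


history = [4, 8, 4, 2, 4, 8, 16, 2, 4]  # hypothetical past coalition requests
print(ranked_sizes(history))
print(preform_plan(history, capacity=16))
```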

Page 39: CloudLightning - Project and Architecture Overview

The Telemetry system provides the SOSM system with updates on the status of the resource fabric.

It is implemented using InfluxDB and SNAP.

Determining the local state

[Figure: CloudLightning architecture diagram showing the Gateway Service, Services Catalogue, Blueprint Catalogue, Plug & Play Service, SOSM Framework, a deployed blueprint on three coalitions and the physical resources, with the Blueprint Creator, End User and Enterprise Cloud Operator.]
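InfluxDB ingests points in its line protocol: a measurement, optional comma-separated tags, a set of fields, and an optional nanosecond timestamp. The formatter below is a minimal sketch of that format for telemetry points; the measurement, tag and field names are hypothetical, since the slides do not describe CloudLightning's schema or how SNAP is configured to publish to InfluxDB.

```python
from typing import Dict, Optional, Union

Field = Union[float, int, bool, str]


def _escape_key(s: str) -> str:
    # Tag keys, tag values and field keys escape commas, equals signs and spaces.
    return s.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _field_value(v: Field) -> str:
    if isinstance(v, bool):  # check bool first: bool is a subclass of int
        return "true" if v else "false"
    if isinstance(v, int):
        return f"{v}i"       # integer fields carry an 'i' suffix
    if isinstance(v, float):
        return repr(v)
    # String fields are double-quoted, with backslashes and quotes escaped.
    return '"' + v.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_line(measurement: str, tags: Dict[str, str], fields: Dict[str, Field],
            timestamp_ns: Optional[int] = None) -> str:
    """Format one point; tags are sorted by key, as InfluxDB recommends."""
    m = measurement.replace(",", r"\,").replace(" ", r"\ ")
    tag_part = "".join(f",{_escape_key(k)}={_escape_key(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{_escape_key(k)}={_field_value(v)}" for k, v in fields.items())
    line = f"{m}{tag_part} {field_part}"
    return line if timestamp_ns is None else f"{line} {timestamp_ns}"


# Hypothetical telemetry point for one server in a vRack.
print(to_line("cl_cpu", {"vrack": "vr-1", "host": "svr 3"},
              {"util": 0.73, "cores_free": 12}, 1516752000000000000))
```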

Page 40: CloudLightning - Project and Architecture Overview

• The SOSM system supports the addition of new hardware by using a plug and play mechanism.

• New hardware can register with the SOSM system and is then automatically added and managed.

Support for new hardware

[Figure: CloudLightning architecture diagram with the Plug & Play Service connecting additional physical resources to the SOSM Framework; also shown are the Gateway Service, catalogues, a deployed blueprint on three coalitions, the Blueprint Creator, End User and Enterprise Cloud Operator.]

Page 41: CloudLightning - Project and Architecture Overview

[Figure: new hardware (New HW) joins through the Plug & Play Service; the Self-Organizing Self-Management System comprises the SOSM Framework, a Cell Manager and several vRack Managers, above a Resource Abstraction Layer over the physical resources.]

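A minimal sketch of plug-and-play registration, assuming newly registered hardware is placed in a homogeneous vRack of its own type and that a new vRack is opened when the existing ones are full. The `PlugAndPlay` class, the `register` method, the fixed rack capacity and the device ids are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VRack:
    hw_type: str
    capacity: int
    members: List[str] = field(default_factory=list)


class PlugAndPlay:
    """Hypothetical registry: new hardware announces itself and is placed in a
    homogeneous vRack of its type, opening a new vRack when the others are full."""

    def __init__(self, rack_capacity: int = 4) -> None:
        self.rack_capacity = rack_capacity
        self.racks: Dict[str, List[VRack]] = {}

    def register(self, device_id: str, hw_type: str) -> VRack:
        racks = self.racks.setdefault(hw_type, [])
        for rack in racks:
            if len(rack.members) < rack.capacity:
                rack.members.append(device_id)
                return rack
        rack = VRack(hw_type, self.rack_capacity, [device_id])
        racks.append(rack)
        return rack


pnp = PlugAndPlay(rack_capacity=2)
for dev, hw in [("dfe-1", "dfe"), ("dfe-2", "dfe"), ("dfe-3", "dfe"), ("gpu-1", "gpu")]:
    rack = pnp.register(dev, hw)
    print(dev, "->", hw, "vRack", pnp.racks[hw].index(rack))
```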