simulating heterogeneous resources in cloudlightning


Post on 22-Jan-2017


Page 1: Simulating Heterogeneous Resources in CloudLightning

SELF-ORGANISING, SELF-MANAGING HETEROGENEOUS CLOUD

Simulating Heterogeneous Resources in CloudLightning

Dr. Christos Papadopoulos-Filelis

Page 2: Simulating Heterogeneous Resources in CloudLightning

• Modern Cloud Computing infrastructures are gradually being equipped with different types of hardware:
  • Multicore processors
  • Accelerators (e.g. Intel Xeon Phi, GPUs) paired with CPUs
  • DFEs paired with CPUs
• The CloudLightning project aims to design a system that can manage these heterogeneous resources efficiently.
• CloudLightning is a self-managing, self-organizing system.
  • Local decisions based on user input and needs are made by the system, ensuring optimal execution and decreased power consumption.

Overview of CloudLightning System

Page 3: Simulating Heterogeneous Resources in CloudLightning

• Cloud environments allow the underlying resources to be expanded without substantial changes in software.
• However, over-provisioning of resources has been the chosen method for assuring service availability, leading to underutilization and excessive power consumption (Barroso and Hölzle, 2007).
• Scaling performance through homogeneity (adding identical general-purpose cores) poses problems such as limitations on power density, heat removal, etc. (Crago and Walters, 2015).
• Current delivery models (IaaS, PaaS, SaaS) are not designed for handling HPC applications.
• Integration of different types of hardware and management of increasingly complex Cloud environments are at an early stage.

Facts and Challenges

Page 4: Simulating Heterogeneous Resources in CloudLightning

• Clouds are complex environments. Alan Turing’s observation that “global order comes from local interactions” suggests a way of managing and organizing without the unnecessary overhead present in hierarchical control. Thus, self-management and self-organization are performed locally.
• Use and management of specialized high-performance hardware (i.e. GPUs, MICs, DFEs) with a low Watt/FLOPS ratio.
• Limit over-provisioning by adaptively picking optimal sets (coalitions) of resources for each application.
• Specialized template-based system for characterizing applications.
• New delivery model, namely HPC Resource-as-a-Cloud.

CloudLightning Perspective

Page 5: Simulating Heterogeneous Resources in CloudLightning

• The life-cycle of the CloudLightning service begins with the Enterprise Application Developer (EAD).
• The EAD submits a Blueprint to the Gateway Service.
  • A Blueprint is a graph representing the workflow of Services collectively composed to automate a particular task or business process.
  • The Gateway Service is the front-end of the CL system, providing a unified access interface as well as a graphical interface for EADs.
  • The Gateway receives resource options capable of servicing the Blueprint. It also enables the EAD to select and deploy the Blueprint.
• The Gateway Service contacts the “Cell”, which contains the self-managing, self-organizing system.
  • A CL system is typically composed of multiple Cells.

Architecture
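
Since a Blueprint is described as a graph of Services, one way to picture it is as a dependency mapping. The sketch below is only an illustration: the service names are hypothetical, and treating the Blueprint as a directed acyclic graph that can be topologically ordered is an assumption, not something the slides state.

```python
from graphlib import TopologicalSorter

# Hypothetical Blueprint: each service maps to the services it depends on.
# The names are illustrative and do not come from the CloudLightning project.
blueprint = {
    "ingest": set(),
    "align": {"ingest"},
    "render": {"ingest"},
    "report": {"align", "render"},
}


def deployment_order(graph):
    """Return the services in an order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


order = deployment_order(blueprint)
```

Here `order` always starts with `ingest` and ends with `report`; the relative position of `align` and `render` is not fixed by the graph.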

Page 6: Simulating Heterogeneous Resources in CloudLightning

Architecture

• Each Cell is associated with one geographical region.

• Each Cell is composed of groups of resources called vRacks.

• A vRack maintains information about servers with the same type of physical resources.

• A vRack Manager manages a logical group of these resources called a coalition.

• A vRack Group is composed of vRacks with the same type of resources.
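
The hierarchy described above can be sketched as a few nested data structures. This is a minimal illustration, not the project's implementation: the class and method names (`Server`, `VRack`, `VRackGroup`, `Cell`, `form_coalition`) are hypothetical, and forming a coalition from the first servers in a vRack is a placeholder policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Server:
    name: str
    resource_type: str  # e.g. "CPU", "GPU", "MIC", "DFE"


@dataclass
class VRack:
    """Tracks servers that share one type of physical resource."""
    resource_type: str
    servers: List[Server] = field(default_factory=list)

    def add(self, server: Server) -> None:
        if server.resource_type != self.resource_type:
            raise ValueError("a vRack holds a single resource type")
        self.servers.append(server)

    def form_coalition(self, size: int) -> List[Server]:
        """Placeholder vRack Manager step: group the first `size` servers."""
        if size > len(self.servers):
            raise ValueError("not enough servers in this vRack")
        return self.servers[:size]


@dataclass
class VRackGroup:
    """Collects vRacks that expose the same resource type."""
    resource_type: str
    vracks: List[VRack] = field(default_factory=list)


@dataclass
class Cell:
    """One Cell per geographical region, holding vRack Groups by type."""
    region: str
    groups: Dict[str, VRackGroup] = field(default_factory=dict)

    def add_vrack(self, vrack: VRack) -> None:
        group = self.groups.setdefault(
            vrack.resource_type, VRackGroup(vrack.resource_type)
        )
        group.vracks.append(vrack)
```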

Page 7: Simulating Heterogeneous Resources in CloudLightning

• Three HPC use cases have been chosen to validate the CL system:
  • Genomics
  • Oil and Gas exploration
  • Ray tracing
• General sparse and dense matrix computations should also be investigated.
• These use cases require large-scale computational infrastructures, which are costly to build and operate on-site.
• Cloud services are an effective choice for decreasing the cost of these applications.
• These applications are suitable for use with modern accelerator-type hardware, e.g.:
  • Genomics: Local sequence alignment (Smith-Waterman algorithm) can be performed on DFEs.
  • Oil and Gas exploration: Reverse Time Migration (RTM) as well as simulations based on the Open Porous Media framework can be efficiently accelerated on GPUs.
  • Ray tracing: A variety of ray tracing engines exist for accelerators, such as Intel Embree, which supports MICs, and NVIDIA OptiX, which supports GPUs.

Use Cases

Page 8: Simulating Heterogeneous Resources in CloudLightning

• Resource characterization involves defining the parameters that describe the execution of applications with respect to the underlying hardware.
• The process of characterization becomes significantly more complex in the case of heterogeneous hardware.
• The performance of modern processors is affected by an extensive number of parameters. Thus, modelling such hardware poses significant difficulties.
• In order to limit the number of parameters required and obtain more accurate metrics for the large-scale simulation of the system, differential metrics with respect to a baseline execution can be used.
• Moreover, the chosen tests should resemble the “type” of computational work.

Resource Characterization

Page 9: Simulating Heterogeneous Resources in CloudLightning

• Examples of applications that can be used to characterize resources with respect to the use cases are the following:
  • Genomics:
    • For CPUs: MUMmer (Kurtz et al., 2004)
    • For GPUs: MUMmerGPU (Schatz et al., 2007)
    • For DFEs: Smith-Waterman (Maxeler, 2015)
  • Ray Tracing:
    • For CPUs: Embree (Embree, 2016)
    • For GPUs: NVIDIA OptiX (NVIDIA, 2016)
    • For MICs: Embree (Embree, 2016)
• Using CPU executions and acquiring metrics such as performance (with respect to execution time or GFLOPS) and energy consumption, a baseline set of results can be collected and used to produce performance indexes. For example, for Ray Tracing:

[Performance index formula for Ray Tracing; not preserved in this transcript]

Resource Characterization

Page 10: Simulating Heterogeneous Resources in CloudLightning

• The chosen type of index simplifies the computation of accurate metrics with respect to the chosen applications.
• The indexes can easily be extended with more characteristics such as energy consumption, scalability, etc.
• They are dimensionless numbers and can be combined with each other to produce collective indexes that characterize the performance of a system based on various applications.
• They simplify the simulation process, since the expected time of completion for a task can be simulated from the input data and the baseline metrics.
• Baseline metrics can be obtained using high-end CPUs. This makes it easy to see the improvement gained by using the CL system for a given application.

Resource Characterization
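
The exact index formula from the slides is not available in this transcript, so the following is only a sketch of how a dimensionless, baseline-relative index could work. It assumes the index is the ratio of the CPU baseline execution time to the device execution time, and it assumes a geometric mean for combining indexes; both choices, and all function names, are illustrative.

```python
from math import prod
from typing import Sequence


def performance_index(baseline_time: float, device_time: float) -> float:
    """Dimensionless ratio of baseline (CPU) time to device time (assumed form)."""
    if baseline_time <= 0 or device_time <= 0:
        raise ValueError("execution times must be positive")
    return baseline_time / device_time


def collective_index(indexes: Sequence[float]) -> float:
    """Combine several indexes with a geometric mean (an assumed choice)."""
    if not indexes:
        raise ValueError("at least one index is required")
    return prod(indexes) ** (1.0 / len(indexes))


def expected_time(baseline_time: float, index: float) -> float:
    """Estimate the completion time on a device from the baseline time."""
    return baseline_time / index
```

Under these assumptions, a device that finishes in 30 s a test the baseline CPU finishes in 120 s has an index of 4, and a task with a 120 s baseline is expected to complete in 30 s on that device.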

Page 11: Simulating Heterogeneous Resources in CloudLightning

Performance Index{

  CPU Performance: Oil and Gas Exploration and Sparse and Dense Matrix Computations
    C1: SGEMM
    C2: DGEMM
    C3: Dense LU Factorization (SGESV, DGESV)
    C4: Sparse LU Factorization
    C5: SFFT
    C6: DFFT

  GPU Performance: Oil and Gas Exploration and Sparse and Dense Matrix Computations
    C1: SGEMM
    C2: DGEMM
    C3: Dense LU Factorization (SGESV, DGESV)
    C4: Sparse LU Factorization
    C5: SFFT
    C6: DFFT

  MIC Performance: Oil and Gas Exploration and Sparse and Dense Matrix Computations
    C1: SGEMM
    C2: DGEMM
    C3: Dense LU Factorization (SGESV, DGESV)
    C4: Sparse LU Factorization
    C5: SFFT
    C6: DFFT

}

Resource Characterization

Page 12: Simulating Heterogeneous Resources in CloudLightning

• Oil and Gas exploration is not computationally monolithic, since it involves dense and sparse matrix computations which have different memory access patterns and parallel performance.
• Oil and Gas exploration involves general sparse and dense matrix computations, including dense and sparse matrix multiplication and the solution of large sparse and dense linear systems.
• Thus, metrics for this use case can be obtained from a set of tests involving general sparse and dense matrix computations, rather than by performing oil and gas simulations with variable data input.
• Moreover, general sparse and dense matrix computations are present in most large-scale simulations, in fields such as Computational Fluid Dynamics, Computational Astrophysics or Computational Finance, so the same metrics describe those workloads as well.

Resource Characterization

Page 13: Simulating Heterogeneous Resources in CloudLightning

• Two major categories of Cloud simulation tools:
  • Discrete Event Based (DES): Avoids building and processing small simulation objects (like packets). Instead, the effect of object interaction is captured (better performance, less accuracy).
  • Packet Level: Whenever a data message has to be transmitted between simulator entities, a packet structure with its protocol headers is allocated in memory and all the associated protocol processing is performed.
• There is no “Holy Grail” for Cloud simulation.
  • Various simulation tools can be used to simulate a cloud environment: CloudAnalyst (DES), CloudSched (DES), CloudSim (DES), DCSim (DES), GDCSim, GreenCloud (Packet Level), iCanCloud (DES), NetworkCloudSim.

Simulation Tools
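
To make the discrete-event idea concrete, here is a minimal event loop: time jumps from one event to the next instead of advancing packet by packet. It is a generic sketch with a single server and first-come, first-served tasks, not the design of any of the tools listed; the function name `run_des` and the event layout are assumptions.

```python
import heapq


def run_des(arrivals):
    """Process (arrival_time, duration) tasks on one server, event by event.

    Returns a dict mapping task index to its finish time.
    """
    # Each event is (time, task_id, kind, duration); the heap orders by time.
    events = [(t, i, "arrive", d) for i, (t, d) in enumerate(arrivals)]
    heapq.heapify(events)
    server_free_at = 0
    finish_times = {}
    while events:
        time, task_id, kind, duration = heapq.heappop(events)
        if kind == "arrive":
            # The task starts when it arrives or when the server frees up.
            start = max(time, server_free_at)
            server_free_at = start + duration
            heapq.heappush(events, (server_free_at, task_id, "finish", duration))
        else:
            finish_times[task_id] = time
    return finish_times


result = run_des([(0, 5), (2, 3), (10, 1)])
```

In this run the second task arrives at time 2 but has to wait until the first one finishes at time 5, so `result` is `{0: 5, 1: 8, 2: 11}`.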

Page 14: Simulating Heterogeneous Resources in CloudLightning

Simulation Tools

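Tool                                       | GUI | Virtualization | Scheduling | Network Models            | Energy Models | Parallel Experiments
-------------------------------------------|-----|----------------|------------|---------------------------|---------------|---------------------
CloudAnalyst                               | ✓   | ✓              | ✓          | Limited (latency)         | ✗             | ✗
CloudSim (original version)                | ✗   | ✓              | ✓          | Limited (latency)         | Limited       | ✗
CloudSched                                 | ✓   | ✓              | ✓          | Limited                   | ✓             | ✗
DCSim                                      | ✗   | ✓              | ✓          | ✗                         | ✓             | ✗
GDCSim                                     | ✗   | ✓              | ✓          | ✗                         | ✓             | ✗
GreenCloud                                 | ✗   | ✗              | Limited    | ✓                         | ✓             | ✗
iCanCloud                                  | ✓   | ✓              | ✓          | ✓                         | ✗             | ✓
NetworkCloudSim (integrated into CloudSim) | ✗   | ✓              | ✓          | ✓ (latency & bandwidth)   | Limited       | ✓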

Page 15: Simulating Heterogeneous Resources in CloudLightning

• The CL system has various components and services not present in today’s Cloud systems, thus classical simulation environments cannot be used.
• These components and services include: coalition formation mechanism, vRack, vRack Manager, complex network communications, etc.
• Moreover, the different strategies for coalition formation, i.e. static or dynamic, impact the system and affect the complexity of the simulation.
• Simulating the CL system requires the simulator to be inherently parallel, since the increased complexity of the system results in increased computational work.
• The CL system targets three distinct use cases that can be modeled by the aforementioned dimensionless indexes.

Elements of Simulation

Page 16: Simulating Heterogeneous Resources in CloudLightning

• In order to accurately simulate a CL-based cloud environment, the time scale should be defined. A general approach is to discretize time in terms of a time unit (tu). For simplicity, let us consider 1 tu = 1 second.
• A task can be simulated as follows:

Task{
  Number of Computational Units: W
  Required Hardware: X
  Application: Y
  Time Units: Z
}

• Each task occupies W hardware instances of computational capability X for an application Y and for Z time units.
• Required Hardware (X) defines what type(s) of hardware are required, and Application (Y) is the type of application.

Elements of Simulation

Page 17: Simulating Heterogeneous Resources in CloudLightning

• Similarly, a coalition of capable hardware can be described by the following parameters:

Coalition{
  Number of Computational Units: CU
  Compute Capability: CC
  Interconnection: I
  Storage Interconnection: SI
  Power Consumption: P
  Cost: CO
  Server Initialization: S
}

• Compute Capability is a dimensionless index computed with the procedure presented in the characterization of hardware. Power Consumption and Cost are dimensionless numbers measured with respect to a baseline.
• Interconnection and Storage Interconnection are measured in Gbps.
• Server Initialization is measured in tu.

Elements of Simulation
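
The Task and Coalition records can be written down directly as data classes. The sketch below mirrors the listed parameters; the `hardware` field on `Coalition`, the `can_serve` check and the lowest-power selection policy in `pick_coalition` are assumptions added for illustration and are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    units: int        # W: number of computational units
    hardware: str     # X: required hardware type
    application: str  # Y: application type
    time_units: int   # Z: duration in tu


@dataclass(frozen=True)
class Coalition:
    units: int                 # CU: number of computational units
    hardware: str              # assumed field, used to match a Task's X
    compute_capability: float  # CC: dimensionless index
    interconnect_gbps: float   # I
    storage_gbps: float        # SI
    power: float               # P: relative to a baseline
    cost: float                # CO: relative to a baseline
    init_time: int             # S: server initialization in tu


def can_serve(coalition, task):
    """A coalition is capable if the type matches and it has enough units."""
    return coalition.hardware == task.hardware and coalition.units >= task.units


def pick_coalition(coalitions, task):
    """Hypothetical policy: lowest relative power among capable coalitions."""
    capable = [c for c in coalitions if can_serve(c, task)]
    return min(capable, key=lambda c: c.power) if capable else None
```

Other policies (lowest cost, highest compute capability, or a weighted mix) would only change the `key` passed to `min`.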

Page 18: Simulating Heterogeneous Resources in CloudLightning

• The hardware is engaged for the number of time units prescribed by a task (Z). An amount of time (S) is required for initializing the hardware in case it was in a sleep state. Moreover, before the Task begins execution, time is needed to bring the hardware to a functional state: transfer of software images, installation of appropriate libraries, mounting of storage, etc. Thus, the real time a Task requires is:
  • Time of Execution (tu) = S + Z + α + β
  • where α is the time required for the hardware to become functional and β is the time required to free the resource and return it to an idle state, waiting for the next Task.
• α mostly depends on the speed of the network as well as on the number of users (N) occupying the same storage server simultaneously:
  • α (tu) = (size of the image (GB)) / (SI (Gbps) · tu / (N · 8 · 10^9))
• Finally, for the network a linear model is considered:
  • delay = latency + size (GB) / BW (GBps)

Elements of Simulation
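
The timing model above can be sketched as three small functions. One interpretation is assumed here and should be checked against the original slides: SI is read in Gbps and shared equally among the N users, so the per-user bandwidth is SI / (8 · N) GB/s, and the 10^9 factor in the slide's expression is treated as a bits-to-gigabits conversion that is not needed once SI is already in Gbps. Function and parameter names are illustrative.

```python
def image_transfer_time(image_gb, si_gbps, n_users, tu_seconds=1.0):
    """alpha in tu: image size divided by the per-user storage bandwidth.

    Assumes SI in Gbps, split equally among n_users (8 bits per byte).
    """
    gb_per_tu = (si_gbps / 8.0) / n_users * tu_seconds
    return image_gb / gb_per_tu


def execution_time(s, z, alpha, beta):
    """Time of Execution (tu) = S + Z + alpha + beta."""
    return s + z + alpha + beta


def network_delay(latency_s, size_gb, bw_gb_per_s):
    """Linear network model: delay = latency + size / bandwidth."""
    return latency_s + size_gb / bw_gb_per_s


# Example: a 4 GB image over a 10 Gbps storage link shared by 2 users.
alpha = image_transfer_time(4.0, 10.0, 2)
total = execution_time(s=5, z=100, alpha=alpha, beta=2)
```

With these inputs each user gets 0.625 GB/s, so `alpha` is about 6.4 tu and `total` is about 113.4 tu.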

Page 19: Simulating Heterogeneous Resources in CloudLightning

• Finally, a custom simulation framework will be used, since CL is a unique Cloud Computing environment.
• The presented simulation entities will be extended in order to describe different types of deployment, such as containers and bare-metal images.
• The simulator must be able to cope with static and dynamic coalition choices, as well as update the list of static coalitions based on users’ choices.
• An auto-scaling model should be introduced in the simulation for the three use cases.
• A parallel, hybrid DES/packet-level simulation scheme of this kind is expected to describe the CL system adequately.

Elements of Simulation and Future Work

Page 20: Simulating Heterogeneous Resources in CloudLightning

THANK YOU

Dr. Christos Papadopoulos-Filelis

[email protected]