13-11-2013


A View of Cloud Computing:

Concepts and Challenges

Jorge G. Barbosa

Universidade do Porto,

Faculdade de Engenharia, LIACC

Porto, Portugal

[email protected]

FEUP, 2013

Outline

• Part I: Basic Concepts
  - Introduction and Principles Overview
• Part II: Challenges
  - Fault Tolerance
  - Energy Optimization
  - Quality of Service (QoS)
  - …
• Part III: Current Research


What is Cloud Computing?


“Cloud Computing refers to both the applications delivered as services over the Internet and the hardware and systems software in the datacenters that provide those services.”

Fox, Armando, et al. "Above the clouds: A Berkeley view of cloud computing." Dept. Electrical Eng. and Comput. Sciences, University of California, Berkeley, Rep. UCB/EECS 28 (2009).

“A large-scale distributed computing paradigm (…) in which a pool of abstracted, virtualized, dynamically-scalable, managed computing power, storage, platforms, and services are delivered on demand (…) over the Internet.”

Foster, Ian, et al. "Cloud computing and grid computing 360-degree compared." Grid Computing Environments Workshop, GCE'08. IEEE, 2008.


Clouds

Cloud Computing


Image source: “The Future of Cloud Computing”, available at http://cordis.europa.eu/fp7/ict/ssai/docs/cloud-report-final.pdf


TYPES


SaaS (Software as a Service)
  What: on-demand access to any application
  Who: end-user (consume)

PaaS (Platform as a Service)
  What: platform upon which apps/services can be developed and hosted
  Who: developer (build)

IaaS (Infrastructure as a Service)
  What: access to computational resources, i.e. CPU, RAM, data and storage
  Who: host provider (host)

MODES


Private cloud: usually owned by an institution; functionalities are not directly exposed to the consumer (e.g., eBay)

Hybrid cloud: mixed use of private and public infrastructures, reducing costs through sharing while keeping the desired degree of control

Public cloud: owners offer their services to users outside the institution (e.g., Amazon, Google Apps)

Image source: http://www.iland.com


FEATURES

• Elasticity
  - Leveraged by self-* capabilities, it provides agility and adaptability to changes in the environment
  - Implies horizontal and vertical scalability
• Reliability and Availability
  - Ensures continuous operation through redundant resource usage (e.g., fault tolerance)
  - Ability to deal with increasing concurrent access (e.g., load balancing)


BENEFITS

• Quality of Service
  - Support and maintenance of the user-specified requirements that services and/or resources must meet (e.g., response time)
• Pay per use
  - Services are sold as Utility Computing, with costs according to the actual consumption of resources
• Going Green
  - Reduces the additional costs of energy consumption and also the carbon footprint


Virtualization Technology in Clouds

• Virtualization is an essential technology in the Cloud
  - Enables the cloud features (e.g. ease of use, flexibility and adaptability, location independence, etc.)


Image source: http://blog.cloudpassage.com


Hot Topics in Cloud Research

• Fault tolerance
  - Business continuity and service availability
• Energy efficiency
  - Optimize energy consumption (e.g., maximize Mflop/Joule)
  - Green cloud computing: minimize operational costs and also reduce the environmental impact
• Quality of Service
  - Performance unpredictability (e.g., due to sharing of resources among co-located VMs)


Hot Topics in Cloud Research

• Security
  - Data security
• Interoperability
  - How do different clouds cooperate?
• Standardization
  - How can a user be guaranteed the ability to change cloud provider?
• Autonomic Computing


Fault Tolerance

• Dependability of the infrastructure
  - Distributed systems are growing in scale and in complexity
  - Mean Time Between Failures (MTBF) would be 1.25 h on a petaflop system (1)

(1) Fu, S. "Failure-aware resource management for high-availability computing clusters with distributed virtual machines." Journal of Parallel and Distributed Computing 70.4 (2010): 384-393.
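A rough sketch of why MTBF shrinks as systems grow. It assumes node failures are independent and exponentially distributed, which is an assumption of this example and not necessarily the model used in (1). The node count and per-node MTBF are hypothetical values chosen only for illustration.

```python
def system_mtbf_hours(node_mtbf_hours, num_nodes):
    """System MTBF under independent, exponentially distributed node failures.

    With N identical nodes the failure rates add up, so the time between
    failures anywhere in the system is the node MTBF divided by N.
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    return node_mtbf_hours / num_nodes


# Hypothetical figures: 10,000 nodes, each failing on average every 12,500 hours.
print(system_mtbf_hours(12_500, 10_000))
```

Even very reliable nodes yield a system that fails every hour or so once enough of them are combined, which motivates the fault tolerance techniques that follow.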

Fault Tolerance

• Proactive fault tolerance
  - Intelligent Platform Management Interface (IPMI) for health inquiries (migration starts on threshold violations)
  - Ganglia to determine node targets based on load averages

Nagarajan, A., et al. "Proactive fault tolerance for HPC with Xen virtualization." Proc. of the 21st annual international conference on Supercomputing. ACM, 2007.

[Figure: overall architecture]

• In proactive FT systems, processes automatically migrate from “unhealthy” nodes to healthy ones.
• In reactive schemes, recovery occurs in response to failures that have already occurred.
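The proactive scheme above can be sketched as a small planning function. This is an illustration only, not the implementation from the cited paper: the `Node` fields, the temperature threshold and the "+1.0 load per migrated node" bookkeeping are all hypothetical, and the readings are assumed to have been collected already (e.g. via IPMI and Ganglia).

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    temperature_c: float  # hypothetical health reading (e.g. obtained via IPMI)
    load_avg: float       # hypothetical load average (e.g. reported by Ganglia)


def plan_migrations(nodes, max_temp_c=70.0):
    """Pair every node that violates the threshold with the least-loaded healthy node."""
    healthy_load = {n.name: n.load_avg for n in nodes if n.temperature_c <= max_temp_c}
    plan = []
    if not healthy_load:
        return plan  # nowhere to migrate to
    for node in nodes:
        if node.temperature_c > max_temp_c:
            target = min(healthy_load, key=healthy_load.get)
            plan.append((node.name, target))
            healthy_load[target] += 1.0  # crude estimate of the load that moves over
    return plan


nodes = [Node("a", 85, 0.5), Node("b", 50, 1.0), Node("c", 55, 0.3), Node("d", 90, 1.0)]
print(plan_migrations(nodes))
```

Migration is triggered before the node fails, which is what distinguishes this from a reactive scheme that would only restart work after the failure.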


Fault Tolerance

• Dynamic allocation of VMs, considering PMs’ reliability
  - Based on a failure predictor tool with 75% average accuracy

(1) Fu, S. "Failure-aware resource management for high-availability computing clusters with distributed virtual machines." Journal of Parallel and Distributed Computing 70.4 (2010): 384-393.

[Figure: proposed architecture for reconfigurable distributed VM]

Optimistic Best-Fit (OBFIT) algorithm (1)
- Selects the PM with the minimum weighted available capacity and reliability

Pessimistic Best-Fit (PBFIT) algorithm (1)
- Calculates the average capacity Cavg from reliable PMs
- Selects the unreliable PM p with capacity Cp such that Cavg + Cp results in the minimum necessary capacity
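One possible reading of the two selection rules, sketched in code. This follows only the wording on the slide and is not taken from (1): the `PM` fields, the weight `alpha`, the plain weighted sum, and the feasibility test `Cavg + Cp >= demand` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PM:
    name: str
    available_capacity: float  # free capacity, arbitrary units (assumed)
    reliability: float         # hypothetical score in [0, 1] from the failure predictor
    reliable: bool             # hypothetical predictor verdict for the task's lifetime


def obfit(pms, demand, alpha=0.5):
    """Among reliable PMs that fit the demand, pick the minimum weighted score."""
    candidates = [p for p in pms if p.reliable and p.available_capacity >= demand]
    return min(
        candidates,
        key=lambda p: alpha * p.available_capacity + (1 - alpha) * p.reliability,
        default=None,
    )


def pbfit(pms, demand):
    """Pick the unreliable PM whose Cavg + Cp is the smallest value covering the demand."""
    reliable_caps = [p.available_capacity for p in pms if p.reliable]
    c_avg = sum(reliable_caps) / len(reliable_caps) if reliable_caps else 0.0
    candidates = [
        p for p in pms
        if not p.reliable and c_avg + p.available_capacity >= demand
    ]
    return min(candidates, key=lambda p: c_avg + p.available_capacity, default=None)


pms = [
    PM("A", 10, 0.90, True),
    PM("B", 4, 0.95, True),
    PM("C", 3, 0.40, False),
    PM("D", 1, 0.99, True),
    PM("E", 6, 0.30, False),
]
print(obfit(pms, 3).name, pbfit(pms, 7).name)
```

Both rules are best-fit in spirit: they avoid wasting large PMs on small requests, and differ in whether unreliable PMs are considered at all.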

Fault Tolerance

• Dynamic allocation of VMs, considering PMs’ reliability
  - System productivity is enhanced by using the proposed strategies
  - Task completion rate reaches 91.7% with 83.6% utilization of relatively unreliable nodes

[Figures: percentage of completed tasks; percentage of completed jobs]


Energy Efficiency

• Energy consumption concern
  - An average datacenter consumes as much energy as 25,000 households (1)
  - The main part of energy consumption is determined by the CPU (2)
  - Energy consumption dominates the operational costs

(1) Kaplan, J., Forrest, W., Kindler, N. "Revolutionizing Data Center Energy Efficiency." McKinsey & Company, Tech. Rep.
(2) Berl, Andreas, et al. "Energy-efficient cloud computing." The Computer Journal 53.7 (2010): 1045-1051.


Energy Efficiency

• Consolidation
  - Minimize the number of active nodes and power down inactive ones
• Dynamic Voltage and Frequency Scaling (DVFS)
  - Modern CPUs can run at different clock frequencies
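A minimal worked example of why DVFS saves power. It uses the common first-order approximation that CMOS dynamic power scales as P ≈ C · V² · f; static (leakage) power is ignored, and the 80% scaling factors are hypothetical.

```python
def relative_dynamic_power(voltage_scale, frequency_scale):
    """Dynamic power relative to nominal, using P ~ C * V^2 * f (capacitance cancels out)."""
    return voltage_scale ** 2 * frequency_scale


# Hypothetical operating point: voltage and frequency both lowered to 80% of nominal.
ratio = relative_dynamic_power(0.8, 0.8)
print(f"dynamic power is about {ratio:.1%} of nominal")
```

Note that a lower frequency also lengthens execution time, so the energy saved per task is smaller than the power reduction alone suggests.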

Energy Efficiency - Examples

• Entropy system
  - Minimize the number of active nodes and power down inactive ones, while maintaining performance

Reconfiguration loop

• Find a configuration using the minimum number n of nodes necessary to host all VMs
• Constraint programming allows Entropy to find mappings of tasks to nodes

Hermenier, F., et al. "Entropy: a consolidation manager for clusters." Proc. of the 2009 ACM SIGPLAN/SIGOPS international conference on Virtual execution environments. ACM, 2009.
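Finding the minimum number of nodes that can host all VMs is a bin-packing problem. Entropy solves it with constraint programming; the sketch below swaps in the much simpler first-fit-decreasing heuristic just to illustrate the idea. The single CPU dimension, the VM names and the demand values are hypothetical.

```python
def first_fit_decreasing(vm_demands, node_capacity):
    """Pack VMs onto as few nodes as the heuristic manages.

    vm_demands maps a VM name to its CPU demand (same unit as node_capacity).
    Returns a list of nodes, each a list of the VM names placed on it.
    """
    nodes = []  # each entry: {"free": remaining capacity, "vms": [names]}
    for vm, demand in sorted(vm_demands.items(), key=lambda kv: kv[1], reverse=True):
        if demand > node_capacity:
            raise ValueError(f"{vm} does not fit on any node")
        for node in nodes:
            if node["free"] >= demand:
                node["free"] -= demand
                node["vms"].append(vm)
                break
        else:
            # No active node has room: power on a new one.
            nodes.append({"free": node_capacity - demand, "vms": [vm]})
    return [node["vms"] for node in nodes]


# Hypothetical CPU demands in percent of one node.
print(first_fit_decreasing({"a": 60, "b": 50, "c": 40, "d": 30, "e": 20}, 100))
```

Unlike a constraint solver, this heuristic gives no optimality guarantee and ignores migration cost, both of which matter in a real consolidation manager.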


Energy Efficiency - Examples

• Entropy system – results
  - Reduces consumption of cluster nodes per hour by over 50% as compared to static allocation

[Figures: number of used physical machines; total execution time]

Energy Efficiency - Examples

• DVFS-enabled clusters
  - The algorithm minimizes processor power dissipation by dynamically scaling down processor frequencies

von Laszewski, G., et al. "Power-aware scheduling of virtual machines in dvfs-enabled clusters." Cluster Computing and Workshops, 2009. CLUSTER'09. IEEE International Conference on. IEEE, 2009.

1) Minimize the processor supply voltage by scaling down the processor frequency.
2) Schedule VMs to PEs with low voltages and try not to scale PEs to high voltages.

[Figure: working scenario]
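The two rules above can be sketched as a greedy placement. This is an illustration under stated assumptions, not the algorithm from the cited paper: the operating points in `LEVELS`, the `PE` fields and the use of GHz as the demand unit are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical (frequency in GHz, supply voltage in V) operating points, lowest first.
LEVELS = [(1.0, 0.9), (1.5, 1.0), (2.0, 1.1)]


@dataclass
class PE:
    name: str
    level: int = 0         # index into LEVELS currently in use
    used_ghz: float = 0.0  # capacity already promised to VMs


def required_level(load_ghz):
    """Lowest operating point whose frequency covers the load, or None."""
    for index, (freq, _volt) in enumerate(LEVELS):
        if load_ghz <= freq:
            return index
    return None


def schedule(vm_ghz, pes):
    """Place the VM on the PE that ends up at the lowest voltage level."""
    best = None
    for pe in pes:
        level = required_level(pe.used_ghz + vm_ghz)
        if level is None:
            continue  # this PE cannot host the VM even at full speed
        level = max(level, pe.level)
        if best is None or level < best[0]:
            best = (level, pe)
    if best is None:
        return None
    level, pe = best
    pe.used_ghz += vm_ghz
    pe.level = level
    return pe.name


pes = [PE("pe0"), PE("pe1")]
print([schedule(0.6, pes), schedule(0.6, pes), schedule(0.5, pes)])
```

A PE is only scaled to a higher voltage when no PE can absorb the VM at a lower one, which mirrors rule 2.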


Energy Efficiency

• DVFS-enabled clusters – results
  - Applying the DVFS technique to the compute nodes (PEs) reduces overall power consumption without degrading VM performance beyond acceptable levels

[Figures: performance impact of varying the number of VMs and operating frequency; DVFS-enabled cluster scheduling simulation results]


Quality of Service - Examples

• Enforcing SLAs in scientific clouds
  - Deadline-driven batch jobs
  - Service Level Agreement (SLA)

Niehorster, O., et al. "Enforcing SLAs in scientific clouds." Cluster Computing (CLUSTER), 2010 IEEE International Conference on. IEEE, 2010.

The fuzzy control system
1) Tests the feasibility of the SLA.
2) If accepted, guarantees its fulfillment.

• The approach is independent of the underlying cloud infrastructure and should deal with performance fluctuations

Quality of Service - Examples

• Enforcing SLAs in scientific clouds
  - Agents autonomously prove the feasibility of the SLA and guarantee its fulfillment by meeting the deadline
  - Agents successfully deal with noise in the cloud that occurs when VMs are co-located
  - VM interference is due to resource sharing (RAM, I/O, CPU)


Quality of Service - Examples

• Sandpiper system
  - Hotspot detection algorithm: determines when to resize or migrate VMs
  - Hotspot mitigation algorithm: determines what and where to migrate and how many resources to allocate

[Figure: the Sandpiper architecture]

Wood, T., et al. "Sandpiper: Black-box and gray-box resource management for virtual machines." Computer Networks 53.17 (2009): 2923-2938.

• Migrate the VMs in decreasing order of VSR
• VSR: volume-to-size ratio (size = RAM footprint; volume = load)
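A small sketch of the VSR ordering. The slide only says that volume measures load; the product form used below, 1 / ((1 - cpu)(1 - net)(1 - mem)), is an assumption of this example and should be checked against the cited paper. The VM names and utilization figures are hypothetical.

```python
def volume(cpu, net, mem):
    """Load 'volume' from utilizations in [0, 1); grows sharply as any resource saturates."""
    for value in (cpu, net, mem):
        if not 0.0 <= value < 1.0:
            raise ValueError("utilizations must be in [0, 1)")
    return 1.0 / ((1.0 - cpu) * (1.0 - net) * (1.0 - mem))


def migration_order(vms):
    """Return VM names in decreasing order of volume-to-size ratio (VSR).

    vms maps a name to (cpu, net, mem, ram_mb); size is the RAM footprint.
    """
    def vsr(item):
        _name, (cpu, net, mem, ram_mb) = item
        return volume(cpu, net, mem) / ram_mb

    return [name for name, _ in sorted(vms.items(), key=vsr, reverse=True)]


# Hypothetical VMs: (cpu, net, mem utilization, RAM footprint in MB).
vms = {
    "web": (0.5, 0.5, 0.5, 1024),
    "db": (0.9, 0.1, 0.5, 4096),
    "cache": (0.5, 0.0, 0.0, 128),
}
print(migration_order(vms))
```

Ordering by VSR favors VMs that relieve the most load per megabyte of memory that has to be copied during migration.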

Quality of Service

• Sandpiper system – results
  - Sandpiper can resize the resources allocated to VMs
  - Migrations occur if additional resources are not available

[Figure: a series of migrations resolve hotspots]

Approach

• The goal

Construct power- and failure-aware computing environments, in order to maximize the rate of jobs completed by their deadlines

[Figure: pure performance vs. higher service level performance]

Approach

• It is an SLA-based approach
  - But the SLA should consider user compensation if the deadline is missed
• Virtual-to-physical resource mapping decisions consider both the power efficiency and the reliability level of compute nodes
• Dynamic update of virtual-to-physical configurations (CPU usage and migration)

Approach

• Leverage virtualization tools
  - Xen credit scheduler
    - Dynamically update the cap parameter
  - Stop & copy migration
    - Faster VM migrations, preferable for proactive failure management
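As an illustration of updating the cap at run time, the sketch below builds the `xl sched-credit` command line that sets a domain's cap (a percentage of one physical CPU, where 0 means uncapped). Whether the system described in these slides used this toolstack is not stated; the domain name is hypothetical and the command is only constructed here, not executed.

```python
def build_cap_command(domain, cap_percent):
    """Return the argument list that sets the credit scheduler cap for a Xen domain."""
    if cap_percent < 0:
        raise ValueError("cap must be non-negative (0 means no cap)")
    return ["xl", "sched-credit", "-d", domain, "-c", str(int(cap_percent))]


# Hypothetical domain name; on a Xen host this list could be passed to subprocess.run.
print(" ".join(build_cap_command("vm-task-17", 50)))
```

Lowering the cap throttles a VM without migrating it, which is the cheaper of the two reconfiguration actions listed above.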

[Figures: power consumption increasing with CPU utilization (0% to 100%); timeline of VMs on PM1, PM2 and PM3 showing failures, stop & copy migrations and failure prediction accuracy]

System Overview

• Cloud architecture
  - Private cloud
  - Homogeneous PMs
  - Cluster coordinator manages users’ jobs
  - VMs are created and destroyed dynamically
• Users’ jobs
  - A job is a set of independent tasks
  - A task runs in a single VM, whose CPU-intensive workload is known
  - The number of tasks per job and the task deadlines are defined by users

[Figure: private cloud management architecture]

System Overview

• Power model
• Capacity-reliability model

[Figure: example of power efficiency curve (p1 = 175W, p2 = 75W)]

Performance Analysis

• Minimum Time Task Execution (MTTE) algorithm
  - PM i capacity constraints
  - Slack time to accomplish task t
  - Selects the PM i that:
    - guarantees the minimum processing power required by the VM
    - increases power efficiency
    - has higher reliability
  - But reserves the maximum processing power

Performance Analysis

• Relaxed Time Task Execution (RTTE) algorithm

[Figure: host CPU from 0% to 100%, showing the VM cap set in the Xen credit scheduler]

  - Unlike MTTE, the RTTE algorithm always reserves for the VM the minimum amount of processing power necessary to accomplish the task within its deadline
  - However, RTTE is work-conserving
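The "minimum amount of processing power" can be illustrated with simple arithmetic: the VM needs at least remaining work divided by time left, expressed as a share of the PM's speed. The function name, the units and the rounding up to a whole percent are assumptions of this sketch; the 800 MFLOPS figure is the per-PM performance used in the simulation setup of these slides.

```python
import math


def min_cap_percent(remaining_work_mflop, time_to_deadline_s, pm_mflops):
    """Smallest whole-percent CPU cap that still meets the deadline, or None if infeasible."""
    if time_to_deadline_s <= 0:
        return None
    share = remaining_work_mflop / (time_to_deadline_s * pm_mflops)
    if share > 1.0:
        return None  # even the whole PM is not enough
    return math.ceil(share * 100)


# Hypothetical task: 48,000 Mflop left, 120 s to the deadline, on an 800 MFLOPS PM.
print(min_cap_percent(48_000, 120, 800))
```

Because RTTE is work-conserving, the cap is a lower bound on what the VM is promised; it can still use idle cycles beyond it.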


Performance Analysis

• Implementation considerations
  - Stabilization to avoid multiple migrations
• Algorithms compared to ours
  - Common Best-Fit (CBFIT)
    - Selects the PM with the maximum power efficiency and does not consider resource reliability
  - Optimistic Best-Fit (OBFIT)
  - Pessimistic Best-Fit (PBFIT)

Performance Analysis

• Simulation setup
  - 50 PMs, each modeled with one CPU core with performance equivalent to 800 MFLOPS
  - VMs require 128 MB to 1024 MB of RAM
  - VM stop & copy migration overhead depends on RAM size
  - 100 synthetic jobs, each composed on average of 10 CPU-intensive workload tasks
  - Failed PMs stay unavailable for a period modeled by a Lognormal distribution, with the mean time set to 20 minutes and varying up to 150 minutes
  - Task deadlines are set to 10% more than their minimum execution time
  - Failure instants follow a Weibull distribution with a shape parameter of 0.8
  - MTBF = 200 minutes
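A sketch of how failure times with these parameters could be generated. It assumes that the Weibull distribution describes the time between consecutive failures and that the MTBF of 200 minutes is the mean of that distribution; the slides do not spell this out. The scale parameter then follows from mean = scale · Γ(1 + 1/shape). Function names and the seed are hypothetical.

```python
import math
import random


def weibull_scale_for_mean(mean, shape):
    """Scale parameter that gives a Weibull distribution the requested mean."""
    return mean / math.gamma(1.0 + 1.0 / shape)


def sample_failure_gaps(count, mean_minutes=200.0, shape=0.8, seed=0):
    """Draw `count` times between failures, in minutes."""
    rng = random.Random(seed)
    scale = weibull_scale_for_mean(mean_minutes, shape)
    # random.weibullvariate takes (scale, shape) in that order.
    return [rng.weibullvariate(scale, shape) for _ in range(count)]


gaps = sample_failure_gaps(20_000)
print(f"scale = {weibull_scale_for_mean(200.0, 0.8):.1f} min, "
      f"sample mean = {sum(gaps) / len(gaps):.1f} min")
```

A shape parameter below 1 means failures cluster: many short gaps and a few very long ones, which is harsher on deadline-driven tasks than an exponential model with the same mean.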


Performance Analysis

• Metrics
  - Completion rate of users’ jobs
  - Working-Efficiency: measures the quantity of useful work done (i.e. completed users’ jobs) by the consumed power

Performance Analysis


Performance Analysis

• Google Cloud tracelogs
  - The medium length of a job is 3 minutes, and the majority of jobs run in less than 15 minutes, although a number of jobs run longer than 300 minutes
  - Task lengths follow a Lognormal distribution
  - CPU usage, varying from near 0% to around 25%, follows a Lognormal distribution
  - 3614 synthetic jobs for a total of 10357 tasks
  - MTBF = 200 minutes
  - Migrations occur due to proactive failure management only

Performance Analysis


Energy Efficiency Improvement

• The goal
  - Find the values closest to the optimum in order to correctly tune the condition detection mechanism
  - Dynamic update of virtual-to-physical configurations (CPU usage and migration)

Mechanism to detect energy optimization opportunities while maintaining the fault tolerance of the computing environment

[Figure: timeline of VMs on PM1, PM2 and PM3 showing failures, stop & copy migrations and failure prediction accuracy]

Consolidation results

[Figures: with consolidation; without consolidation]

Consolidation results

[Figures: with consolidation; without consolidation]

Consolidation results


Conclusions

• Cloud computing opens new challenges
  - Energy efficiency (more Mflop/Joule)
  - Dynamic load balancing
  - Modeling of VM interference due to resource sharing (CPU, cache, I/O)
  - CPU-intensive and data-intensive jobs
  - Data locality
  - Scalability (distributed control)
  - Autonomic Computing
• CERN Cloud infrastructure
  - MSc dissertation (MIEIC) to study and develop a resource management algorithm for the CERN cloud