13-11-2013
A View of Cloud Computing:
Concepts and Challenges
Jorge G. Barbosa
Universidade do Porto,
Faculdade de Engenharia, LIACC
Porto, Portugal
FEUP, 2013
Outline
• Part I: Basic Concepts
  o Introduction and Principles Overview
• Part II: Challenges
  o Fault Tolerance
  o Energy optimization
  o Quality of Service (QoS)
  o …
• Part III: Current Research
What is Cloud Computing?
“Cloud Computing refers to both the applications delivered as
services over the Internet and the hardware and systems
software in the datacenters that provide those services.”
Fox, Armando, et al. "Above the clouds: A Berkeley view of cloud computing." Dept. Electrical Eng. and Comput. Sciences,
University of California, Berkeley, Rep. UCB/EECS 28 (2009).
“A large-scale distributed computing paradigm (…) in which a
pool of abstracted, virtualized, dynamically-scalable,
managed computing power, storage, platforms, and services
are delivered on demand (…) over the Internet.”
Foster, Ian, et al. "Cloud computing and grid computing 360-degree compared." Grid Computing Environments Workshop,
GCE'08. IEEE, 2008.
Clouds
Cloud Computing
Image source: “The Future of Cloud Computing”, available at http://cordis.europa.eu/fp7/ict/ssai/docs/cloud-report-final.pdf
TYPES
SaaS (Software as a Service)
  What: On-demand access to any application
  Who: End-user (consume)
PaaS (Platform as a Service)
  What: Platform upon which apps/services can be developed and hosted
  Who: Developer (build)
IaaS (Infrastructure as a Service)
  What: Access to computational resources, i.e. CPU, RAM, Data & Storage
  Who: Host provider (host)
MODES
Private cloud: usually owned by an institution; functionalities not directly exposed to the consumer (ex.: eBay)
Hybrid cloud: mixed use of private and public infrastructures, so as to reduce costs by sharing while keeping the desired degree of control
Public cloud: owners offer their services to users outside of the institution (ex.: Amazon, Google Apps)
Image source: http://www.iland.com
FEATURES
• Elasticity
  o Leveraged by self-* properties, provides agility and adaptability to environment changes
  o Implies horizontal and vertical scalability
• Reliability and Availability
  o Ensures constant operation through redundant resource usage (ex.: fault tolerance)
  o Ability to deal with increasing concurrent access (ex.: load-balancing)
BENEFITS
• Quality of Service
  o Specified user requirements (ex.: response time) are supported and maintained by the services and/or resources
• Pay per use
  o Services sold as Utility Computing, with costs according to the actual consumption of resources
• Going Green
  o Reduces the additional cost of energy consumption, and also the carbon footprint
Virtualization Technology in Clouds
• Virtualization is an essential technology in the Cloud
  o Provides all the cloud features (e.g. ease of use, flexibility and adaptability, location independence, etc.)
Image source: http://blog.cloudpassage.com
Hot Topics in Cloud Research
• Fault tolerance
  o Business continuity and service availability
• Energy efficiency
  o Optimize energy consumption (ex.: maximize Mflop/Joule)
  o Green cloud computing: minimize operational costs but also reduce the environmental impact
• Quality of Service
  o Performance unpredictability (ex.: due to sharing of resources among co-located VMs)
Hot Topics in Cloud Research
• Security
  o Data security
• Interoperability
  o How do different clouds cooperate?
• Standardization
  o How to guarantee that a user can change cloud provider?
• Autonomic Computing
Fault Tolerance
• Dependability of the infrastructure
  o Distributed systems are growing in scale and in complexity
  o Mean Time Between Failures (MTBF) would be 1.25h on a petaflop system (1)
(1) Fu, S. "Failure-aware resource management for high-availability computing clusters with distributed virtual machines." Journal of Parallel and
Distributed Computing 70.4 (2010): 384-393.
Fault Tolerance
• Proactive fault tolerance
  o Intelligent Platform Management Interface (IPMI) for health inquiries (migration starts on threshold violations)
  o Ganglia to determine node targets based on load averages
Nagarajan, A., et al. "Proactive fault tolerance for HPC with Xen virtualization." Proc. of the 21st annual international conference on
Supercomputing. ACM, 2007.
Overall architecture
• In proactive FT systems, processes automatically migrate from “unhealthy” nodes to healthy ones.
• In reactive schemes, recovery occurs in response to failures that have already occurred.
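A minimal sketch of the proactive scheme described above, assuming health readings (for instance a temperature obtained through IPMI) and load averages (for instance as reported by Ganglia) have already been collected into plain dictionaries. The function names, the single-sensor threshold rule and the sample values are illustrative assumptions; no real IPMI or Ganglia API is called.

```python
from typing import Dict, Optional, Set


def pick_migration_target(load_avg: Dict[str, float],
                          unhealthy: Set[str]) -> Optional[str]:
    """Return the healthy node with the lowest load average, or None."""
    candidates = {n: l for n, l in load_avg.items() if n not in unhealthy}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


def plan_migrations(temperature_c: Dict[str, float],
                    load_avg: Dict[str, float],
                    threshold_c: float) -> Dict[str, str]:
    """Map each node that violates the threshold to a migration target."""
    # A node is treated as "unhealthy" once its reading exceeds the threshold.
    unhealthy = {n for n, t in temperature_c.items() if t > threshold_c}
    plan = {}
    for node in sorted(unhealthy):
        target = pick_migration_target(load_avg, unhealthy)
        if target is not None:
            plan[node] = target
    return plan


if __name__ == "__main__":
    temps = {"n1": 85.0, "n2": 50.0, "n3": 55.0}   # hypothetical readings
    loads = {"n1": 0.2, "n2": 1.5, "n3": 0.4}      # hypothetical load averages
    print(plan_migrations(temps, loads, threshold_c=70.0))
```

The point of the sketch is the ordering of decisions: health monitoring triggers the migration, while load information only chooses where the guest goes.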
Fault Tolerance
• Dynamic allocation of VMs, considering PMs’ reliability
  o Based on a failure prediction tool with 75% average accuracy
(1) Fu, S. "Failure-aware resource management for high-availability computing clusters with distributed virtual machines." Journal of Parallel and
Distributed Computing 70.4 (2010): 384-393.
Proposed architecture for reconfigurable
distributed VM
• Optimistic Best-Fit (OBFIT) algorithm (1)
  - Selects the PM with minimum weighted available capacity and reliability
• Pessimistic Best-Fit (PBFIT) algorithm (1)
  - Calculates the average capacity C_avg from reliable PMs
  - Selects the unreliable PM p with capacity C_p such that C_avg + C_p results in the minimum necessary capacity
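One possible reading of the two selection rules, written as a sketch rather than as the algorithms from the cited paper. It assumes each PM carries an available capacity, a reliability score in [0, 1] and a flag from the failure predictor; the `PM` class, the weight `alpha` and the feasibility tests are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PM:
    name: str
    capacity: float      # available capacity, arbitrary units
    reliability: float   # score in [0, 1], higher is more reliable
    reliable: bool       # predictor expects the PM to survive the task


def obfit(pms: List[PM], demand: float, alpha: float = 0.5) -> Optional[PM]:
    """Reliable PM with the minimum weighted capacity and reliability."""
    ok = [p for p in pms if p.reliable and p.capacity >= demand]
    if not ok:
        return None
    return min(ok, key=lambda p: alpha * p.capacity + (1 - alpha) * p.reliability)


def pbfit(pms: List[PM], demand: float) -> Optional[PM]:
    """Unreliable PM p minimizing C_avg + C_p while still covering the demand."""
    reliable = [p for p in pms if p.reliable]
    if not reliable:
        return None
    c_avg = sum(p.capacity for p in reliable) / len(reliable)
    ok = [p for p in pms if not p.reliable and c_avg + p.capacity >= demand]
    if not ok:
        return None
    return min(ok, key=lambda p: c_avg + p.capacity)


if __name__ == "__main__":
    pool = [PM("a", 10, 0.9, True), PM("b", 6, 0.8, True),
            PM("c", 4, 0.3, False), PM("d", 2, 0.2, False)]
    print(obfit(pool, demand=5).name, pbfit(pool, demand=5).name)
```

Both functions are best-fit in spirit: they pick the smallest candidate that still satisfies the request, which keeps larger or more reliable machines free for later, more demanding VMs.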
Fault Tolerance
• Dynamic allocation of VMs, considering PMs’ reliability
  o System productivity is enhanced by using the proposed strategies
  o Task completion rate reaches 91.7% with 83.6% utilization of relatively unreliable nodes
[Figures: Percentage of completed tasks; Percentage of completed jobs]
Energy Efficiency
• Energy consumption concern
  o An average datacenter consumes as much energy as 25,000 households (1)
  o Main part of energy consumption determined by the CPU (2)
  o Energy consumption dominates the operational costs
(1) Kaplan, J., Forrest, W., Kindler, N. "Revolutionizing Data Center Energy Efficiency." McKinsey & Company, Tech. Rep.
(2) Berl, Andreas, et al. "Energy-efficient cloud computing." The Computer Journal 53.7 (2010): 1045-1051.
Energy Efficiency
• Consolidation
  o Minimize the number of active nodes, and power down inactive ones
• Dynamic Voltage and Frequency Scaling (DVFS)
  o Modern CPUs can run at different clock frequencies
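To see why DVFS saves energy, a small sketch using the usual first-order CMOS approximation for dynamic power, P ≈ C · V² · f. The capacitance value and the two operating points below are made-up numbers, and static (leakage) power is ignored, so this only illustrates the trend.

```python
def dynamic_power(capacitance_f: float, voltage_v: float, frequency_hz: float) -> float:
    """First-order dynamic power of a CMOS circuit, in watts."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


def energy_for_cycles(capacitance_f: float, voltage_v: float,
                      frequency_hz: float, cycles: float) -> float:
    """Energy in joules to execute a fixed number of cycles at one operating point."""
    runtime_s = cycles / frequency_hz
    return dynamic_power(capacitance_f, voltage_v, frequency_hz) * runtime_s


if __name__ == "__main__":
    C = 1e-9          # hypothetical effective switched capacitance
    CYCLES = 4e9      # hypothetical amount of work
    high = energy_for_cycles(C, 1.2, 2e9, CYCLES)   # fast, higher voltage
    low = energy_for_cycles(C, 1.0, 1e9, CYCLES)    # slow, lower voltage
    print(f"high: {high:.2f} J, low: {low:.2f} J")
```

For a fixed amount of work the frequency cancels out and the energy scales with V², which is why the gain comes from the lower voltage that a lower frequency permits, at the price of a longer runtime.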
Energy Efficiency - Examples
• Entropy system
  o Minimizes the number of active nodes and powers down inactive ones, while maintaining performance
Reconfiguration loop
• Find a configuration using the minimum number n of nodes necessary to host all VMs
• Constraint programming allows Entropy to find mappings of tasks to nodes
Hermenier, F., et al. "Entropy: a consolidation manager for clusters." Proc. of the 2009 ACM SIGPLAN/SIGOPS international conference on Virtual
execution environments. ACM, 2009.
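A rough illustration of the packing problem behind the reconfiguration loop. Entropy solves it with constraint programming; the sketch below swaps in the simpler first-fit-decreasing heuristic over a single resource, so it yields a feasible placement and an upper bound on n, not necessarily the minimum. The VM names and demands are invented.

```python
from typing import Dict, List


def first_fit_decreasing(vm_demands: Dict[str, int],
                         node_capacity: int) -> List[List[str]]:
    """Pack VMs onto as few identical nodes as the heuristic manages."""
    nodes: List[List[str]] = []   # VM names hosted on each node
    free: List[int] = []          # remaining capacity of each node
    # Place the largest VMs first; ties keep their original order.
    for vm, demand in sorted(vm_demands.items(), key=lambda kv: kv[1], reverse=True):
        if demand > node_capacity:
            raise ValueError(f"{vm} does not fit on any node")
        for i, room in enumerate(free):
            if demand <= room:
                nodes[i].append(vm)
                free[i] -= demand
                break
        else:
            # No open node has room, so power on a new one.
            nodes.append([vm])
            free.append(node_capacity - demand)
    return nodes


if __name__ == "__main__":
    demands = {"a": 50, "b": 70, "c": 30, "d": 50}   # percent of one node
    print(first_fit_decreasing(demands, node_capacity=100))
```

Nodes that end up empty after such a repacking are the candidates for powering down.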
Energy Efficiency - Examples
• Entropy system: results
  o Reduces the consumption of cluster nodes per hour by over 50% compared to static allocation
[Figures: Number of used physical machines; Total execution time]
Energy Efficiency - Examples
• DVFS-enabled clusters
  o Algorithm minimizes the processor power dissipation by dynamically scaling down processor frequencies
von Laszewski, G., et al. "Power-aware scheduling of virtual machines in dvfs-enabled clusters." Cluster Computing and Workshops, 2009.
CLUSTER'09. IEEE International Conference on. IEEE, 2009.
1) Minimize the processor supply
voltage by scaling down the processor
frequency.
2) Schedule VMs to PEs with low
voltages and try not to scale PE to high
voltages.
Working scenario
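The two rules above can be sketched as a greedy placement, assuming each processing element (PE) offers a few discrete frequency levels and that a VM's demand is expressed in GHz of CPU. `LEVELS_GHZ`, `required_level` and `place_vm` are hypothetical names; this is not the scheduler from the cited paper.

```python
import bisect
from typing import Dict, Optional, Tuple

LEVELS_GHZ = [1.0, 1.5, 2.0]   # hypothetical DVFS operating points, ascending


def required_level(load_ghz: float) -> Optional[float]:
    """Lowest frequency level that can serve the load, or None if none can."""
    i = bisect.bisect_left(LEVELS_GHZ, load_ghz)
    return LEVELS_GHZ[i] if i < len(LEVELS_GHZ) else None


def place_vm(pe_load: Dict[str, float],
             vm_demand_ghz: float) -> Optional[Tuple[str, float]]:
    """Place the VM on the PE that ends up at the lowest frequency level."""
    best, best_level = None, None
    for pe in sorted(pe_load):
        level = required_level(pe_load[pe] + vm_demand_ghz)
        if level is None:
            continue                      # this PE cannot host the VM at all
        if best_level is None or level < best_level:
            best, best_level = pe, level
    if best is None:
        return None
    pe_load[best] += vm_demand_ghz        # commit the placement
    return best, best_level


if __name__ == "__main__":
    loads = {"pe1": 0.8, "pe2": 0.0}
    print(place_vm(loads, 0.5))
    print(place_vm(loads, 0.5))
```

Because supply voltage rises with frequency, keeping every PE at the lowest level that still serves its VMs is a simple stand-in for "try not to scale PEs to high voltages".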
Energy Efficiency
• DVFS-enabled clusters: results
  o Applying the DVFS technique to the compute nodes (PEs) reduces overall power consumption without degrading VM performance beyond acceptable levels
[Figure: Performance impact of varying the number of VMs and operating frequency]
[Figure: DVFS-enabled cluster scheduling simulation results]
Quality of Service - Examples
• Enforcing SLAs in scientific clouds
  o Deadline-driven batch jobs
  o Service Level Agreement (SLA)
Niehorster, O., et al. "Enforcing SLAs in scientific clouds." Cluster Computing (CLUSTER), 2010 IEEE International Conference on. IEEE, 2010.
The fuzzy control system
1) Tests the feasibility of the SLA.
2) If accepted, guarantees its
fulfillment.
• Approach is independent of the underlying cloud infrastructure and should deal with performance fluctuations
Quality of Service - Examples
• Enforcing SLAs in scientific clouds
  o Agents autonomously check the feasibility of the SLA and guarantee its fulfillment by meeting the deadline
  o Agents successfully deal with the noise in the cloud that occurs when VMs are co-located
  o VM interference due to resource sharing (RAM, I/O, CPU)
Quality of Service - Examples
• Sandpiper system
  o Hotspot detection algorithm: determines when to resize or migrate VMs
  o Hotspot mitigation algorithm: determines what and where to migrate and how many resources to allocate
The Sandpiper architecture
Wood, T., et al. "Sandpiper: Black-box and gray-box resource management for virtual machines." Computer Networks 53.17 (2009): 2923-2938.
• Migrate the VMs in decreasing order of VSR
• VSR: volume-to-size ratio (size = RAM footprint; volume = load)
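A short sketch of the VSR ordering. The slide only says that volume is the load; the code assumes the volume is the product of 1/(1 - u) terms over CPU, network and memory utilization, which is how the Sandpiper paper is usually summarized, so treat that formula and the sample VMs as assumptions.

```python
from typing import Dict, List


def volume(cpu: float, net: float, mem: float) -> float:
    """Load volume; grows sharply as any utilization approaches 1."""
    for u in (cpu, net, mem):
        if not 0.0 <= u < 1.0:
            raise ValueError("utilizations must be in [0, 1)")
    return 1.0 / (1.0 - cpu) * 1.0 / (1.0 - net) * 1.0 / (1.0 - mem)


def vsr(vm: Dict[str, float]) -> float:
    """Volume-to-size ratio, with size taken as the RAM footprint in MB."""
    return volume(vm["cpu"], vm["net"], vm["mem"]) / vm["ram_mb"]


def migration_order(vms: Dict[str, Dict[str, float]]) -> List[str]:
    """VM names in decreasing order of VSR (first candidate to migrate first)."""
    return sorted(vms, key=lambda name: vsr(vms[name]), reverse=True)


if __name__ == "__main__":
    vms = {
        "a": {"cpu": 0.9, "net": 0.1, "mem": 0.5, "ram_mb": 1024},
        "b": {"cpu": 0.5, "net": 0.5, "mem": 0.5, "ram_mb": 256},
    }
    print(migration_order(vms))
```

Ranking by VSR favors VMs that relieve a lot of load per megabyte of memory that has to be copied, which keeps migration overhead low.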
Quality of Service
• Sandpiper system: results
  o Sandpiper can resize resources allocated to VMs
  o Migrations occur if additional resources are not available
A series of migrations resolve hotspots
Approach
• The goal
  Construct power- and failure-aware computing environments in order to maximize the rate of jobs completed by their deadlines
[Figure labels: Higher Service Level Performance; Pure Performance]
Approach
• It is an SLA-based approach
  o The SLA should provide for user compensation if the deadline is missed
• Virtual-to-physical resource mapping decisions consider both the power efficiency and the reliability level of compute nodes
• Dynamic update of virtual-to-physical configurations (CPU usage and migration)
Approach
• Leverage virtualization tools
  o Xen credit scheduler
    - Dynamically update the cap parameter
  o Stop & copy migration
    - Faster VM migrations, preferable for proactive failure management
[Figure: power consumption increasing with CPU usage from 0% to 100%]
[Figure: timeline of VMs on PM1, PM2 and PM3; legend: failure, stop & copy migration, failure prediction accuracy]
System Overview
• Cloud architecture
  o Private cloud
  o Homogeneous PMs
  o Cluster coordinator manages users’ jobs
  o VMs are created and destroyed dynamically
• Users’ jobs
  o A job is a set of independent tasks
  o A task runs in a single VM, and its CPU-intensive workload is known
  o The number of tasks per job and the task deadlines are defined by users
[Figure: Private cloud management architecture]
System Overview
• Power model
• Capacity-reliability model
Example of power efficiency curve (p1 = 175W, p2 = 75W)
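The formulas on this slide did not survive extraction. As an assumption, the sketch below uses the common linear power model P(u) = p1 + p2 · u, reading p1 as the idle power and p2 as the extra power drawn at full CPU load, and takes power efficiency as utilization per watt. Only the values p1 = 175 W and p2 = 75 W come from the slide.

```python
def power_w(cpu_utilization: float, p1: float = 175.0, p2: float = 75.0) -> float:
    """Assumed linear model: idle power plus a share of the dynamic range."""
    if not 0.0 <= cpu_utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    return p1 + p2 * cpu_utilization


def power_efficiency(cpu_utilization: float, p1: float = 175.0, p2: float = 75.0) -> float:
    """Useful utilization delivered per watt consumed."""
    return cpu_utilization / power_w(cpu_utilization, p1, p2)


if __name__ == "__main__":
    for u in (0.25, 0.5, 0.75, 1.0):
        print(f"u={u:.2f}  P={power_w(u):6.1f} W  efficiency={power_efficiency(u):.5f}")
```

Under this model the idle power dominates, so efficiency keeps rising with utilization, which is the usual argument for consolidating VMs onto fewer, busier machines.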
Performance Analysis
• Minimum Time Task Execution (MTTE) algorithm
  o PM i capacity constraints
  o Slack time to accomplish task t
  o Selects the PM i that:
    - guarantees the minimum processing power required by the VM
    - increases power-efficiency
    - has higher reliability
  o But reserves the maximum processing power
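A loose sketch of the selection step, since the slide's formulas are missing. It assumes the minimum processing power is the remaining work divided by the slack time, ranks feasible PMs by reliability and then by current utilization (as a proxy for power efficiency), and reserves all free capacity on the chosen PM. The `Node` class and this ranking order are hypothetical, not the authors' definition.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    free_mflops: float     # processing power still unreserved
    used_fraction: float   # current CPU utilization in [0, 1]
    reliability: float     # higher means less likely to fail soon


def min_required_mflops(work_mflop: float, slack_s: float) -> float:
    """Processing power needed to finish the work within the slack time."""
    if slack_s <= 0:
        raise ValueError("no slack time left")
    return work_mflop / slack_s


def select_pm(nodes: List[Node], work_mflop: float,
              slack_s: float) -> Optional[Tuple[Node, float]]:
    """Pick a PM for the task and return it with the power to reserve."""
    need = min_required_mflops(work_mflop, slack_s)
    feasible = [n for n in nodes if n.free_mflops >= need]
    if not feasible:
        return None
    # Assumed ranking: reliability first, then busier nodes.
    best = max(feasible, key=lambda n: (n.reliability, n.used_fraction))
    # MTTE-style reservation: take everything that is free on that PM.
    return best, best.free_mflops


if __name__ == "__main__":
    pool = [Node("a", 400.0, 0.5, 0.9), Node("b", 800.0, 0.0, 0.9),
            Node("c", 100.0, 0.8, 0.99)]
    node, reserved = select_pm(pool, work_mflop=36000.0, slack_s=120.0)
    print(node.name, reserved)
```

Reserving the maximum lets the task finish as early as possible, at the cost of leaving less room for other VMs on the same PM.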
Performance Analysis
• Relaxed Time Task Execution (RTTE) algorithm
  o Unlike MTTE, the RTTE algorithm always reserves for the VM the minimum amount of processing power necessary to accomplish the task within its deadline
  o However, RTTE is work-conserving
[Figure: Host CPU from 0% to 100%, with the VM cap set in the Xen credit scheduler]
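A sketch of how the minimum reservation could be turned into a cap value. In the Xen credit scheduler the cap is a percentage of one physical CPU (100 means one full CPU, 0 means no cap). The function name, the rounding rule and the 800 MFLOPS default for a single-core PM are assumptions made for illustration.

```python
import math
from typing import Optional


def rtte_cap(remaining_mflop: float, time_to_deadline_s: float,
             pm_mflops: float = 800.0) -> Optional[int]:
    """Smallest cap (percent of one CPU) that still meets the deadline.

    Returns None when even a full CPU would miss the deadline.
    """
    if time_to_deadline_s <= 0:
        raise ValueError("deadline already passed")
    need = remaining_mflop / time_to_deadline_s   # MFLOPS required from now on
    if need > pm_mflops:
        return None
    # Round up so the reservation never falls short; never return 0,
    # because a cap of 0 means "uncapped" in the credit scheduler.
    return max(1, math.ceil(100.0 * need / pm_mflops))


if __name__ == "__main__":
    print(rtte_cap(remaining_mflop=48000.0, time_to_deadline_s=120.0))
```

Recomputing the value periodically from the remaining work and the remaining time is one way to realize the "dynamically update cap parameter" step.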
Performance Analysis
• Implementation considerations
  o Stabilization to avoid multiple migrations
• Algorithms compared to ours
  o Common Best-Fit (CBFIT)
    - Selects the PM with the maximum power-efficiency and does not consider resource reliability
  o Optimistic Best-Fit (OBFIT)
  o Pessimistic Best-Fit (PBFIT)
Performance Analysis
• Simulation setup
  o 50 PMs, each modeled with one CPU core with performance equivalent to 800 MFLOPS
  o VMs require 128MB to 1024MB of RAM
  o VM stop & copy migration overhead depends on RAM size
  o 100 synthetic jobs, each composed on average of 10 CPU-intensive tasks
  o Failed PMs stay unavailable for a period modeled by a Lognormal distribution, with a mean time set to 20 minutes, varying up to 150 minutes
  o Task deadlines are set to 10% more than their minimum execution time
  o Failure instants follow a Weibull distribution, with a shape parameter of 0.8
  o MTBF = 200 minutes
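A minimal sketch of how the failure and repair processes in such a setup could be sampled with Python's standard library. The Weibull scale is derived from the stated MTBF and shape via mean = scale · Γ(1 + 1/shape); the lognormal `sigma` is not given on the slide and is an arbitrary assumption, as is treating inter-failure times as independent draws.

```python
import math
import random
from typing import List

MTBF_MIN = 200.0   # from the slide
SHAPE = 0.8        # Weibull shape parameter from the slide


def weibull_scale(mean: float, shape: float) -> float:
    """Scale parameter that gives the requested mean."""
    return mean / math.gamma(1.0 + 1.0 / shape)


def failure_instants(horizon_min: float, rng: random.Random) -> List[float]:
    """Failure times of one PM up to the horizon, in minutes."""
    scale = weibull_scale(MTBF_MIN, SHAPE)
    t, instants = 0.0, []
    while True:
        t += rng.weibullvariate(scale, SHAPE)   # (scale, shape) argument order
        if t > horizon_min:
            return instants
        instants.append(t)


def repair_time(rng: random.Random, mean_min: float = 20.0, sigma: float = 0.5) -> float:
    """Lognormal unavailability period with the requested mean."""
    mu = math.log(mean_min) - sigma ** 2 / 2.0   # mean = exp(mu + sigma^2 / 2)
    return rng.lognormvariate(mu, sigma)


if __name__ == "__main__":
    rng = random.Random(42)
    print(failure_instants(horizon_min=1000.0, rng=rng))
    print(repair_time(rng))
```

A shape parameter below 1 gives a decreasing hazard rate, so failures tend to cluster shortly after a previous failure.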
Performance Analysis
• Metrics
  o Completion rate of users’ jobs
  o Working-Efficiency: measures the quantity of useful work done (i.e. completed users’ jobs) relative to the consumed power
Performance Analysis
• Google Cloud trace logs
  o The median length of a job is 3 minutes, and the majority of jobs run in less than 15 minutes, although a number of jobs run longer than 300 minutes
  o Task lengths follow a Lognormal distribution
  o CPU usage, varying from near 0% to around 25%, follows a Lognormal distribution
  o 3614 synthetic jobs for a total of 10357 tasks
  o MTBF = 200 minutes
  o Migrations occur due to proactive failure management only
Energy Efficiency Improvement
• The goal
  A mechanism to detect energy optimization opportunities while maintaining the fault tolerance of the computing environment
  o Find values as close to optimal as possible to correctly tune the condition detection mechanism
  o Dynamic update of virtual-to-physical configurations (CPU usage and migration)
[Figure: timeline of VMs on PM1, PM2 and PM3; legend: failure, stop & copy migration, failure prediction accuracy]
Consolidation results
[Figures: results with consolidation and without consolidation]
Conclusions
• Cloud computing opens new challenges
  o Energy efficiency (more Mflop/Joule)
  o Dynamic load balancing
  o Modeling of VM interference due to resource sharing (CPU, cache, I/O)
  o CPU-intensive and data-intensive jobs
  o Data locality
  o Scalability (distributed control)
  o Autonomic Computing
• CERN Cloud infrastructure
  o MSc dissertation (MIEIC) to study and develop a resource management algorithm for the CERN cloud