Capacity Management and Resource Optimization in Cloud Environments

©2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Session ID: BTOT-TU-1700/9. Twitter hashtag: #HPSWU

Upload: hp-software-solutions

Post on 09-Jun-2015


DESCRIPTION

Techniques for Identifying Optimization Opportunities in a Virtualization Cloud

TRANSCRIPT

Page 1: Capacity Management and Resource Optimization in Cloud Environments


Page 2: Capacity Management and Resource Optimization in Cloud Environments


Speaker Name: Shawn Islam, Hari Kannan
Date: Nov 30, 2010
Session ID: BTOT-TU-1700/9

Capacity Management and Resource Optimization in Cloud* Environments
Techniques for Identifying Optimization Opportunities in a Virtualized Cloud

* Virtualized cloud environments

Page 3: Capacity Management and Resource Optimization in Cloud Environments

A Balancing Act
Capacity Management

“A discipline that ensures IT infrastructure is provided at the right time in the right volume at the right price, and ensuring that IT is used in the most efficient manner”

- ITIL definition

Meeting IT Supply with Business Service Demand “Just-in-Time”

Page 4: Capacity Management and Resource Optimization in Cloud Environments

“The growing adoption of virtualization (and related technologies, such as cloud computing), plus changing organizational and process demands, will force a reassessment of traditional capacity planning and related IT planning functions. ...Traditional capacity planning tools use a silo-based approach to analytical modeling to determine capacity. Such an approach will be inadequate because these tools must be able to build an end-to-end model that is based on IT and Business Services...”

Source: Gartner, IT Resource Planning: Going Beyond Capacity Planning, Cameron Haight, Milind Govekar, George J. Weiss, February 2009

Page 5: Capacity Management and Resource Optimization in Cloud Environments

Capacity Management: Before Virtualized Cloud

– Typically, each server hosted a single workload
• Very limited or virtually no impact of one service’s demands on another service
– Capacity statically configured for peak workloads
• Results in low average utilization
– Deploying additional capacity was time consuming
• Time is €€€ !!!
– Not possible to migrate workload without downtime
• Results in a more stable, predictable capacity demand
– Capacity management was silo-based
• Lack of holistic view resulted in inefficiencies
– Primary focus on core infrastructure elements (CPU, memory, IO)
• End-User not part of the capacity equation

Page 6: Capacity Management and Resource Optimization in Cloud Environments


Virtualization adoption is spiraling

[Chart: VM Installed Base (Millions). 2008: 5.80; 2009: 10.80; 2012: 55.00. Annotations: Avg VM Density: 8; Avg VM Density: 75]

Virtual Machine installed base to grow by 5x in three years

[Chart: Percentage of Installed x86 Workloads Running in a VM. 2008: 12; 2009: 19; 2010: 28; 2011: 38; 2012: 48]

Almost half of all workloads will be deployed in virtual environments by 2012

Capacity Management & Planning is the No. 1 challenge facing companies as they virtualize, according to Forrester

“Admins are aiming for 70% or greater physical server resource utilization by 2012”, according to IDC

Sources: Gartner, Forrester, IDC

Page 7: Capacity Management and Resource Optimization in Cloud Environments


CLOUD: The BIG picture

[Diagram labels: Cloud Storage; Compute Resources; Network Bandwidth; Power; DB; Java Apps; IT and Helpdesk; Operations Automation; Operations Monitoring; Capacity Planning and Optimization]

Page 8: Capacity Management and Resource Optimization in Cloud Environments

Capacity Management in a Virtualized Cloud

– Multiple workloads (VMs) per host
• Results in higher average utilization, but high impact of one service’s demands on another service
– Capacity easily scaled up or down
• Deploying additional capacity fast and cheap
• Unprecedented volume and frequency of workload deployment can result in VM Sprawl
• High need for tracking and reporting capacity and usage metrics
– Dynamic workload migration without downtime
• Can result in unexpected load spikes on server and network
• High need for headroom-based capacity provisioning
– Capacity shared across multiple tenants & Business Services
• Need for Capacity Management and Reporting according to Business needs


Page 9: Capacity Management and Resource Optimization in Cloud Environments

Workload Placement: Challenges in a Virtualized Cloud

As VM density increases, sizing and workload placement become complex; need to understand the impact of resource sharing

Sizing on an individual VM’s peak or average will result in sub-optimal performance

In-house tools (think “Excel”) don’t scale well and need constant tuning

Remaining “headroom” capacity for unforeseen spikes in demand is unknown

Page 10: Capacity Management and Resource Optimization in Cloud Environments

Why is workload placement important?

10 workloads, 20% Average Utilization: hosted on 2 servers
Using averages leads to under-provisioning

10 workloads, 80% Peak Utilization: hosted on 8 servers
Using peaks leads to over-provisioning
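The two server counts on this slide follow from simple arithmetic, sketched below. This is only an illustration: it assumes each server offers 100 units of CPU and that demands simply add up, and `servers_needed` is a hypothetical helper name, not part of any tool mentioned in the session.

```python
def servers_needed(workloads: int, utilization_pct: int, server_capacity_pct: int = 100) -> int:
    """Servers required if every workload is sized at the same utilization figure."""
    total = workloads * utilization_pct
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-total // server_capacity_pct)


by_average = servers_needed(10, 20)  # sizing on the 20% average
by_peak = servers_needed(10, 80)     # sizing on the 80% peak
print(by_average, by_peak)
```

Sizing on the average packs the hosts so tightly that any simultaneous peak overloads them, while sizing on the peak leaves most of the capacity idle most of the time.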

Page 11: Capacity Management and Resource Optimization in Cloud Environments

Why is workload placement important?

– What if we knew that there are 2 types of VMs?
• VMs whose CPUs peak during the day and VMs whose CPUs peak at night

10 workloads: 5 VMs peak @ 80% during the day, 5 VMs peak @ 80% during the night

Hosted on 5 servers

Using seasonality data leads to optimal provisioning
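A small sketch of why the day/night split allows 5 servers. The slide gives only the 80% peaks; the 10% off-peak figure, the 100-unit host capacity, and the names `fits`, `day_vm`, and `night_vm` are assumptions made for illustration.

```python
INTERVALS = ("day", "night")
OFF_PEAK = 10  # assumed off-peak CPU %, not given on the slide

day_vm = {"day": 80, "night": OFF_PEAK}
night_vm = {"day": OFF_PEAK, "night": 80}


def fits(vms, capacity=100):
    """True if the summed demand stays within capacity in every interval."""
    return all(sum(vm[t] for vm in vms) <= capacity for t in INTERVALS)


# Two VMs that peak at the same time cannot share a host...
same_season_ok = fits([day_vm, day_vm])
# ...but a day-peaking VM and a night-peaking VM can.
mixed_ok = fits([day_vm, night_vm])

# Pair each of the 5 day VMs with one of the 5 night VMs: one host per pair.
pairs = list(zip([day_vm] * 5, [night_vm] * 5))
servers = len(pairs) if all(fits(pair) for pair in pairs) else None
print(same_season_ok, mixed_ok, servers)
```

The result depends on the off-peak assumption: if the off-peak demand were above 20%, a day/night pair would no longer fit on one host.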

Page 12: Capacity Management and Resource Optimization in Cloud Environments

Capacity Management – The Complexity
– Now, consider multiple dimensions

Compare
• Hypervisors: VMware, Hyper-V
• CPU: AMD, Intel
• H/W models

Analyze
• CPU
• Memory
• I/O
• Energy

Constraints
• Business
• Technical

Excel no longer adequate!

Page 13: Capacity Management and Resource Optimization in Cloud Environments


An Algorithm-driven approach for Workload Placement

– Clear understanding of Virtual resource dependencies to Application or Business Service: Workload to Application Service mapping, identify shared workloads
– Order application workloads from largest demand to smallest for different time intervals
• Sum up the time-varying traces for each resource instance across multiple dimensions to get total per-interval demand
• Peak of sums is typically less than sum of peaks (15 vs. 19)
– Order resource instances from greatest capacity to smallest
– Across all time intervals, simulate by doing a first fit by adding resources
– Repeat this using randomness to provide many different possible initial solutions
– Converge on the best-fit solution

Time  App1  App2  Total
0     10    5     15
1     3     4     7
2     2     9     11
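The steps on this slide can be sketched roughly as follows, using the App1/App2 traces from the table. This is one possible reading, not the presenters' implementation: the host capacity of 16, the function names, the fixed random seed, and the choice of "fewest hosts used" as the measure of the best solution are all assumptions.

```python
import random

# Per-interval demand traces taken from the slide's table.
traces = {"App1": [10, 3, 2], "App2": [5, 4, 9]}


def sum_of_peaks(traces):
    return sum(max(trace) for trace in traces.values())


def peak_of_sums(traces):
    return max(sum(values) for values in zip(*traces.values()))


def first_fit(order, traces, capacities):
    """Place workloads in the given order on the first host with room in every interval."""
    hosts = sorted(capacities, reverse=True)  # greatest capacity first
    intervals = len(next(iter(traces.values())))
    load = [[0] * intervals for _ in hosts]
    placement = {}
    for name in order:
        for h, cap in enumerate(hosts):
            combined = [l + d for l, d in zip(load[h], traces[name])]
            if all(value <= cap for value in combined):
                load[h] = combined
                placement[name] = h
                break
        else:
            return None  # no host can take this workload
    return placement


def best_placement(traces, capacities, tries=50, seed=0):
    """Try a largest-first order, then random orders; keep the one using the fewest hosts."""
    rng = random.Random(seed)
    largest_first = sorted(traces, key=lambda k: max(traces[k]), reverse=True)
    best = None
    for attempt in range(tries):
        order = largest_first if attempt == 0 else rng.sample(largest_first, len(largest_first))
        candidate = first_fit(order, traces, capacities)
        if candidate and (best is None or len(set(candidate.values())) < len(set(best.values()))):
            best = candidate
    return best


placement = best_placement(traces, capacities=[16, 16])
hosts_used = len(set(placement.values()))
print(sum_of_peaks(traces), peak_of_sums(traces), hosts_used)
```

Because the combined trace peaks at 15, both applications fit on a single 16-unit host, whereas sizing on the sum of individual peaks (19) would have called for a second host.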

Page 14: Capacity Management and Resource Optimization in Cloud Environments


Analytics: What statistical measure to use
– Using peak values and using average values have their limitations
• Estimates can lead to over-sizing and under-sizing
– Trend analysis is required in order to arrive at accurate estimates
• Percentile values (95th percentile) and sustained peaks, rather than absolute peaks, help to eliminate short-term and infrequent spikes
• Time-varying historical trends need to be analyzed for each resource instance to identify the total average and peak demand of different time intervals
• Future planned growth and forecasting
– Manual sizing is impossible for a large number of workloads: need an analytics engine to carry out the algorithm
– What-if analytics for balancing Resources and Demand
• Satisfy 100% of demand in every time interval vs. accepting less than 100% of demand in some time intervals
• Quality of Service needs to be considered for determining demand accommodation and headroom
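To make the difference between average, peak, and 95th percentile concrete, here is a minimal sketch on a made-up CPU trace. The sample data and the nearest-rank percentile definition are assumptions; real tools may interpolate differently.

```python
def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = -(-p * len(ordered) // 100)  # integer ceiling of p * n / 100
    return ordered[max(rank, 1) - 1]


# Hypothetical trace: mostly 20% CPU, some sustained 40% load, two short 100% spikes.
cpu = [20] * 90 + [40] * 8 + [100] * 2

average = sum(cpu) / len(cpu)
peak = max(cpu)
p95 = percentile(cpu, 95)
print(average, p95, peak)
```

On this trace the average hides the sustained 40% load, the absolute peak is dominated by two brief spikes, and the 95th percentile lands on the sustained level.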

Page 15: Capacity Management and Resource Optimization in Cloud Environments

Constraints and Headroom setting best practices

– Place apart
• VMs running workloads that perform a lot of disk I/O operations
• VMs that are members of a failover cluster (MSCS, Veritas, etc.)
• Two applications that must not share the same resources
– Place together
• In the case of VMware, place VMs with similar workloads running the same application versions on the same host to gain from TPS*
• Applications bound to use a resource group
– Headroom (free CPU cycles, memory) must be sufficient to allow for sudden peak demand and for free migration of VMs across hosts in a cluster; aim for 70% utilization rates
• Provide headroom for network usage too
• Factor in VM migrations also (ensure that there is sufficient free capacity in the cluster available for free migrations)
• After failover, utilization rates at cluster level must not exceed 80%
• Ensure that there are a minimum of 4 paths from host to storage to allow for any fiber connection failures
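The 70% and 80% figures above can be checked together with a short calculation. The sketch assumes identical hosts, a single host failure, and load that redistributes evenly over the survivors; the function names are hypothetical.

```python
def within_limit(total_demand, host_capacity, hosts, limit_pct):
    """True if cluster utilization is at or below limit_pct (integer arithmetic only)."""
    return total_demand * 100 <= limit_pct * host_capacity * hosts


def survives_one_failure(total_demand, host_capacity, hosts, limit_pct=80):
    """True if the cluster stays at or below limit_pct after losing one host."""
    return hosts > 1 and within_limit(total_demand, host_capacity, hosts - 1, limit_pct)


# Both clusters run at exactly 70% utilization before any failure.
small_ok = within_limit(280, 100, 4, 70)
large_ok = within_limit(560, 100, 8, 70)

# After one host fails, only the larger cluster stays within 80%.
small_after = survives_one_failure(280, 100, 4)
large_after = survives_one_failure(560, 100, 8)
print(small_ok, small_after, large_ok, large_after)
```

Under these assumptions a 4-host cluster at 70% rises to roughly 93% after a failure, so smaller clusters may need a lower steady-state target to satisfy the post-failover limit.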


Page 16: Capacity Management and Resource Optimization in Cloud Environments


Analysis & Visibility needed for Decision Planning

[Diagram labels: Inventory – Data Center; Business App to Infra Inventory; Usage – Trend; Max; 90%; AVG; Capacity; AVG Trend; Forecast: 30, 60, 90 days]

Page 17: Capacity Management and Resource Optimization in Cloud Environments


Capacity Management in context of DRS/HA

– DRS is reactive capacity management; proactive capacity management complements DRS and can reduce the need for live migration
– DRS/HA does not account for future planned growth
– DRS is CPU/memory focused; holistic capacity management adds value
– DRS is silo-based (restricted to a single cluster); datacenter-wide capacity management is a must

[Diagram labels: VMware HA Cluster; VMware DRS Cluster]

Are both really needed? {YES}

Page 18: Capacity Management and Resource Optimization in Cloud Environments


Capacity Management converged with Service and Operations Bridge

– Faults and incidents from event management systems act as triggers for capacity analysis
• Right-sizing VMs
• Fitting VMs to hosts with unused capacity
• Cloud-burst: can take services temporarily from cloud providers
− Ensure that security constraints are met at application/user level
– Business Services and IT get linked via the Runtime Service Model
• Visibility into Service Performance
• Easy to determine capacity needs and assign resources for new initiatives

Page 19: Capacity Management and Resource Optimization in Cloud Environments

Review
Let’s summarize

Page 20: Capacity Management and Resource Optimization in Cloud Environments

Cloud Capacity Management: Key Requirements

VISUALIZE
What you have, What you use, What can fit, What can be improved

OPTIMIZE
Multi-Dimensional, Time-Varying Analysis and Workload Placement recommendation

FORECAST AND PLAN
What-if Scenarios, Trend Analysis

• Support Heterogeneous Environments
• Scalable across 1000’s of nodes
• Easy integration with existing Data Collectors
• Continuous, Real-Time Monitoring and Analysis
• Visibility into Service Performance
• Correlate Infrastructure Performance with Business Service Performance

Page 21: Capacity Management and Resource Optimization in Cloud Environments

Capacity Management: Key Questions

Visualize
• How much Capacity do I have and how much do I use?
− Across datacenter, by Business Service, by cluster
− Across Physical, VMware, Hyper-V resources
• Does my infrastructure keep up with end-user response?
• Do I have idle and powered-off VMs resulting in VM sprawl?
• Which servers are under- or over-configured?

Optimize
• How do I place my VMs optimally in a server farm?
• Are my VMs sized correctly?
• What is the right VM size for my physical server?
• Do I have enough headroom for unexpected spikes?
• Will my new workload fit in my existing cluster?

Forecast and Plan
• How much room for growth do I have?
− Across datacenter, by Business Service, by cluster
• When will I run out of capacity?
• What if I add or remove workloads?
• What if I add/modify/remove servers, memory, or CPU?
• Should I buy server model A or model B?
• What if Business grows by 5% each quarter?
• I want to improve my response time; how many more resources do I need?
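Two of the forecasting questions ("When will I run out of capacity?" and "What if Business grows by 5% each quarter?") can be combined into a simple what-if calculation. The sketch assumes demand compounds at a constant rate each quarter; the starting demand of 600 units, the capacity of 1000 units, and the function name are made up for illustration.

```python
def quarters_until_full(current_demand, capacity, growth_per_quarter=0.05, max_quarters=400):
    """Number of quarters until compounding demand first exceeds capacity (None if never)."""
    if current_demand > capacity:
        return 0
    demand = current_demand
    for quarter in range(1, max_quarters + 1):
        demand *= 1 + growth_per_quarter
        if demand > capacity:
            return quarter
    return None


runway = quarters_until_full(current_demand=600, capacity=1000)
print(runway)
```

A fuller forecast would fit the growth rate from historical usage trends rather than assume it, and would subtract the headroom that must stay free.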

Page 22: Capacity Management and Resource Optimization in Cloud Environments

Continue the conversation with your peers at the HP Software Community hp.com/go/swcommunity