Sun Network 2003 Presentation
[email protected], Architect - High Performance Technical Computing
August 29, 2003
Sun Sigma DFSS Project P925
Capacity Planning for N1
Project: Capacity Planning for N1, ID: P925
What is N1? Datacenter Automation
_ Manage “N” systems as if they were “1” system
_ Solve the Total Cost of Ownership (TCO) problems
_ Manage all the “fabrics” as one - Network/VLAN, SAN/Zone, power, consoles, cluster
Heterogeneous Support
_ Solaris, Linux, AIX, HP-UX, Windows, EMC etc.
Layered Provisioning
_ Platform/OS, Application, Service
Roadmap Includes Acquisitions
_ 2001 Sun internal N1 architectural definition
_ 2002 Terraspring platform level virtualization
_ 2003 CenterRun application level provisioning ...
Voice of the Customer
_ “We want better performance at a lower price”
_ “We want higher utilization”
_ “We don’t want application performance to degrade at times of peak load”
_ “We want more and faster application changes”
_ “How do we do capacity planning with N1?”
Scope…
Capacity Planning for N1
_ Define
– Project goals, scope and plan, VOC, stakeholders
_ Measure
– Definition of Capacity Planning measurements
_ Analyze
– Gaps, N1CP Processes Concept Design, Survey
_ Design
– Prototype Use Cases
_ Verify
– Stakeholder communication and transition plan
_ Monitor
– N1 Capacity Planning implementation tracked as subgroup of N1 Strategic Working Group
DEFINE
Translate VOC to Measurements
“We want better performance at a lower price”
– Fast, well tuned and efficient systems
– Lower Total Cost of Ownership
– Flexibility - choice of systems by price, performance, reliability, scalability, compatibility and feature set
“We want higher utilization”
– Consistently high utilization of expensive resources
“We don’t want application performance to degrade at times of peak load”
– Consistent and fast application or service response times
– Headroom needed to handle peak loads
“We want more and faster application changes”
– Flexible scenario planning, rapid provisioning

Question: “My company already has capacity planning processes and tools” - do you agree or disagree with this statement?
MEASURE
N1 as a Constraint and Opportunity
_ Centralized control and monitoring
_ Highly replicated hardware configurations
_ Well defined workload and capacity characterization
_ Arrays of load-balanced systems, structured network
_ Large SMP nodes, standardized storage layout
_ Web services workloads follow an “open system” queuing model, which is simple to plan against
_ Dynamic system domains and virtualized provisioning allow rapid capacity adjustments and pooled resources
_ Primary capacity metrics are CPU power and storage; secondary metrics (memory, network and thermal) may be over-provisioned but should be watched
MEASURE
Utilization Definition
_ Utilization is the proportion of busy time
_ Always defined over a time interval
_ Sum over devices (mean load level)
[Chart: usr+sys CPU % (0-100) for Peak Period vs. time, showing the utilization level]
[Chart: OnCPU scheduling for each CPU vs. microseconds - OnCPU state and mean CPU utilization]
MEASURE
Headroom Definition
_ Headroom is available usable resources
– Total Capacity minus Peak Utilization and Margin
– Applies to CPU, RAM, Net, Disk and OS
– Depends upon workload mixture
– Can be very complex to determine
[Chart: usr+sys CPU % (0-100) for Peak Period vs. time, annotated with Utilization, Margin and Headroom bands]
MEASURE
CPU Capacity Measurements
_ CPU utilization is defined as busy time divided by elapsed time for each CPU
_ Number of CPUs is dynamic, so capacity at “100%” is not constant. Use units of “processors” to measure load.
_ CPU type and speed varies, so we need something like MIPS or M-Values for mixed systems
_ CPU utilization should be managed within a range that safely minimizes headroom to give stable performance at minimum cost
_ Process level CPU wait time measures the time a process spent on the run queue waiting for a free CPU
– This allows response time increase to be observed directly so that increased capacity can be provisioned before headroom is exhausted
MEASURE
Response Time Definition
_ Service time occurs while using a resource
_ Queue time waits for access to a resource
_ Response Time = Queue time + Service time

Response time curves for random arrival of work from a large unknown user population (e.g. the Internet!)
[Chart: Response Time Increase Factor vs. Mean CPU Load Level (0-4) for one, two and four CPUs]

R = S / (1 - (U/m)^m)
MEASURE
Response Time Curves
Systems with many CPUs can run at higher utilization levels, but degrade more rapidly when they finally run out of capacity. Headroom margin should be set according to response time margin and CPU count.
[Chart: Response Time Increase Factor vs. Total System Utilization % (0-100) for 1, 2, 4, 8, 16, 32 and 64 CPUs, with the headroom margin marked]

R = S / (1 - (U%)^m)
MEASURE
CPU Scalability Differences
SMP allows work to migrate between CPUs, “blades” don’t
– Single queue of work gives lower response time for user sessions at high utilization than arrays of uniprocessor “blades”
– Headroom margin on array of “blades” is constant as array grows
– Two to four CPU systems need much less margin than uni-CPUs
– Measure and calibrate actual response curve per workload
[Chart: Response Time Increase Factor vs. CPU Demand Level (0-4) for 1 CPU/blade, 2 CPU SMP, 4 CPU SMP, 2 blades and 4 blades]

SMP R = S / (1 - (U/m)^m) vs. Blade R = S / (1 - U/m)
MEASURE
CPU Measurement System Issues

_ Clock sampled CPU usage
– Poor clock resolution at 10ms (optionally 1ms)
– Biased sample since clock schedules jobs
– Underestimates more at lower utilization
– Creates apparent lack of scalability

_ Microstate measured CPU usage
– Measure state changes directly - “microstates”
– Per-CPU microstate based counters are not available
– Use microstates at process based workload level, sum over some or all processes as needed (can take a while on big systems)
– Microstate method simply extends to measuring services and mixed workloads
MEASURE
N1 Capacity Planning CTQs

CTQ Name                  Units  LSL           USL              Gauge Acc.  Sigma Budget  Pri
CPU Utilization (TCO)     CPUs   30% of total  -                99%         3.0           5
CPU Responsiveness (SLA)  CPUs   -             70-98% of total  99%         4.0           10
MEASURE
Both of these Critical To Quality (CTQ) requirements are measured via the CPU load level, which can be measured with a Gauge accuracy estimated at 99%, and a sigma goal based on defect cost. Using sampled CPU, accuracy is estimated at 90%.
For CPU Utilization, a defect is unacceptable Total Cost of Ownership (TCO) and occurs if the total CPU load drops below the Lower Specification Limit (LSL) of 30% of the total configured for a sample taken during the peak load period.
For CPU Responsiveness, a defect is an overload leading to a Service Level Agreement (SLA) failure and occurs if the total CPU load goes above the Upper Specification Limit (USL), which is 70% of the total configured for uni-processors, increasing for larger CPU counts.
Concept Design - N1CP Roles
_ Manager
_ Application Architect
– Developers
– Database Administrators
_ Systems Architect
– Systems Administrators
– Storage Administrators
– Network Administrators
_ Others?
Question: What roles do you do?
ANALYZE
Scenarios - Top Level Functional Breakdown
[Flow diagram: Install N1 Datacenter leads into repeated “Provision system level applications” steps, feeding three loops:
– Over-Provision then Right-size system level applications - repeat infrequently
– Re-Allocate resources during low load times - repeat on schedule
– Grow or borrow capacity just before overload occurs - repeat as needed]
ANALYZE
Installation Sizing Scenario
[Swimlane diagram; roles top to bottom: Manager, Application Architect, Developer / Database Admin, Systems Architect, Systems Admin, Network Admin, Storage Admin. Tasks over time: “I want an N1-ready datacenter”; choose and size applications and platforms; install generic database images; install generic application servers; choose systems; size systems mix; size overall network; size overall storage; build generic system images; set up switches and VLANs for N1; set up SANs and storage for N1; measure capacity of generic systems.]
This scenario indicates the tasks for each role when an N1 datacenter fabric is created using currently available system level provisioning software. The tasks performed by each role in a scenario are called a “use case”. Future versions of N1 will configure services and policies during installation. Red arrows show the command flow between the roles.
ANALYZE
Over-Provisioning Scenario
[Swimlane diagram; roles as before. Tasks over time: “provide an online service”; “use these apps”; database versions and sizing; app server versions and sizing; “use these platforms”; systems selection & versions; network sizing; storage sizing; configure database; configure app server; define operations policies; build replicable system images; provision Internet connection; provision LUNs; populate database; acceptance test; use the N1 GUI to over-provision the initial system; configure access and security; configure backup strategy; enable user access.]
ANALYZE
This gives an indication of the tasks performed by each role as a new application is provisioned using the capabilities of today’s N1 products. The initial goal is to over-provision the capacity for the initial bring-up of the application, then later right-size it as its actual usage pattern becomes better understood. In future releases more and more of this activity will be automated, and more of the work will become pre-work related to setting up the overall N1 datacenter infrastructure.
Rightsizing Scenario
[Swimlane diagram; roles as before. Tasks over time: business level and trend plan; monitor database headroom (memory and tables); monitor CPU, network and memory; monitor WAN / Internet headroom; monitor storage headroom; then for each component either increase headroom for a bottleneck or reduce headroom for under-utilized databases, systems, bandwidth and storage.]
Rightsizing adjusts the headroom for each component of the system to make sure that the usage level falls inside the specification limits. Rightsizing can be performed during an offline maintenance window, but all the technologies exist to adjust domain size for tier 3 systems and adjust the number of tier 1 and tier 2 systems dynamically.
ANALYZE
Re-Allocation Scenario
[Swimlane diagram; roles as before. Tasks over time: batch workload capacity needed; define batch capable applications; build or configure batch capable applications; define batch mechanism; determine timing and depth of capacity to re-allocate; move resources to Grid after peak load time; bring resources back before peak load time.]
Load levels vary during the day and the week. Regular times of low utilization can have other work performed - e.g. overnight batch jobs. Batch workloads that cannot run on the same systems due to configuration or security issues can run on systems (or Grids) that are provisioned each night using spare capacity from other systems.
ANALYZE
Overload Scenario
[Swimlane diagram; roles as before. Tasks over time: higher utilization needed to reduce cost of service; determine the normal load curve for time of day and day of week; monitor deviations above the normal load level; negotiate a victim to steal capacity from; provision extra capacity before it is needed.]
Load levels vary during the day and the week in a fairly consistent and predictable manner. Sizing for the normal load level allows high utilization levels. Higher load levels can be handled as an exception by watching for abnormally high levels before the load peaks and borrowing capacity from lower priority applications such as development environments.
Question: “Are dynamic capacity adjustments a mature and reliable technology?”
ANALYZE
Rightsizing Scenario
_ Detailed Design Concept via an Example
_ Large scale Internet workload
– Fairly predictable load shape
– Peaks every evening (use peak hours)
– Grows every week
_ Key CTQs
– Performance during peak hour
– Cost of maintaining performance level
– Risk of downtime
_ Tier 3 backend database server
– Primary bottleneck, over-provisioned elsewhere
– Highest cost of CPU headroom (E10K/F15K class)
– Initially 56 CPUs in domain, average 30 CPUs load
ANALYZE
CPU Load Level
Monitor for days or weeks to establish a baseline and the time of peak load, then track that timeslot daily.
CPU load (units are CPUs, 56 configured) for a busy day:
[Chart: Summed CPU Utilization (0-50 CPUs) vs. time of day, with the peak 2-hour window marked]
ANALYZE
Utilization Distribution
Capability plot for peak time shows the system is less than half utilized about 25% of the time - too much headroom. The defect rate corresponds to a Sigma level of 2.18.
[Capability plot: CPU Demand Level distribution]
ANALYZE
Increase Utilization
Reduce the system to 40 CPUs and assume a linear increase in utilization - predicted sigma = 5.2.
Over-simplified - headroom margin and non-linearities are not included in the plan, so add a little extra headroom to compensate.
[Capability plot: CPU Demand Level distribution]
ANALYZE
Headroom Tool Prototype
_ Solaris specific prototype
– Rapid prototype using SE Toolkit from http://www.setoolkit.com
– Shows component level headroom vs. utilization goal
– Automatic margin calculation based on CPU count
– Samples every few minutes, reports every 30-60 minutes
– Microstate based, sums over all processes
– Headroom predictor uses mean plus two standard deviations
– Text based, logs data to a daily file, 3.5 sigma headroom

Code: p.=processor, r.=ram, n.=network, d.=disk; .st=status, .cf=configured, .ll=min lsl, .ul=limit usl, .ld=mean load, .h%=headroom, .sd=std deviation, .tco=TCO defect rate, .sla=SLA defect rate, .tK=throughput K, .rm=response time in milliseconds, .rp=response time proportional increase

time     pll pul  pcf pst   ptco psla pld  psd  ph% ptK  prm  prp
17:36:04 3.6 11.6 12  Green 0.00 0.00 5.26 0.28 50  15.8 1.05 1.08
18:06:04 3.6 11.6 12  Green 0.00 0.00 4.90 0.38 51  13.9 1.01 1.06
18:36:04 3.6 11.6 12  Blue  0.40 0.00 4.55 2.19 23  13.0 0.93 1.09
19:06:03 3.6 11.6 12  Blue  1.00 0.00 3.02 0.17 71  12.7 0.86 1.05
19:36:03 3.6 11.6 12  Blue  0.93 0.00 2.82 0.53 67  12.0 0.67 1.04
DESIGN
Samples are taken every two minutes and reported every 30 minutes.
12 CPUs are configured; the lower limit is 30% = 3.6 CPUs, and the upper limit, based on CPU count, is 11.6.
Status is based on the measured defect proportion of time that the load level is below the pll (TCO) or above the pul (SLA) limits.
The mean load level and standard deviation are compared to the upper limit to calculate headroom.
CPU throughput is based on voluntary context switches; prm is very short, but prp defines a response time curve.
Headroom Calculations
DESIGN
Set configured total to number of processors online:
    conf = sysconf(_SC_NPROCESSORS_ONLN);
Set lower spec limit to 30% for TCO failures:
    lsl = conf * 0.3;
Use a response time goal of 3 times baseline on the curve to determine the margin for the maximum load level:
    rpgoal = 3.0;
Calculate max load level from the theoretical response time curve:
    /* rp = R/S, rp = 1/(1-(U/m)^m), so U = m * exp(log((rp-1)/rp)/m) */
    usl = conf * exp(log((rpgoal-1.0)/rpgoal)/conf);
Calculate headroom % from mean plus two standard deviations versus the upper spec limit:
    headp = 100.0 * (1.0 - (mean + 2.0*sd) / usl);
Calculate Sigma Zst:
    tco_sigma = 1.5 + (mean - lsl) / sd;
    sla_sigma = 1.5 + (usl - mean) / sd;
Design Optimization
Compare the “traditional” approach with the new design.
Run the headroom tool on a big and busy server, collect data, and show how a simplistic approach compares with the method described in this project.
SunRay timesharing server monitored for several days. The system is loaded to the limit at peak times but idle out of hours, so focus on a scheduled capacity reallocation scenario.

Simplistic “Traditional” Approach
– Collect data using vmstat, sar, SunMC or 3rd party tools
– Plot CPU % busy - as shown on the next slide
– There is spare capacity, but no indication of how many CPUs are unused
– Need extra information that this is a 12-CPU system

N1CP Approach
– Collect data using the headroom prototype
– Plot CPU load level in CPU units, no need to guess or replot data
– Calculate margin, headroom and sigma levels
– Plan capacity reallocation and recalculate margin, headroom and sigma levels
DESIGN
Simplistic View
DESIGN
[Chart: CPU Utilization Monday-Thursday - CPU %busy (0-100%) vs. time of day]
There is no indication of how many CPUs are in use; util = 59% overall.
DESIGN
N1CP View - CPU Counts
There are 12 CPUs, 6 to 8 are free overnight, and the system overloads at peak times.
[Chart: mean+2sd load vs. configured (pcf) and upper limit (pul), in CPU units (0-14), over four days]

Summary  DPMO    Min Sigma
TCO      110215  -1.5 Zst
SLA      538     2.5 Zst

Mean CPU Load   7.03
Mean Util       59%
Mean headroom   34%
Mean capacity   12.00
N1CP - Response Curve
DESIGN
The system is close to overload; this timeshared workload has a flatter curve than Internet workloads (closed rather than open queuing model).
[Chart: response time increase vs. load level (CPU count, 0-12)]
Simplistic - CPUs reallocated
DESIGN
There is no indication of how many CPUs are in use.
[Chart: CPU Utilization with Capacity Optimization - CPU %busy (0-100%) vs. time of day]
N1CP View - Dynamic!
Vary the CPU count and times daily, and borrow extra for the peak load.
[Chart: CPU mean+2sd load vs. configured (pcf) and upper limit (pul), in CPU units (0-14), over four days, annotated with per-period sigma levels: 6.3, 3.2, 3.2, 3.5, 5.2, 3.2, 5.7, 4.3, 3.6]

Predicted  Min Sigma
TCO        2.0 Zst
SLA        3.2 Zst

Mean CPU load   7.03
Mean Util       74%
Mean headroom   16%
Mean capacity   9.52

DESIGN
Summary
DESIGN
Performance Impact
– SLA Sigma levels improve from a minimum of 2.5 Zst to 3.2 Zst
– Improvement of 0.7 Sigma by allowing for extra peak load
– Simplistic methods do not allow quality of service prediction

Cost Impact
– TCO Sigma levels improve from a minimum of -1.5 Zst to 2.0 Zst
– Improvement of 3.5 Sigma by reducing capacity from 12 to 9.5 CPUs

Observability Impact
– Headroom tool prototype generates all required statistics
– Sigma level is simply calculated, or the headroom tool could print it
– Simplistic methods do not show what is going on

Complexity Impact
– Dynamic reconfiguration must be enabled
– One reconfiguration each morning and each evening

Applicability (Assertions, out of scope for this project!)
– CPU based example can be applied to blades, RAM, disk, net, thermal
– Method can be extended from platform level to services
N1 Console Screenshots
VERIFY
Capacity for Sale
Uses for Spare Capacity
– Carefully schedule batch work and backups
– Remotely support global timezones
– Run engineering dept. simulation jobs

Grid Oriented Solutions
– Project Grid - departmental cluster (Sun Grid Engine)
– Enterprise Grid - collection of clusters forming a general purpose Grid service (Sun Grid Engine Enterprise Edition)
– The Global Grid - Internet level - GT2.2, OGSA/OGSI/GT3
– Provision an Enterprise Grid service using N1
– Join The Global Grid and sell or share capacity
GRID
Relationships: N1 and Grid
N1 is about provisioning things you own; Grid is about access to things you don’t own.
[Diagram: Utility Computing spans a Business Model / Infrastructure axis - Grid Services and Web Services cover things you borrow or lease, while N1 covers things you own and control]
GRID
Capacity Flows in a Grid Enabled N1 Datacenter
[Diagram: an N1 Virtualized Datacenter with Tier 0 Web Front End, Tier 1 Web Servers, Tier 2 App Servers and Tier 3 Database Storage. A Free Pool of unused resources feeds a Cluster Grid of compute and storage resources run by Sun Grid Engine Enterprise Edition. Capacity flows include Repair and Replace, Capacity On Demand, Retire Obsolete Capacity, Purchase Capacity and Purchase C.O.D. Web Users / Web Services and Grid Users / Grid Services drive Utility Computing capacity requests.]
GRID
IT market segments by “need to share”

Consumer users
– What can be shared: Everything
– What is trusted: P2P apps, SETI, Kazaa, Limewire, People!
– Primary constraints: Network bandwidth. Know-how
– What is visible: Everything including other users

Technical geeks
– What can be shared: Operating System
– What is trusted: Grid, VPN, encryption, firewalls
– Primary constraints: CPU cycles. Organizational issues
– What is visible: Everything in The Global Grid community

Commercial suits
– What can be shared: Hardware
– What is trusted: N1, server domains, VLAN and SAN Zone partitioning
– Primary constraints: Storage. Organizational, legal, contractual issues
– What is visible: Local systems and Internet

Defense spooks
– What can be shared: Nothing
– What is trusted: Nothing, physical separation required
– Primary constraints: CPU cycles, latency. National security
– What is visible: Local systems, Project Grids
GRID