Sun Network 2003 Presentation
[email protected], Architect - High Performance Technical Computing
August 29, 2003
Sun Sigma DFSS Project P925
Capacity Planning for N1
Project: Capacity Planning for N1, ID: P925
What is N1? Datacenter Automation
_ Manage “N” systems as if they were “1” system
_ Solve the Total Cost of Ownership (TCO) problems
_ Manage all the “fabrics” as one - Network/VLAN, SAN/Zone, power, consoles, cluster
Heterogeneous Support
_ Solaris, Linux, AIX, HP-UX, Windows, EMC etc.
Layered Provisioning
_ Platform/OS, Application, Service
Roadmap Includes Acquisitions
_ 2001 Sun internal N1 architectural definition
_ 2002 Terraspring platform level virtualization
_ 2003 CenterRun application level provisioning ...
Voice of the Customer
_ “We want better performance at a lower price”
_ “We want higher utilization”
_ “We don’t want application performance to degrade at times of peak load”
_ “We want more and faster application changes”
_ “How do we do capacity planning with N1?”
Scope…
Capacity Planning for N1
_ Define
– Project goals, scope and plan, VOC, stakeholders
_ Measure
– Definition of Capacity Planning measurements
_ Analyze
– Gaps, N1CP Processes Concept Design, Survey
_ Design
– Prototype Use Cases
_ Verify
– Stakeholder communication and transition plan
_ Monitor
– N1 Capacity Planning implementation tracked as subgroup of N1 Strategic Working Group
DEFINE
Translate VOC to Measurements
“We want better performance at a lower price”
– Fast, well tuned and efficient systems
– Lower Total Cost of Ownership
– Flexibility - choice of systems by price, performance, reliability, scalability, compatibility and feature set
“We want higher utilization”
– Consistently high utilization of expensive resources
“We don’t want application performance to degrade at times of peak load”
– Consistent and fast application or service response times
– Headroom needed to handle peak loads
“We want more and faster application changes”
– Flexible scenario planning, rapid provisioning

Question: “My company already has capacity planning processes and tools” - do you agree or disagree with this statement?
MEASURE
N1 as a Constraint and Opportunity
_ Centralized control and monitoring
_ Highly replicated hardware configurations
_ Well defined workload and capacity characterization
_ Arrays of load-balanced systems, structured network
_ Large SMP nodes, standardized storage layout
_ Web services workloads follow an “open system” queuing model, which is simple to plan against
_ Dynamic system domains and virtualized provisioning allow rapid capacity adjustments and pooled resources
_ Primary capacity metrics are CPU power and storage; secondary metrics (memory, network and thermal) may be over-provisioned but should be watched
MEASURE
Utilization Definition
_ Utilization is the proportion of busy time
_ Always defined over a time interval
_ Sum over devices (mean load level)
[Chart: usr+sys CPU % (0-100) for Peak Period vs. time, showing the utilization level]
[Chart: OnCPU scheduling for each CPU vs. microseconds - OnCPU state and mean CPU utilization]
MEASURE
Headroom Definition
_ Headroom is available usable resources
– Total Capacity minus Peak Utilization and Margin
– Applies to CPU, RAM, Net, Disk and OS
– Depends upon workload mixture
– Can be very complex to determine
[Chart: usr+sys CPU % (0-100) for Peak Period vs. time, annotated with Utilization, Margin and Headroom bands]
MEASURE
CPU Capacity Measurements
_ CPU utilization is defined as busy time divided by elapsed time for each CPU
_ Number of CPUs is dynamic, so capacity at “100%” is not constant. Use units of “processors” to measure load.
_ CPU type and speed varies, so we need something like MIPS or M-Values for mixed systems
_ CPU utilization should be managed within a range that safely minimizes headroom to give stable performance at minimum cost
_ Process level CPU wait time measures the time a process spent on the run queue waiting for a free CPU
– This allows response time increase to be observed directly so that increased capacity can be provisioned before headroom is exhausted
MEASURE
Response Time Definition
_ Service time occurs while using a resource
_ Queue time waits for access to a resource
_ Response Time = Queue time + Service time

Response time curves for random arrival of work from a large unknown user population (e.g. the Internet!)
[Chart: Response Time Increase Factor vs. Mean CPU Load Level (0-4) for one, two and four CPUs]

R = S / (1 - (U/m)^m)
MEASURE
Response Time Curves
Systems with many CPUs can run at higher utilization levels, but degrade more rapidly when they finally run out of capacity. Headroom margin should be set according to response time margin and CPU count.
[Chart: Response Time Increase Factor vs. Total System Utilization % (0-100) for 1, 2, 4, 8, 16, 32 and 64 CPUs, with the headroom margin marked]

R = S / (1 - (U%)^m)
MEASURE
CPU Scalability Differences
SMP allows work to migrate between CPUs, “blades” don’t
– Single queue of work gives lower response time for user sessions at high utilization than arrays of uniprocessor “blades”
– Headroom margin on array of “blades” is constant as array grows
– Two to four CPU systems need much less margin than uni-CPUs
– Measure and calibrate actual response curve per workload
[Chart: Response Time Increase Factor vs. CPU Demand Level (0-4) for 1 CPU/blade, 2 CPU SMP, 4 CPU SMP, 2 blades and 4 blades]

SMP R = S / (1 - (U/m)^m) vs. Blade R = S / (1 - U/m)
MEASURE
CPU Measurement System Issues

_ Clock sampled CPU usage
– Poor clock resolution at 10ms (optionally 1ms)
– Biased sample since clock schedules jobs
– Underestimates more at lower utilization
– Creates apparent lack of scalability

_ Microstate measured CPU usage
– Measure state changes directly - “microstates”
– Per-CPU microstate based counters are not available
– Use microstates at process based workload level, sum over some or all processes as needed (can take a while on big systems)
– Microstate method simply extends to measuring services and mixed workloads
MEASURE
N1 Capacity Planning CTQs

CTQ Name                  Units  LSL           USL              Gauge Acc.  Sigma Budget  Pri
CPU Utilization (TCO)     CPUs   30% of total  -                99%         3.0           5
CPU Responsiveness (SLA)  CPUs   -             70-98% of total  99%         4.0           10
MEASURE
Both of these Critical To Quality (CTQ) requirements are measured via the CPU load level, which can be measured with a Gauge accuracy estimated at 99%, and a sigma goal based on defect cost. Using sampled CPU, accuracy is estimated at 90%.
For CPU Utilization, a defect is unacceptable Total Cost of Ownership (TCO) and occurs if the total CPU load drops below the Lower Specification Limit (LSL) of 30% of the total configured for a sample taken during the peak load period.
For CPU Responsiveness, a defect is an overload leading to a Service Level Agreement (SLA) failure and occurs if the total CPU load goes above the Upper Specification Limit (USL), which is 70% of the total configured for uni-processors, increasing for larger CPU counts.
Concept Design - N1CP Roles
_ Manager
_ Application Architect
– Developers
– Database Administrators
_ Systems Architect
– Systems Administrators
– Storage Administrators
– Network Administrators
_ Others?
Question: What roles do you do?
ANALYZE
Scenarios - Top Level Functional Breakdown
[Flow diagram: Install N1 Datacenter leads into repeated “Provision system level applications” steps, feeding three loops:
– Over-Provision then Right-size system level applications - repeat infrequently
– Re-Allocate resources during low load times - repeat on schedule
– Grow or borrow capacity just before overload occurs - repeat as needed]
ANALYZE
Installation Sizing Scenario
[Swimlane diagram; roles top to bottom: Manager, Application Architect, Developer / Database Admin, Systems Architect, Systems Admin, Network Admin, Storage Admin. Tasks over time: “I want an N1-ready datacenter”; choose and size applications and platforms; install generic database images; install generic application servers; choose systems; size systems mix; size overall network; size overall storage; build generic system images; set up switches and VLANs for N1; set up SANs and storage for N1; measure capacity of generic systems.]
This scenario indicates the tasks for each role when an N1 datacenter fabric is created using currently available system level provisioning software. The tasks performed by each role in a scenario are called a “use case”. Future versions of N1 will configure services and policies during installation. Red arrows show the command flow between the roles.
ANALYZE
Over-Provisioning Scenario
[Swimlane diagram; roles as before. Tasks over time: “provide an online service”; “use these apps”; database versions and sizing; app server versions and sizing; “use these platforms”; systems selection & versions; network sizing; storage sizing; configure database; configure app server; define operations policies; build replicable system images; provision Internet connection; provision LUNs; populate database; acceptance test; use the N1 GUI to over-provision the initial system; configure access and security; configure backup strategy; enable user access.]
ANALYZE
This gives an indication of the tasks performed by each role as a new application is provisioned using the capabilities of today’s N1 products. The initial goal is to over-provision the capacity for the initial bring-up of the application, then later right-size it as its actual usage pattern becomes better understood. In future releases more and more of this activity will be automated, and more of the work will become pre-work related to setting up the overall N1 datacenter infrastructure.
Rightsizing Scenario
[Swimlane diagram; roles as before. Tasks over time: business level and trend plan; monitor database headroom (memory and tables); monitor CPU, network and memory; monitor WAN / Internet headroom; monitor storage headroom; then for each component either increase headroom for a bottleneck or reduce headroom for under-utilized databases, systems, bandwidth and storage.]
Rightsizing adjusts the headroom for each component of the system to make sure that the usage level falls inside the specification limits. Rightsizing can be performed during an offline maintenance window, but all the technologies exist to adjust domain size for tier 3 systems and adjust the number of tier 1 and tier 2 systems dynamically.
ANALYZE
Re-Allocation Scenario
[Swimlane diagram; roles as before. Tasks over time: batch workload capacity needed; define batch capable applications; build or configure batch capable applications; define batch mechanism; determine timing and depth of capacity to re-allocate; move resources to Grid after peak load time; bring resources back before peak load time.]
Load levels vary during the day and the week. Regular times of low utilization can have other work performed - e.g. overnight batch jobs. Batch workloads that cannot run on the same systems due to configuration or security issues can run on systems (or Grids) that are provisioned each night using spare capacity from other systems.
ANALYZE
Overload Scenario
[Swimlane diagram; roles as before. Tasks over time: higher utilization needed to reduce cost of service; determine the normal load curve for time of day and day of week; monitor deviations above the normal load level; negotiate a victim to steal capacity from; provision extra capacity before it is needed.]
Load levels vary during the day and the week in a fairly consistent and predictable manner. Sizing for the normal load level allows high utilization levels. Higher load levels can be handled as an exception by watching for abnormally high levels before the load peaks and borrowing capacity from lower priority applications such as development environments.
Question: “Are dynamic capacity adjustments a mature and reliable technology?”
ANALYZE
Rightsizing Scenario
_ Detailed Design Concept via an Example
_ Large scale Internet workload
– Fairly predictable load shape
– Peaks every evening (use peak hours)
– Grows every week
_ Key CTQs
– Performance during peak hour
– Cost of maintaining performance level
– Risk of downtime
_ Tier 3 backend database server
– Primary bottleneck, over-provisioned elsewhere
– Highest cost of CPU headroom (E10K/F15K class)
– Initially 56 CPUs in domain, average 30 CPUs load
ANALYZE
CPU Load Level
Monitor for days or weeks to establish a baseline and the time of peak load, then track that timeslot daily.
CPU load (units are CPUs, 56 configured) for a busy day:
[Chart: Summed CPU Utilization (0-50 CPUs) vs. time of day, with the peak 2-hour window marked]
ANALYZE
Utilization Distribution
Capability plot for peak time shows the system is less than half utilized about 25% of the time - too much headroom. The defect rate corresponds to a Sigma level of 2.18.
[Capability plot: CPU Demand Level distribution]
ANALYZE
Increase Utilization
Reduce the system to 40 CPUs and assume a linear increase in utilization - predicted sigma = 5.2.
Over-simplified - headroom margin and non-linearities are not included in the plan, so add a little extra headroom to compensate.
[Capability plot: CPU Demand Level distribution]
ANALYZE
Headroom Tool Prototype
_ Solaris specific prototype
– Rapid prototype using SE Toolkit from http://www.setoolkit.com
– Shows component level headroom vs. utilization goal
– Automatic margin calculation based on CPU count
– Samples every few minutes, reports every 30-60 minutes
– Microstate based, sums over all processes
– Headroom predictor uses mean plus two standard deviations
– Text based, logs data to a daily file, 3.5 sigma headroom

Code: p.=processor, r.=ram, n.=network, d.=disk; .st=status, .cf=configured, .ll=min lsl, .ul=limit usl, .ld=mean load, .h%=headroom, .sd=std deviation, .tco=TCO defect rate, .sla=SLA defect rate, .tK=throughput K, .rm=response time in milliseconds, .rp=response time proportional increase

time     pll pul  pcf pst   ptco psla pld  psd  ph% ptK  prm  prp
17:36:04 3.6 11.6 12  Green 0.00 0.00 5.26 0.28 50  15.8 1.05 1.08
18:06:04 3.6 11.6 12  Green 0.00 0.00 4.90 0.38 51  13.9 1.01 1.06
18:36:04 3.6 11.6 12  Blue  0.40 0.00 4.55 2.19 23  13.0 0.93 1.09
19:06:03 3.6 11.6 12  Blue  1.00 0.00 3.02 0.17 71  12.7 0.86 1.05
19:36:03 3.6 11.6 12  Blue  0.93 0.00 2.82 0.53 67  12.0 0.67 1.04
DESIGN
Samples are taken every two minutes and reported every 30 minutes.
12 CPUs are configured; the lower limit is 30% = 3.6 CPUs, and the upper limit, based on CPU count, is 11.6.
Status is based on the measured defect proportion of time that the load level is below the pll (TCO) or above the pul (SLA) limits.
The mean load level and standard deviation are compared to the upper limit to calculate headroom.
CPU throughput is based on voluntary context switches; prm is very short, but prp defines a response time curve.
Headroom Calculations
DESIGN
Set configured total to number of processors online:
    conf = sysconf(_SC_NPROCESSORS_ONLN);
Set lower spec limit to 30% for TCO failures:
    lsl = conf * 0.3;
Use a response time goal of 3 times baseline on the curve to determine the margin for the maximum load level:
    rpgoal = 3.0;
Calculate max load level from the theoretical response time curve:
    /* rp = R/S, rp = 1/(1-(U/m)^m), so U = m * exp(log((rp-1)/rp)/m) */
    usl = conf * exp(log((rpgoal-1.0)/rpgoal)/conf);
Calculate headroom % from mean plus two standard deviations versus the upper spec limit:
    headp = 100.0 * (1.0 - (mean + 2.0*sd) / usl);
Calculate Sigma Zst:
    tco_sigma = 1.5 + (mean - lsl) / sd;
    sla_sigma = 1.5 + (usl - mean) / sd;
Design Optimization
Compare the “traditional” approach with the new design.
Run the headroom tool on a big and busy server, collect data, and show how a simplistic approach compares with the method described in this project.
SunRay timesharing server monitored for several days. The system is loaded to the limit at peak times but idle out of hours, so focus on a scheduled capacity reallocation scenario.

Simplistic “Traditional” Approach
– Collect data using vmstat, sar, SunMC or 3rd party tools
– Plot CPU % busy - as shown on the next slide
– There is spare capacity, but no indication of how many CPUs are unused
– Need extra information that this is a 12-CPU system

N1CP Approach
– Collect data using the headroom prototype
– Plot CPU load level in CPU units, no need to guess or replot data
– Calculate margin, headroom and sigma levels
– Plan capacity reallocation and recalculate margin, headroom and sigma levels
DESIGN
Simplistic View
DESIGN
[Chart: CPU Utilization Monday-Thursday - CPU %busy (0-100%) vs. time of day]
There is no indication of how many CPUs are in use; util = 59% overall.
DESIGN
N1CP View - CPU Counts
There are 12 CPUs, 6 to 8 are free overnight, and the system overloads at peak times.
[Chart: mean+2sd load vs. configured (pcf) and upper limit (pul), in CPU units (0-14), over four days]

Summary  DPMO    Min Sigma
TCO      110215  -1.5 Zst
SLA      538     2.5 Zst

Mean CPU Load   7.03
Mean Util       59%
Mean headroom   34%
Mean capacity   12.00
N1CP - Response Curve
DESIGN
The system is close to overload; this timeshared workload has a flatter curve than Internet workloads (closed rather than open queuing model).
[Chart: response time increase vs. load level (CPU count, 0-12)]
Simplistic - CPUs reallocated
DESIGN
There is no indication of how many CPUs are in use.
[Chart: CPU Utilization with Capacity Optimization - CPU %busy (0-100%) vs. time of day]
N1CP View - Dynamic!
Vary the CPU count and times daily, and borrow extra for the peak load.
[Chart: CPU mean+2sd load vs. configured (pcf) and upper limit (pul), in CPU units (0-14), over four days, annotated with per-period sigma levels: 6.3, 3.2, 3.2, 3.5, 5.2, 3.2, 5.7, 4.3, 3.6]

Predicted  Min Sigma
TCO        2.0 Zst
SLA        3.2 Zst

Mean CPU load   7.03
Mean Util       74%
Mean headroom   16%
Mean capacity   9.52

DESIGN
Summary
DESIGN
Performance Impact
– SLA Sigma levels improve from a minimum of 2.5 Zst to 3.2 Zst
– Improvement of 0.7 Sigma by allowing for extra peak load
– Simplistic methods do not allow quality of service prediction

Cost Impact
– TCO Sigma levels improve from a minimum of -1.5 Zst to 2.0 Zst
– Improvement of 3.5 Sigma by reducing capacity from 12 to 9.5 CPUs

Observability Impact
– Headroom tool prototype generates all required statistics
– Sigma level is simply calculated, or the headroom tool could print it
– Simplistic methods do not show what is going on

Complexity Impact
– Dynamic reconfiguration must be enabled
– One reconfiguration each morning and each evening

Applicability (Assertions, out of scope for this project!)
– CPU based example can be applied to blades, RAM, disk, net, thermal
– Method can be extended from platform level to services
N1 Console Screenshots
VERIFY
Capacity for Sale
Uses for Spare Capacity
– Carefully schedule batch work and backups
– Remotely support global timezones
– Run engineering dept. simulation jobs

Grid Oriented Solutions
– Project Grid - departmental cluster (Sun Grid Engine)
– Enterprise Grid - collection of clusters forming a general purpose Grid service (Sun Grid Engine Enterprise Edition)
– The Global Grid - Internet level - GT2.2, OGSA/OGSI/GT3
– Provision an Enterprise Grid service using N1
– Join The Global Grid and sell or share capacity
GRID
Relationships: N1 and Grid
N1 is about provisioning things you own; Grid is about access to things you don’t own.
[Diagram: Utility Computing spans a Business Model / Infrastructure axis - Grid Services and Web Services cover things you borrow or lease, while N1 covers things you own and control]
GRID
Capacity Flows in a Grid Enabled N1 Datacenter
[Diagram: an N1 Virtualized Datacenter with Tier 0 Web Front End, Tier 1 Web Servers, Tier 2 App Servers and Tier 3 Database Storage. A Free Pool of unused resources feeds a Cluster Grid of compute and storage resources run by Sun Grid Engine Enterprise Edition. Capacity flows include Repair and Replace, Capacity On Demand, Retire Obsolete Capacity, Purchase Capacity and Purchase C.O.D. Web Users / Web Services and Grid Users / Grid Services drive Utility Computing capacity requests.]
GRID
IT market segments by “need to share”

Consumer users
– What can be shared: Everything
– What is trusted: P2P apps, SETI, Kazaa, Limewire, People!
– Primary constraints: Network bandwidth. Know-how
– What is visible: Everything including other users

Technical geeks
– What can be shared: Operating System
– What is trusted: Grid, VPN, encryption, firewalls
– Primary constraints: CPU cycles. Organizational issues
– What is visible: Everything in The Global Grid community

Commercial suits
– What can be shared: Hardware
– What is trusted: N1, server domains, VLAN and SAN Zone partitioning
– Primary constraints: Storage. Organizational, legal, contractual issues
– What is visible: Local systems and Internet

Defense spooks
– What can be shared: Nothing
– What is trusted: Nothing, physical separation required
– Primary constraints: CPU cycles, latency. National security
– What is visible: Local systems, Project Grids
GRID