
Page 1: Grid & performability Aad van Moorsel aadvanmoorsel.com

grid & performability

Aad van Moorsel, aadvanmoorsel.com

Page 2

April 2003, Copyright Aad van Moorsel, HP Labs

outline

to set the stage:
• what is grid?
• what is performability?

three perspectives on grid performability:
• ‘customer’ requirements
• system implementation
  – utility computing
• associated research challenges
  – focus on stochastic modeling

Page 3

what is grid?

what is performability?

Page 4

grid

for me, and in this talk:
• middleware layer, Globus-like
• shares resources
• crosses boundaries
  – administrative domains, user domains, enterprise domains, …
• software-implemented boundaries
  – flexibility in who uses what when
  – flexibility in what is secured against whom when
  – flexibility in who charges for what when
  – …
• makes resources manageable
  – grades of QoS
  – dynamic management of QoS
  – service level agreements, business metrics and penalties

Page 5

performability

for me, and in this talk:

• quality of service (QoS)

context:
• Meyer: metric P(T<t), where T was some random variable
• my thesis: meaningful quantitative evaluation of a system (definition 2 out of 3)
• others: performance and reliability
• SPN models for system state, rewards or queuing networks for performance/metric
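The last combination above (a stochastic model of the system state plus rewards for performance) is the Markov reward model view of performability. A minimal sketch, assuming a hypothetical two-processor system with per-processor failure rate lam and a single repair facility with rate mu; the reward in each state is the number of processors still up, so the expected reward mixes reliability and capacity into one measure:

```python
# Markov reward model sketch (hypothetical parameters, not from the talk):
# two processors, each failing at rate lam; one repair facility with rate mu.
# State k = number of failed processors (0, 1, 2); reward = processors still up.


def birth_death_steady_state(forward, backward):
    """Steady-state distribution of a birth-death CTMC with states 0..n.

    forward[k]  is the rate from state k to k + 1,
    backward[k] is the rate from state k + 1 back to k.
    """
    weights = [1.0]
    for f, b in zip(forward, backward):
        # Detailed balance between neighbouring states: pi[k + 1] * b = pi[k] * f
        weights.append(weights[-1] * f / b)
    total = sum(weights)
    return [w / total for w in weights]


def expected_reward_rate(pi, rewards):
    """Performability as the expected steady-state reward: sum_k pi[k] * r[k]."""
    return sum(p * r for p, r in zip(pi, rewards))


lam, mu = 0.01, 1.0  # hypothetical failure and repair rates
# 0 -> 1 at rate 2*lam (either processor fails); 1 -> 2 at rate lam.
# Repairs 1 -> 0 and 2 -> 1 at rate mu (single repair facility).
pi = birth_death_steady_state(forward=[2 * lam, lam], backward=[mu, mu])
rewards = [2, 1, 0]  # processors up in states 0, 1, 2

print(f"P(at least one processor up) = {1 - pi[2]:.6f}")
print(f"expected number of processors up = {expected_reward_rate(pi, rewards):.4f}")
```

With these made-up rates the expected capacity comes out just under 2 processors; availability alone would not reveal the time spent in the degraded one-processor state.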

Page 6

grid & performability

we accept the claim that grid is software that will facilitate flexible performability management

• the software design still leaves much to be desired
  – automation? autonomous? autonomic?
  – scaling? inter-business? security?

• but the applications will drive it in the right direction
  – utility computing
  – service-centric outsourcing

Page 7

grid & performability

‘customer’ perspective

Page 8

business costs of owning and operating IT have gone through the roof

Page 9

business cost of IT failures

downtime costs per hour:

brokerage operations          $6,450,000
credit card authorization     $2,600,000
e-bay (1 outage, 22 hours)      $225,000
amazon.com                      $180,000
package shipping services       $150,000
home shopping channel           $113,000
catalog sales center             $90,000
airline reservation center       $89,000
cellular service activation      $41,000
on-line network fees             $25,000
ATM service fees                 $14,000

source: Dave Patterson keynote at FAST ’02

survey of computer damages in France, 2000
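Hourly figures like these turn into a yearly exposure once an availability level is fixed. A minimal sketch, assuming downtime cost grows linearly with outage hours and using hypothetical availability targets; only the per-hour figures come from the table:

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

# Downtime cost per hour in USD, taken from the table (a subset of the rows).
COST_PER_HOUR = {
    "brokerage operations": 6_450_000,
    "credit card authorization": 2_600_000,
    "amazon.com": 180_000,
    "ATM service fees": 14_000,
}


def annual_downtime_cost(availability: float, cost_per_hour: float) -> float:
    """Expected yearly downtime cost = (1 - A) * hours per year * cost per hour.

    Assumes cost is linear in outage hours (a simplification).
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be in [0, 1]")
    return (1.0 - availability) * HOURS_PER_YEAR * cost_per_hour


for name, cost in COST_PER_HOUR.items():
    for a in (0.999, 0.9999):  # hypothetical availability targets
        print(f"{name:28s} A={a}: ${annual_downtime_cost(a, cost):,.0f} per year")
```

At 99.9% availability (about 8.8 hours of downtime a year), the brokerage row alone corresponds to roughly $56.5 million a year under this linear model.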

Page 10

courtesy of Lisa Spainhower, IBM

operational complexity: scale

Page 11

operator faces heterogeneity

Business (content, logic, processes)
• place content closer to where it is needed (CDN)
• reengineer business process (BPR)
• select services for each activity in the process dynamically (dynamic composition)

Software (databases, app servers, web servers)
• databases: share a database vs. create a new database; re-index tables to optimize queries (database utility; ZLE, DBMS)
• app servers: number of app servers needed; start and stop new app servers (app server utility)
• web servers: number of web servers needed; load balance transactions across servers (web server utility; load balancing)

Hardware (servers, network, storage)
• servers: allocate machines to applications; replace a failed machine transparently by migrating its applications (UDC/QM/SF; VMs)
• network: reserve network bandwidth prior to use; QoS-based routing decisions (RSVP)
• storage: assign storage devices to workloads; configure buffer sizes in device drivers to maximize performance (storage management)

Page 12

operator faces federation needs


Page 13

customer needs

business-driven, automated operator tools for systems with increasing scale, heterogeneity and federation challenges

Page 14

grid & performability

system perspective (utility computing)

Page 15

twin UDCs in HP Labs

• built the first large utility data centers in Palo Alto (US) and Bristol (UK)
  – learn what it takes to build a solution
  – move HPL IT services to the UDC
• the first virtualized data center
  – from servers, storage and networks to energy management
  – dynamically assigns applications to resources
  – customer sees resources as a ‘utility’
  – operator sees resources as a ‘utility’

Page 16

utility computing from usage perspective

[figure: UDC1, UDC2, server cluster, ??]

• reserving resources
• getting resources
• flexing resources

Page 17

utility computing from operator perspective

Utility Data Center = programmable pool of data center resources

UDC GRAM = Globus Gatekeeper + UDC Adapter

[figure: Grid interface, two UDC GRAM instances, UDC/XML interface]

prototype developed at HP Labs: initially on gtk2 (Globus Toolkit 2), currently migrated to gtk3 (Globus Toolkit 3)

Page 18

configure properties

Page 19

generate RSL
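The "configure properties, then generate RSL" flow can be pictured as serializing the configured properties into a Globus Resource Specification Language (RSL) conjunction. A minimal sketch, assuming GT2-style syntax `&(attribute=value)...` with double-quoted values (embedded quotes doubled) as we understand it; the attribute names are hypothetical, since the slides do not list what the UDC GRAM adapter actually accepts:

```python
def rsl_quote(value: object) -> str:
    """Quote a value as an RSL string literal, doubling embedded double quotes."""
    text = str(value)
    return '"' + text.replace('"', '""') + '"'


def generate_rsl(properties: dict[str, object]) -> str:
    """Build a conjunction request '&(attr=value)(attr=value)...' from properties.

    Attributes are emitted in sorted order so the output is deterministic.
    """
    if not properties:
        raise ValueError("at least one property is required")
    relations = "".join(
        f"({name}={rsl_quote(value)})" for name, value in sorted(properties.items())
    )
    return "&" + relations


# Hypothetical properties collected by a 'configure properties' form.
props = {"count": 4, "executable": "/opt/app/start.sh", "osImage": "linux-2.4"}
print(generate_rsl(props))
```

Keeping quoting in one helper means the form layer never has to know RSL escaping rules; a real adapter would also validate attribute names against what the gatekeeper accepts.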

Page 20

utility computing for operators

utility computing has great potential to improve operations:

• better utilization of resources
• better tools for setting up applications
• new business models, better accountability

but UDC is just one high-end solution

need something that is open, extensible, uniform, …

grid-based management backplane

Page 21

utility computing grid middleware

[figure labels: everything is a Grid service; leverage Grid; HP value-add; management; OpenView orchestrates IT; OpenView command and control; SLA]

base Grid: uniform interface, single sign-on, federation, stateful services

management backplane: monitoring, rich discovery, life-cycle, coordinated ‘act’, policy, biz-impact driven adaptation, flexible secure mgmt domains

Page 22

more automation: flexing resources

objective: increase asset utilization via resource sharing while providing a desired quality of service for applications

approach: a statistical multiplexing technique for resource utilities that host business applications

characteristics of business applications:
• require resources continuously
• changes in the number of users and in the workload mix may result in:
  – time-varying demands
  – large peak-to-mean ratios for demand
  – future demands that are difficult to predict precisely
• customers want assurances they will get resources when needed
  – for example, a resource request will be satisfied with probability p = 0.999
  – i.e. 999 times out of 1000
  – customers don’t always need an assurance of p = 1.0
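An assurance level p can be read as a quantile: provision the smallest number of CPUs c such that the demand distribution puts probability at least p on demand ≤ c. A minimal sketch over a hypothetical per-slot demand pmf, showing why p = 0.999 can need far fewer CPUs than p = 1.0 when the peak is rare:

```python
def capacity_for_assurance(pmf, p):
    """Smallest capacity c such that P(demand <= c) >= p.

    pmf maps a demand (number of CPUs) to its probability.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    cumulative = 0.0
    for demand in sorted(pmf):
        cumulative += pmf[demand]
        if cumulative >= p - 1e-12:  # small tolerance for floating-point round-off
            return demand
    return max(pmf)  # only reached if the pmf sums to slightly less than p


# Hypothetical per-slot demand pmf for one application (CPUs -> probability).
demand_pmf = {2: 0.70, 4: 0.25, 8: 0.049, 16: 0.001}

for p in (1.0, 0.999, 0.99):
    print(f"p={p}: provision {capacity_for_assurance(demand_pmf, p)} CPUs")
```

Here p = 1.0 forces provisioning for the 16-CPU peak that occurs 0.1% of the time, while p = 0.999 needs only 8.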

Page 23

statistical demand profiles

to guide the development of our techniques, we rely on gathered data:
– 48 servers in an HP data center
– hosting business applications
– each with 2 to 8 CPUs

create a statistical demand profile for each application:
– compact representation of the demand pattern
– characterize “weekdays” and “weekend days” separately
  • ignore weekends for the purpose of the study
– characterize a “weekday” by 24 60-minute time slots
  • a probability mass function (pmf) gives the observed distribution of the number of CPUs needed per slot

the profiles populate a calendar of “expected demand” for the utility
– enables admission control
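A minimal sketch of building such a weekday profile: weekend samples are dropped and each of the 24 hourly slots gets an empirical pmf of CPUs needed. The input format, (timestamp, CPUs needed) pairs, is a hypothetical assumption; the slides do not describe how the HP data center measurements were stored:

```python
from collections import Counter, defaultdict
from datetime import datetime


def build_weekday_profile(samples: list[tuple[datetime, int]]) -> dict[int, dict[int, float]]:
    """Map each hourly slot (0-23) to an empirical pmf of CPUs needed.

    samples: hypothetical (timestamp, cpus_needed) observations.
    Weekend samples are ignored, as in the study.
    """
    counts: dict[int, Counter] = defaultdict(Counter)
    for ts, cpus in samples:
        if ts.weekday() >= 5:  # Saturday=5, Sunday=6
            continue
        counts[ts.hour][cpus] += 1
    profile = {}
    for slot, counter in counts.items():
        total = sum(counter.values())
        profile[slot] = {cpus: n / total for cpus, n in sorted(counter.items())}
    return profile


# Hypothetical 9:00 observations for one application in April 2003.
samples = [
    (datetime(2003, 4, 7, 9), 2),   # Monday
    (datetime(2003, 4, 8, 9), 4),   # Tuesday
    (datetime(2003, 4, 9, 9), 2),   # Wednesday
    (datetime(2003, 4, 10, 9), 2),  # Thursday
    (datetime(2003, 4, 12, 9), 8),  # Saturday: ignored
]
profile = build_weekday_profile(samples)
print(profile)
```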

Page 24

admission control approach

• a new application requests admission to the utility

• assume we admit the new application
• unfold its profile onto the utility’s calendar for a capacity planning horizon
  – for example, several months into the future

• characterize the calendar’s new per-slot distributions of aggregate demand

• use the distributions to estimate the required size of the resource pool

• admit the application if there are sufficient resources
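A minimal sketch of the per-slot check, assuming application demands are independent so that the aggregate pmf is the convolution of the per-application pmfs (the correlations raised on the next slide are ignored here). The two-slot calendar, pool size and pmfs are hypothetical; a real calendar would span the whole planning horizon:

```python
def convolve(pmf_a, pmf_b):
    """pmf of A + B, assuming demands A and B are independent."""
    result = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            result[a + b] = result.get(a + b, 0.0) + pa * pb
    return result


def required_capacity(pmf, p):
    """Smallest c with P(demand <= c) >= p (tolerance for float round-off)."""
    cumulative = 0.0
    for demand in sorted(pmf):
        cumulative += pmf[demand]
        if cumulative >= p - 1e-12:
            return demand
    return max(pmf)


def admit(calendar, new_profile, pool_size, p):
    """Admit if, in every slot, the p-quantile of aggregate demand fits in the pool.

    calendar[t]    : pmf of demand already committed to slot t
    new_profile[t] : pmf of the new application's demand in slot t
    """
    return all(
        required_capacity(convolve(current, new), p) <= pool_size
        for current, new in zip(calendar, new_profile)
    )


# Hypothetical two-slot calendar and new application profile (CPUs -> probability).
calendar = [{10: 0.9, 20: 0.1}, {15: 1.0}]
new_app = [{2: 0.5, 6: 0.5}, {2: 0.5, 6: 0.5}]

print(admit(calendar, new_app, pool_size=30, p=0.999))
print(admit(calendar, new_app, pool_size=24, p=0.999))
print(admit(calendar, new_app, pool_size=24, p=0.9))
```

The same 24-CPU pool rejects the application at p = 0.999 but accepts it at p = 0.9, which is the lever the assurance level gives the operator.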

Page 25

demands for a time slot t

[figure: application demands and the utility’s aggregate demand in slot t]

utility:
- the distribution of aggregate demand is approximated by the joint pmf
- however, we must also consider correlations between application demands
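Why correlations matter: if applications tend to peak together, treating their demands as independent understates the tail of the aggregate. A minimal sketch with hypothetical paired observations of two applications in the same slot, comparing the capacity obtained by convolving the marginal pmfs with the capacity read from the observed sums (which keep the dependence between the applications):

```python
from collections import Counter


def empirical_pmf(values):
    """Empirical pmf of a list of observed demands."""
    counts = Counter(values)
    n = len(values)
    return {v: c / n for v, c in sorted(counts.items())}


def convolve(pmf_a, pmf_b):
    """pmf of A + B under the (here, false) assumption that A and B are independent."""
    result = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            result[a + b] = result.get(a + b, 0.0) + pa * pb
    return result


def required_capacity(pmf, p):
    """Smallest c with P(demand <= c) >= p (tolerance for float round-off)."""
    cumulative = 0.0
    for demand in sorted(pmf):
        cumulative += pmf[demand]
        if cumulative >= p - 1e-12:
            return demand
    return max(pmf)


# Hypothetical paired observations of two applications in slot t on four days;
# both applications peak on the same day (positive correlation).
app_a = [2, 2, 2, 8]
app_b = [1, 1, 1, 5]

independent = convolve(empirical_pmf(app_a), empirical_pmf(app_b))
observed = empirical_pmf([a + b for a, b in zip(app_a, app_b)])

print("assuming independence:", required_capacity(independent, 0.9), "CPUs")
print("using observed sums:  ", required_capacity(observed, 0.9), "CPUs")
```

With these numbers independence suggests 9 CPUs while the correlated observations need 13, so ignoring correlation would silently break the assurance.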

Page 26

experimental design and results

• how many CPUs are needed if applications:
  – are statically assigned their peak numbers of CPUs?
  – are assigned the peak number of CPUs needed on a per-slot basis?
  – are offered assurance p that resource requests will be satisfied?

• about the experiments:
  – include application demand correlations as measured
  – include 60-minute warm-up/warm-down application migration overheads
  – reported estimates verified using trace-driven simulation

resource access mechanism            number of CPUs required
static                               309
peak per slot (p=1.0)                275
statistical multiplexing, p=0.999    179 (estimate)
statistical multiplexing, p=0.99     163 (estimate)
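The relative savings in the table can be checked with a few lines of arithmetic, comparing each mechanism's CPU count against the static peak assignment of 309 CPUs (only the reported numbers are used):

```python
STATIC_PEAK = 309  # CPUs with static peak assignment, from the table

cpus_required = {
    "peak per slot (p=1.0)": 275,
    "statistical multiplexing, p=0.999 (estimate)": 179,
    "statistical multiplexing, p=0.99 (estimate)": 163,
}


def savings_vs_static(cpus: int, baseline: int = STATIC_PEAK) -> float:
    """Fraction of CPUs saved relative to the static peak assignment."""
    return (baseline - cpus) / baseline


for mechanism, cpus in cpus_required.items():
    print(f"{mechanism}: {cpus} CPUs, {savings_vs_static(cpus):.1%} fewer than static")
```

This gives about 11.0%, 42.1% and 47.2% fewer CPUs for per-slot peaks, p = 0.999 and p = 0.99 respectively.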

Page 27

grid & performability

modeling research perspective

Page 28

modeling issue I: the many perspectives of virtualization

virtualization enables flexibility in the UDC:
1. storage area networks let applications use any storage device
2. computing virtualization allows CPUs to be assigned dynamically to customers
3. a virtual LAN creates a secure private network

virtualization gives the illusion of some traditional functionality (‘boundaries’), but implements it ‘soft’

modeling challenges: different views for different users, dynamically changing boundaries (performability!), how to utilize the models contained in the software

Page 29

modeling issue II: on-line algorithms

on-line algorithms are key to conquering complexity:
• automated adaptation needs on-line algorithms

on-line algorithms come in many shapes and forms:
• days: resource scheduling
• seconds: load balancing, admission control, retries
• milliseconds: memory optimization, real-time scheduling

typical issues:
• speed of the model solution
• choice between statistical and structural models
• obtaining the right on-line data
• a plug-in algorithm module needs a data model that fits the operational model
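To make the "speed of the model solution" point concrete: a purely statistical on-line model can be updated in constant time per observation, which suits the seconds timescale. A minimal sketch using an exponentially weighted moving average (EWMA) as one possible statistical predictor, for example of load feeding a load balancer; EWMA, the smoothing factor alpha and the load samples are assumptions, not techniques or data named in the talk:

```python
from typing import Optional


class EwmaPredictor:
    """Exponentially weighted moving average: O(1) time and memory per observation."""

    def __init__(self, alpha: float) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.estimate: Optional[float] = None

    def update(self, observation: float) -> float:
        """Fold in one observation and return the new estimate."""
        if self.estimate is None:
            self.estimate = observation  # first observation initializes the estimate
        else:
            self.estimate = self.alpha * observation + (1.0 - self.alpha) * self.estimate
        return self.estimate


predictor = EwmaPredictor(alpha=0.5)  # alpha is a hypothetical tuning choice
for load in (10.0, 20.0, 20.0):  # hypothetical per-second load samples
    print(predictor.update(load))
```

A structural model (such as the queueing or reward models mentioned earlier) can explain why load changes, but typically costs far more per update; the trade-off is the one listed above.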

Page 30

modeling issue III: how to validate large-scale systems

many facets to scale:
• more and more devices
• more and more interconnected (even globally)
• increasing number of users
• multi-party and multi-ownership
• greater differences in scale: smaller devices, bigger data centers
• the amount of data collected and analysis done increases with the scale of the systems

we have no good ways of analyzing large-scale systems: no test beds, no reliable data, no widely accepted modeling approaches

Page 31

modeling issue IV: how to evaluate for business metrics

the real metric of interest is euros:
• what is the total cost of ownership?
• how much am I, as a customer, willing to pay for a service?
• what penalties do I, as a provider, accept in an SLA?
• if I invest x, what is the return on IT investment?

how do we model the money/QoS correlation?
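One simple starting point for the money/QoS question is an expected-value model of an SLA: the provider earns a fee, pays for the capacity that backs assurance p, and pays a penalty when a request is not satisfied. A minimal sketch with hypothetical euro amounts, assuming one penalty-bearing request per period (so the violation probability is 1 − p):

```python
def expected_margin(fee, capacity_cost, penalty, p_satisfied):
    """Expected provider margin per period: fee - capacity cost - P(violation) * penalty.

    Assumes one penalty-bearing request per period, violated with probability 1 - p.
    """
    return fee - capacity_cost - (1.0 - p_satisfied) * penalty


def best_option(options, fee, penalty):
    """Return the option name with the highest expected margin."""
    return max(
        options,
        key=lambda name: expected_margin(fee, options[name][0], penalty, options[name][1]),
    )


# Hypothetical euro amounts: (capacity cost, assurance p) for each offering.
options = {"p=0.999": (1790.0, 0.999), "p=0.99": (1630.0, 0.99)}

for penalty in (10_000.0, 50_000.0):  # hypothetical SLA penalties
    print(penalty, best_option(options, fee=2500.0, penalty=penalty))
```

With a small penalty the cheaper p = 0.99 capacity wins; with a large one the extra capacity for p = 0.999 pays for itself, which is the kind of money/QoS trade-off the slide asks how to model.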

Page 32

conclusion

• adaptive/utility/autonomic computing has an intrinsic need for QoS (performability) modeling and analysis

• the grid is believed to be the platform of choice
  – applications are more interesting than the middleware

• challenges for stochastic modeling are larger than ever in this setting:
  – virtualization
  – on-line algorithms
  – large-scale systems
  – business metrics