
Tivoli Autonomic Computing

© 2003 IBM Corporation

A Research Agenda for Business-Driven IT

Jeff Kephart (IBM Research), Steve White (IBM Research), Edie Stern (IBM Tivoli)


Business-Driven IT

We want a world in which businesses can respond flexibly to opportunities and threats

Flexible business requires flexible IT …


Is this flexible IT?

No – that’s what autonomic computing is supposed to fix!


The role of autonomic computing

Autonomic computing systems are:

– “Computing systems that manage themselves in accordance with high-level objectives from humans.”

• The Vision of Autonomic Computing, IEEE Computer, J. Kephart and D. Chess, Jan. 2003.

How high is “high”?

Business-driven IT: the high-level objectives are business objectives


Towards Business-Driven IT

[Figure: business-driven IT stack, from business architects' business objectives (e.g. KPIs) and business process models, through business process tools & transforms (platform-independent models; business objectives and metrics) and business-to-IT tools & transforms that convert objectives into high-level IT metrics and objectives, down to automated provisioning and deployment (DCM) of a self-managing IT system. Other labels: human expertise from deployers and domain experts; policies; DB, network and storage policies set by IT admins; monitored business data.]

Human specification of low-level, platform-specific policies gives way to high-level, discipline-specific objectives with tradeoffs.

Translation of models, metrics and objectives from business terms to IT terms will become increasingly automated.

There will be continuous feedback between IT and business levels to calibrate business-to-IT transformations.


From siloed policies to high-level IT objectives

[Figure: resource-oriented silos (application, database, server, network, mainframe and workstation experts and tools, each with its own server, network or DB policy) are replaced by horizontal management disciplines (availability, security, change and service level management, information lifecycle management) whose availability, security, performance and change objectives aggregate into high-level IT metrics and objectives.]

Replace resource-oriented silos with horizontal, process-oriented solutions.

Replace resource-oriented policies with objectives defined by management discipline, and aggregate them.


Scenario: Managing to Performance and Availability Objectives

[Figure: an application manager for the eBrokerage transactions application in On Demand Env 1. The service-level utility Util(RT, DT) is combined with a performance model RT(cpu, b; λ), an availability model DT(b) and a cost model Cost(cpu) to yield a resource-level utility NetUtil(cpu, b; λ), which an optimizer uses to set cpu and b. Other labels: policies; high-level IT metrics, objectives.]

1 sec response time for Gold customers is OK. I don’t need faster than 0.75 sec; more than 2.0 sec is unacceptable.

50 min downtime/month is tolerable; 100 min is bad.

Good downtime is slightly more important than good response time.

Objectives are expressed as a utility function of response time (RT) and downtime (DT) that captures the tradeoffs between them.

Models capture human expert knowledge about how high-level IT metrics depend on lower-level system knobs (cpu, b) and observables such as demand (λ). They can be refined automatically.

Composing the utility function with the models yields an optimization problem in terms of the low-level parameters, which can be posed to an appropriate optimizer.

The system can then set these parameters to their optimal values, or advise a human administrator.
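One way to turn the policy statements above into a utility function is a weighted sum of piecewise-linear curves. This is an illustrative sketch only: the breakpoint utilities (0.8 at 1 sec, 0.6 at 50 min, and so on) and the 0.45/0.55 weights are assumptions chosen to reflect "OK", "tolerable", "bad" and "slightly more important"; they are not values from the talk, and `util`, `RT_POINTS` and `DT_POINTS` are hypothetical names.

```python
from bisect import bisect_right

# (value, utility) breakpoints; the utility levels are illustrative assumptions.
RT_POINTS = [(0.75, 1.0), (1.0, 0.8), (2.0, 0.0)]    # response time in seconds
DT_POINTS = [(0.0, 1.0), (50.0, 0.6), (100.0, 0.0)]  # downtime in minutes/month
W_RT, W_DT = 0.45, 0.55  # downtime weighted slightly higher than response time

def piecewise_linear(x: float, points: list[tuple[float, float]]) -> float:
    """Interpolate between sorted (x, y) breakpoints, clamping outside the range."""
    xs = [p[0] for p in points]
    if x <= xs[0]:
        return points[0][1]
    if x >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, x)  # xs[i-1] <= x < xs[i]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def util(rt_sec: float, dt_min: float) -> float:
    """Hypothetical Util(RT, DT): weighted sum of per-metric utilities in [0, 1]."""
    return (W_RT * piecewise_linear(rt_sec, RT_POINTS)
            + W_DT * piecewise_linear(dt_min, DT_POINTS))

print(round(util(1.0, 50.0), 3))  # 0.69
print(round(util(0.8, 80.0), 3))
```

Clamping at 0.75 sec encodes "I don’t need faster than 0.75 sec": further speedups earn no extra utility. An elicitation tool would calibrate the breakpoints with the administrator rather than hard-code them.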

IBM Research

© 2005 IBM Corporation. Research Challenges for AI and Autonomic Computing, University of Alberta, November 21, 2005

Performance-Availability Tradeoffs using Utility Functions (with J. Strunk, B. Salmon, G. Ganger, CMU)

Cost Function for Trace Processing Application

[Figure: cost ($/yr/student, $5,000 to $30,000) plotted against availability and bandwidth (MB/sec).]

A student waits for a run on a 27 GB trace file once per day; waiting costs $30/hr.

An outage renders the student 50% effective, and a sysadmin spends 100% of their time fixing it; an outage costs $45/hr.
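The two cost drivers above can be combined into a yearly cost per student as a function of bandwidth and availability, the two axes of the plot. This is a minimal sketch under stated assumptions: 27 GB is taken as 27,000 MB, there is one run per day, and outage cost accrues for every hour of the year the system is down. `annual_cost` is a hypothetical name, and the talk's actual cost function may differ.

```python
HOURS_PER_YEAR = 24 * 365
TRACE_MB = 27_000           # assumption: 27 GB = 27,000 MB
RUNS_PER_YEAR = 365         # one run per day
WAIT_COST_PER_HR = 30.0     # $/hr while the student waits for a run
OUTAGE_COST_PER_HR = 45.0   # $/hr during an outage (student + sysadmin)

def annual_cost(bandwidth_mb_s: float, availability: float) -> float:
    """Hypothetical $/yr/student for a bandwidth (MB/sec) and availability (0..1)."""
    if bandwidth_mb_s <= 0 or not 0.0 <= availability <= 1.0:
        raise ValueError("bandwidth must be positive and availability in [0, 1]")
    # Hours per year spent waiting for the daily trace run to finish.
    wait_hours = RUNS_PER_YEAR * TRACE_MB / bandwidth_mb_s / 3600
    # Assumption: outages can occur at any hour of the year.
    outage_hours = (1.0 - availability) * HOURS_PER_YEAR
    return wait_hours * WAIT_COST_PER_HR + outage_hours * OUTAGE_COST_PER_HR

for bw, avail in [(10, 0.99), (50, 0.999), (100, 0.9999)]:
    print(bw, avail, round(annual_cost(bw, avail)))
```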


Need interfaces and algorithms to support elicitation of high-level objectives

WebSphere XD uses templates to elicit average or percentile response-time objectives

E-commerce preference elicitation methods that help consumers to express complex tradeoffs will be adapted to systems administration

ACT-I project: Algorithms and Protocols for Impact Analysis

© 2005 IBM Corporation

WAS XD Utility Function Combination

$$\min(U_G, U_S)$$
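Combining per-class utilities with min means the system is only as well off as its worst-served class. The sketch below contrasts min with a plain sum on two hypothetical resource allocations; the class labels G and S mirror the slide's subscripts, and all utility values are made up for illustration.

```python
def combined_min(utilities: dict[str, float]) -> float:
    """Overall utility = utility of the worst-off service class."""
    if not utilities:
        raise ValueError("need at least one service class")
    return min(utilities.values())

def combined_sum(utilities: dict[str, float]) -> float:
    """Alternative shown for comparison: plain sum of class utilities."""
    return sum(utilities.values())

# Two hypothetical allocations of the same resources to classes G and S.
allocations = {
    "A": {"G": 0.95, "S": 0.40},  # favors G heavily
    "B": {"G": 0.65, "S": 0.60},  # more even
}

best_by_min = max(allocations, key=lambda k: combined_min(allocations[k]))
best_by_sum = max(allocations, key=lambda k: combined_sum(allocations[k]))
print(best_by_min, best_by_sum)  # B A
```

The sum prefers the lopsided allocation A, while min prefers B because it raises the utility of the worse-served class.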

Utility-based Collaboration among Autonomous Agents

© 2006 IBM Corporation. AAMAS-06: Hakodate, Japan

[Figure: WebSphere XD-TIO data center, with WebSphere XD nodes and a free pool.]


Control parameters

Suppose we have just two control parameters:
– cpu = number of processors
– b = data backup interval (in minutes)

We want to choose (cpu, b) to maximize Util(rt, rpo) – Cost(rt, rpo), where rt is response time and rpo is the recovery point objective.

We need to transform the utility function into control parameter space

We can do this using models that relate (cpu,b) to (rt,rpo)

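$$
\begin{aligned}
\mathrm{NetUtil}(rt, rpo) &= \mathrm{Util}(rt, rpo) - \mathrm{Cost}(rt, rpo) \\
&= \mathrm{Util}\big(rt(cpu, b; \lambda),\ rpo(cpu, b; \lambda)\big) - \mathrm{Cost}(cpu, b) \\
&= \mathrm{NetUtil}(cpu, b; \lambda)
\end{aligned}
$$

where λ is the demand.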
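The transformation above is function composition: the models map control parameters to metrics, and utility and cost are evaluated on the result. A minimal sketch follows; `make_net_utility` and the toy models in the usage lines are hypothetical, and the demand λ is passed as an ordinary argument `lam`.

```python
from typing import Callable

Metric = Callable[[float, float, float], float]  # (cpu, b, lam) -> value

def make_net_utility(
    util: Callable[[float, float], float],   # Util(rt, rpo)
    cost: Callable[[float, float], float],   # Cost(cpu, b)
    rt_model: Metric,                        # rt(cpu, b; lam)
    rpo_model: Metric,                       # rpo(cpu, b; lam)
) -> Metric:
    """Return NetUtil(cpu, b; lam) = Util(rt(...), rpo(...)) - Cost(cpu, b)."""
    def net_util(cpu: float, b: float, lam: float) -> float:
        rt = rt_model(cpu, b, lam)
        rpo = rpo_model(cpu, b, lam)
        return util(rt, rpo) - cost(cpu, b)
    return net_util

# Toy, hypothetical models purely to show the plumbing.
net = make_net_utility(
    util=lambda rt, rpo: 100.0 - rt - rpo,
    cost=lambda cpu, b: 10.0 * cpu,
    rt_model=lambda cpu, b, lam: lam / cpu,
    rpo_model=lambda cpu, b, lam: b,
)
print(net(2, 3, 8))  # 73.0
```

Keeping the models as separate callables means one can be swapped (say, a queuing model for a learned one) without touching the utility or the optimizer.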


Models

[Figure: four model panels: a backup-dependent rate (sec⁻¹) versus b; response time rt (msec) versus cpu for three values (10⁻⁴, 10⁻³, 5×10⁻³) of that rate; rpo(b) versus b; and Cost(cpu) versus cpu.]

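$$
\begin{aligned}
\mathrm{NetUtil}(rt, rpo) &= \mathrm{Util}(rt, rpo) - \mathrm{Cost}(rt, rpo) \\
&= \mathrm{Util}\big(rt(cpu, b; \lambda),\ rpo(b)\big) - \mathrm{Cost}(cpu) \\
&= \mathrm{NetUtil}(cpu, b; \lambda)
\end{aligned}
$$

Here λ denotes demand. Models could come from analytics/queuing, simulation, or machine learning.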
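As one example of an analytic (queuing) model, response time can be estimated by treating the processors as the c servers of an M/M/c queue, with Poisson arrivals at rate λ (demand) and exponential service at rate μ per processor, using the Erlang C formula. This is a textbook model swapped in for illustration, not necessarily the model behind the slide; it ignores the backup interval b, and `erlang_c` and `mean_response_time` are hypothetical names.

```python
import math

def erlang_c(c: int, lam: float, mu: float) -> float:
    """Probability that an arriving request must wait in an M/M/c queue."""
    if c < 1:
        raise ValueError("need at least one server")
    a = lam / mu    # offered load in Erlangs
    rho = a / c     # per-server utilization
    if rho >= 1.0:
        raise ValueError("unstable queue: need lam < c * mu")
    tail = a**c / math.factorial(c) / (1.0 - rho)
    head = sum(a**k / math.factorial(k) for k in range(c))
    return tail / (head + tail)

def mean_response_time(c: int, lam: float, mu: float) -> float:
    """Mean time in system (queueing delay + service time) for M/M/c."""
    return erlang_c(c, lam, mu) / (c * mu - lam) + 1.0 / mu

for cpu in (1, 2, 4):
    print(cpu, round(mean_response_time(cpu, lam=5.0, mu=10.0), 4))
```

With c = 1 this reduces to the M/M/1 result 1/(μ − λ), a quick sanity check on the formula.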


Net utility as function of control parameters

[Figure: surface plots of Util(rt, rpo) over (rt, rpo) and of the net utility NetUtil(cpu, b; λ) over (cpu, b).]


Net utility vs. control parameters

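[Figure: Util(rt, rpo) and NetUtil(cpu, b; λ) shown for three cases, with the optimum marked in each.]

Optima in the three cases:
– b* = 0.874575, cpu* = 2.49134, U* = 152.661, rt* = 99.5775
– b* = 1.19931, cpu* = 3.65144, U* = 137.414, rt* = 95.4449
– b* = 2.05265, cpu* = 8.58375, U* = 75.8644, rt* = 88.6853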
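Optima like those above can be approximated by exhaustive search once net utility is expressed in terms of (cpu, b). The sketch below uses a plain grid search, a deliberately simple stand-in for whatever optimizer produced the slide's non-integer optima, over hypothetical models: an M/M/1-style response time in which backups consume capacity in proportion to 1/b, rpo(b) = b, a linear utility and a linear cost. All constants and names are assumptions, so the numbers will not match the slide.

```python
import math
from itertools import product

# Hypothetical model constants (not from the talk).
MU = 10.0          # requests/sec one processor can serve
BACKUP_LOAD = 2.0  # backups consume BACKUP_LOAD / b requests/sec of capacity
CPU_GRID = range(1, 17)                    # candidate processor counts
B_GRID = [0.25 * k for k in range(1, 25)]  # candidate backup intervals, 0.25..6.0 min

def rt(cpu: int, b: float, lam: float) -> float:
    """M/M/1-style response time (sec) on capacity left after demand and backups."""
    spare = cpu * MU - lam - BACKUP_LOAD / b
    return math.inf if spare <= 0 else 1.0 / spare

def rpo(b: float) -> float:
    """Data at risk (minutes), taken to equal the backup interval."""
    return b

def util(rt_sec: float, rpo_min: float) -> float:
    return -math.inf if math.isinf(rt_sec) else 100.0 - 200.0 * rt_sec - 5.0 * rpo_min

def cost(cpu: int) -> float:
    return 3.0 * cpu

def net_util(cpu: int, b: float, lam: float) -> float:
    return util(rt(cpu, b, lam), rpo(b)) - cost(cpu)

def optimize(lam: float) -> tuple[int, float, float]:
    """Exhaustive grid search for (cpu*, b*, NetUtil*)."""
    cpu, b = max(product(CPU_GRID, B_GRID), key=lambda p: net_util(p[0], p[1], lam))
    return cpu, b, net_util(cpu, b, lam)

for lam in (20.0, 40.0):
    print(lam, optimize(lam))
```

In this toy setup, raising the demand λ pushes cpu* upward; infeasible settings (no spare capacity) get utility −∞ so the search never selects them.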


Challenges at IT level

– Elicit high-level IT objectives
– Manage to them
– Interact effectively with administrators to build trust


© Motorola and IBM, 2005-2006. Autonomic Systems and Networks – Theory and Practice

End-To-End Model-Based & Goal-Driven Deployment

“Rainforest” Deployment

[Figure: a developer uses a design tool (Rational Software Architect) to build the logical application structure (EAR, DB, JSP); a deployer turns it into a logical topology model and a deployment topology (web server, app server, DB server) for Tivoli Provisioning Manager. The target topology is physical, complete, correct and actionable.]

Automatically Combine Fine-Grained Best Practices Patterns To Transform the Logical Application Structure to a Physical Topology

[Figure: model transformations (best practices) supplied by a domain expert, illustrated by an “Insert Firewall” pattern applied to a connection between LdNode1 and LdNode2 across VLAN1 and VLAN2.]

Eilam et al. (IBM TJ Watson)
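A best-practice pattern such as “Insert Firewall” can be viewed as a rewrite rule on the topology model. The sketch below assumes a deliberately tiny model (nodes labeled with a VLAN, plus connections) and one hypothetical rule: any connection between nodes on different VLANs is rerouted through a new firewall node. It is not the Rainforest implementation; `Topology` and `insert_firewalls` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Topology:
    vlan_of: dict[str, str]                                    # logical node -> VLAN
    connections: list[tuple[str, str]] = field(default_factory=list)
    firewalls: list[str] = field(default_factory=list)         # inserted firewall nodes

def insert_firewalls(topo: Topology) -> Topology:
    """Return a new topology where every cross-VLAN connection passes through a firewall."""
    firewalls = list(topo.firewalls)
    fw_set = set(firewalls)
    connections: list[tuple[str, str]] = []
    for src, dst in topo.connections:
        touches_fw = src in fw_set or dst in fw_set
        if touches_fw or topo.vlan_of[src] == topo.vlan_of[dst]:
            connections.append((src, dst))  # already compliant: keep as-is
            continue
        fw = f"Firewall{len(firewalls) + 1}"
        firewalls.append(fw)
        fw_set.add(fw)
        connections += [(src, fw), (fw, dst)]
    return Topology(dict(topo.vlan_of), connections, firewalls)

logical = Topology(
    vlan_of={"LdNode1": "VLAN1", "LdNode2": "VLAN2", "LdNode3": "VLAN1"},
    connections=[("LdNode1", "LdNode2"), ("LdNode1", "LdNode3")],
)
physical = insert_firewalls(logical)
print(physical.connections)
# [('LdNode1', 'Firewall1'), ('Firewall1', 'LdNode2'), ('LdNode1', 'LdNode3')]
```

Because connections that already touch a firewall are left alone, applying the rule twice gives the same result, which makes it safe to combine with other fine-grained patterns in any order.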


Automated derivation of thresholds and goals from SLOs

[Figure: three layers. Application layer: payroll, business int. and other applications, watched by application monitoring that reports SLO violations. Managed system layer: app server, DB2 server and storage server with disk controllers, watched by component monitoring that raises component alerts. Management layer: takes SLOs and SLO violations and performs automated threshold setting.]

Use statistical techniques to correlate SLO violations at Application Layer with monitored data in System Layer

Automatically set alert thresholds to achieve a desired false-positive/false-negative tradeoff

Originally presented at ICAC ’05

Breitgand, Henis, Shehory (IBM Haifa)
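As a much simpler stand-in for the statistical techniques mentioned above (not the method of Breitgand, Henis and Shehory), a component-level alert threshold can be chosen by scanning candidate thresholds over historical data labeled with whether an application SLO was violated, minimizing a weighted count of false positives and false negatives. The weights express the desired tradeoff; `choose_threshold` and the sample data are hypothetical.

```python
import math

def choose_threshold(samples, fp_weight=1.0, fn_weight=1.0):
    """Pick an alert threshold (alert when metric >= threshold) minimizing
    fp_weight * false_positives + fn_weight * false_negatives.

    samples: list of (metric_value, slo_violated) pairs from historical data.
    Returns (threshold, weighted_error); math.inf means "never alert".
    """
    if not samples:
        raise ValueError("need at least one sample")
    candidates = sorted({value for value, _ in samples}) + [math.inf]
    best = None
    for t in candidates:
        fp = sum(1 for v, violated in samples if v >= t and not violated)  # alert, no violation
        fn = sum(1 for v, violated in samples if v < t and violated)       # violation, no alert
        err = fp_weight * fp + fn_weight * fn
        if best is None or err < best[1]:
            best = (t, err)
    return best

# Hypothetical history: (component metric, was the application SLO violated?)
history = [(10, False), (30, True), (40, False), (50, True)]
print(choose_threshold(history, fn_weight=5.0))  # (30, 1.0)
print(choose_threshold(history, fp_weight=5.0))  # (50, 1.0)
```

Making missed violations expensive (`fn_weight=5.0`) lowers the threshold; making false alarms expensive (`fp_weight=5.0`) raises it.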


Implications

Human specification of low-level platform- and resource-specific parameters and policies will be phased out.

Administrators will specify power, performance, availability and security objectives, and acceptable tradeoffs between them. Algorithms and interfaces for eliciting high-level IT objectives will emerge, as will standards for expressing them.

Models will capture human expert knowledge of how high-level objectives relate to lower-level system parameters, and they will be refined automatically via feedback.

Resources will employ models in conjunction with optimization and planning technologies to manage to multiple objectives, both for deployment and runtime operations.

The entire stack of business-driven IT will be completed, as business objectives get transformed to high-level IT objectives that drive deployment and runtime operations. Standards for expressing business-level objectives will emerge.