Conclusions from the European Roadmap on Control of Computing Systems Karl-Erik Årzén, Anders Robertsson , Dan Henriksson LTH, Lund University, Sweden Mikael Johansson, Håkan Hjalmarsson, Karl Henrik Johansson Royal Institute of Technology , Sweden FeBiD’06, Vancouver, April 3, 2006


Conclusions from the European Roadmap on

Control of Computing Systems

Karl-Erik Årzén, Anders Robertsson, Dan Henriksson

LTH, Lund University, Sweden

Mikael Johansson, Håkan Hjalmarsson, Karl Henrik Johansson

Royal Institute of Technology , Sweden

FeBiD’06, Vancouver, April 3, 2006


Background:

Recent large research interest (academically as well as industrially initiated) in control-based methods for resource management in real-time computing and communication systems.

In most cases: allocation of memory, computing and/or communication resources.


Examples

• Performance control of web servers
• Dynamic resource management in embedded systems
• Traffic control in communication networks
• Transaction management in database servers
• Autonomic computing
• etc.


eBusiness

Many types of servers and applications

[Figure: multi-tier eBusiness architecture with a front end for online customer service, web servers, application servers (presentation, business logic, gateway), hub servers, logging servers, security servers, relational database servers (Sybase), decision support (DSS), and back-end mainframe systems (TPF, SYSPLEX, transaction monitor, hierarchical database), connected via HTTP, JDBC, MQ messaging and SNA]

Multi-tier systems of Web browsers, business logic and databases

Feedback at various levels

Queue Control

IBM, HP, Microsoft, Amazon, ….

Challenges:
• Modeling formalisms (DES, ODEs, queuing theory, …)
• Design of software and computing systems for controllability

[courtesy J. Hellerstein]


ARTIST2

• Roadmap outcome from ARTIST2-workshop in Lund, Sweden, May 2005

• EU/IST FP6 Network of Excellence – Embedded Systems Design

• NSF-supported workshop on ”Future trends in control of computer systems” by Hellerstein, Tilbury & Abdelzaher, May 2005

www.artist-embedded.org/FP6


Roadmap

Available for download at

http://www.control.lth.se/user/karlerik/roadmap1.pdf

Experiment:

You have wireless network access – try the server!

…or not.


An admission (control) problem


Report from the Swedish Emergency Management Agency


How to handle the overload problem?

• Overprovision
– (more capacity than needed on average)
• Admission control
– Some are denied access, but server continues to operate.
• Change service
– (”sending text-only at high loads”)


Why is control of computing systems interesting?

• Multidisciplinary:
– Several ”new” challenges
– Not covered within one traditional ”research domain” (queueing theory, computer science, systems and control…)
• Need systematic tools for design and analysis
– robustness to ”disturbances”
– better performance
• Cost of operating computing systems is rising/dominating (60-90%) [Hellerstein et al., 2005]


Outline
• Background & Motivation
• Computer systems in a control theoretic framework
– Modeling issues
• Roadmap: Research challenges in
– Control of server systems
– Control of CPU resources
– Feedback scheduling of control systems
– Control of communication networks
– Error control of software systems
– Control middleware

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

[4.15 pm] Panel – Top Three Challenges in Control of Networks and Systems


Contents of roadmap

Six research areas:

– Control of server systems
– Control of CPU resources
– Feedback scheduling of control systems
– Control of communication networks
– Error control of software systems
– Control middleware

”…how flexibility, adaptivity, performance and robustness can be achieved in a real-time computing or communication system through the use of control theory”


Modeling Formalisms

• Heuristic approach vs. model-based control
– ”Inherent robustness” from feedback control
– One reason why many ad hoc strategies work
– More can be gained (systematic design & analysis)

Basic principle: Use simple enough models for design and analysis
– Model should capture essential dynamics and show similar behavior as the system for different distributions and load cases.


Modeling Formalisms

• Identification
• Sampling (SoH), ”noise”, inherent nonlinearities
– ”First principles” (conservation, queueing theory)
– Computing systems: discrete-event dynamic systems (DEDS) + real-time systems => timed automata or timed Petri nets
• Risk of state-space explosion (does not scale with arrival/service rates)
• Well-suited for safety and blocking properties, but how does it relate to stability and robustness?


Modeling of queueing-systems

• Discrete event models
• Queue theoretic model (Markov chains etc.)
• Flow models (cont. time / average models)
• Discrete time models


Modeling aspects

”Gain-scheduling” (standard control principle):
– ”Choose among different control parameters depending on e.g., operating condition”.
• ”Good” model structure of corresponding computing system may change with work load (e.g., for server systems)
– Flow models OK for high loads
– DEDS models feasible for low loads
• Interpolation between different model structures?!
• Transient vs. steady-state behavior


Actuator Mechanisms

• The difference between the service rate, µ, and the arrival rate, λ, determines the delay experienced by the requests.
• Enqueue actuators (changing the arrival rate):
– Admission control mechanism
– Change inter-arrival period of task ”upstream” in multi-tiered system
• Dequeue actuators (changing the service rate):
– Number of server threads
– Quality adaptation
– Dynamic voltage scaling
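
To make the role of the gap between µ and λ concrete, here is a minimal sketch that assumes an M/M/1 queue (Poisson arrivals, exponential service times, a single server). The slide does not name a queueing model, so this is an assumption, and the function name is hypothetical. Under that assumption the mean time a request spends in the system is 1/(µ − λ).

```python
def mm1_mean_delay(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system for an M/M/1 queue (assumed model, not from the slides)."""
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("arrival_rate must be >= 0 and service_rate must be > 0")
    if arrival_rate >= service_rate:
        # Unstable queue: the backlog grows without bound.
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)


# Service rate fixed at 100 requests/s; the delay grows sharply as λ approaches µ.
delays = [mm1_mean_delay(lam, 100.0) for lam in (50.0, 90.0, 99.0)]
```

An enqueue actuator lowers λ and a dequeue actuator raises µ; in this simple model both act on the delay only through the difference µ − λ.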


Actuators - Implementation aspects

• Gate model:
– Call gapping: accept first u(kh) calls in control interval
– Percent blocking: preserves distribution

Crucial not to trigger network resend (lost packet), cf. ”Heracles and the hydra” [Sha ?]
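
The two gate actuators can be sketched as follows. This is an illustration under simple assumptions, not the implementation behind the slides: `limit` plays the role of u(kh), `admit_fraction` is a hypothetical parameter giving the share of requests to let through, and each function returns one admit/reject decision per request in the control interval.

```python
import random


def call_gapping(arrivals, limit):
    """Admit the first `limit` requests of the control interval, reject the rest."""
    return [index < limit for index in range(len(arrivals))]


def percent_blocking(arrivals, admit_fraction, rng):
    """Admit each request independently with probability `admit_fraction`."""
    return [rng.random() < admit_fraction for _ in arrivals]


interval = list(range(10))                 # ten requests in one control interval
gapped = call_gapping(interval, limit=3)   # only the first three get through
thinned = percent_blocking(interval, 0.3, random.Random(0))
```

Call gapping admits a burst at the start of each interval and nothing afterwards, whereas percent blocking thins the stream at random, so the admitted traffic keeps the statistical character of the offered traffic.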


Related research areas

Similarities/differences of the different domains
– Traffic flow control
– Manufacturing and supply chains
– Communication networks
– Power networks

with respect to
– Where does the congestion appear?
– Routing?
– Available information (dest.)?
– Time/distance matters?
– ”Packet dropping”: OK or not?
– Control action?

Example: Highway congestion in LA [Varaiya et al.]


Control of server systems

• Temporal control locally at server
– Direct or ”indirect” objective (service provider vs. customer)

• Queue-management and load balancing

• Inherent nonlinearities

• Multi-tiered systems including large eCommerce systems


Example: Admission control

Objective:
– Good transient behavior for traffic changes
– Preserve good performance for overload situations

Measure of admission
– queue length
– average time
– utilization
– CPU load / energy consumption
– memory
– …
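
As one possible reading of the admission control loop, the sketch below regulates queue length with a PI controller acting on the admitted rate. Everything specific is an assumption made for illustration: the discrete-time flow model x(k+1) = max(0, x(k) + h·(u(k) − µ)), the gains, the rates, and the function name `simulate_admission_control`.

```python
def simulate_admission_control(arrival_rate, service_rate=100.0, x_ref=20.0,
                               kp=0.5, ki=0.1, h=1.0, steps=200):
    """PI control of queue length on an assumed discrete-time flow model.

    arrival_rate: function k -> offered load (requests/s) in interval k.
    Returns a list of (queue_length, admitted_rate) pairs.
    """
    x = 0.0          # queue length
    integral = 0.0   # integral state of the PI controller
    history = []
    for k in range(steps):
        offered = arrival_rate(k)
        error = x_ref - x
        u_unsat = kp * error + integral
        # The gate can admit between nothing and everything that arrives.
        u = min(max(u_unsat, 0.0), offered)
        if u == u_unsat:
            # Anti-windup by conditional integration: freeze while saturated.
            integral += ki * error * h
        x = max(0.0, x + h * (u - service_rate))
        history.append((x, u))
    return history


# Constant overload: 150 requests/s offered to a server that handles 100.
history = simulate_admission_control(lambda k: 150.0)
```

With these values the queue settles at the reference of 20 requests and the admitted rate settles at the service rate, while the remaining offered load is rejected.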


Example: Feedforward + feedback


Control of server systems

• Prediction and state estimation based control
• State and actuator constraints
• Interesting region: when do the flow models cease to be valid?
• Changing models and criteria in different load situations...
• Very exciting new results on discrete-event based estimation and control
– DE-sampling vs. DT-sampling
• control: ratio 1/5
• bandwidth allocation: 1/2


Server systems - Research challenges

• Modeling issues (as discussed before)
• Control + queueing theory = ?
• Event-based control – theory gap
• Control objectives
– References (load, utilization)
– Performance metrics and cost functions
• (upcrossing probabilities)
• Security, reliability, availability, efficiency…
• Design patterns/Control patterns
– Software structure + control structure and analysis for software design
• Well known in e.g., process control (ratio control, cascade, midranging etc.)
– When should a queue problem be considered as
• an admission problem?
• a delay control problem?
• Large-scale distributed systems / multi-tier systems
– Distributed control, MPC, …


Control of CPU resources

• A large number of feedback-based or adaptive global QoS management systems have been proposed.

• Early ad hoc schemes of multi-level feedback queue scheduling

• Control-theoretical approaches using FC-EDF, EUCON [Stankovic, Lu, Buttazzo, …]

[Figure: the FC-EDF scheme (from [Stankovic et al., 1999])]


Control of CPU resources –The challenges and research directions

• Multiprocessor systems
• Power-aware CPU scheduling
– Dynamic Voltage Scaling
– Joint optimization problem of minimizing energy while still meeting real-time constraints
• already receives considerable attention from the research community
• End-to-end resource management:
– Resource management in distributed systems where an activity spans multiple nodes
• Hierarchical resource allocation schemes
– Cascaded structures with local allocation
• Efficient feedback scheduling mechanisms
– Scheduling algorithm overhead: is online optimization doable?


Feedback scheduling of control tasks

Actuation
– Task period h_i

Solve two different problems:
• Resource regulation
– Control the total utilization to avoid overloads
• Optimal resource distribution
– Assign individual task periods to optimize performance
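
The resource regulation part can be illustrated with a very simple rule: if the utilization U = Σ C_i / h_i at the nominal periods exceeds a setpoint, stretch all periods by a common factor. This uniform rescaling is an assumption chosen for brevity; it does not attempt the optimal distribution mentioned in the second bullet. The function name and the numbers are hypothetical.

```python
def rescale_periods(exec_times, nominal_periods, utilization_setpoint):
    """Stretch all task periods by one common factor so that the total
    utilization does not exceed the setpoint (simple, non-optimal rule)."""
    nominal_utilization = sum(c / h for c, h in zip(exec_times, nominal_periods))
    # Never shorten periods below their nominal values.
    scale = max(1.0, nominal_utilization / utilization_setpoint)
    return [h * scale for h in nominal_periods]


exec_times = [2.0, 3.0, 5.0]          # estimated execution times C_i (ms)
nominal_periods = [10.0, 15.0, 20.0]  # nominal task periods h_i (ms), U = 0.65
periods = rescale_periods(exec_times, nominal_periods, utilization_setpoint=0.5)
utilization = sum(c / h for c, h in zip(exec_times, periods))
```

In a feedback scheduler the execution times would be measured or estimated online, so the periods are recomputed whenever the estimates change.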


Example: Dynamic Real-Time Scheduling of Model Predictive Controllers

• Based on on-line optimization of a cost function
– Convex optimization problem solved in each sample
– Iterative anytime algorithm
– Result gradually refined up to a certain bound
• Attractive control strategy
– Straightforward to use for multi-variable processes
– Ability to handle constraints
• Unattractive real-time properties
– High computational demands
– Very large variations in execution times

[Henriksson et al., 2004]


Example: Feedback scheduling of MPC control tasks

Main idea

A process in stationarity may need fewer resources than a process in a transient phase.

Use feedback from the optimization algorithm to determine
• for each MPC task, when to terminate the optimization and output the control signal
– the optimization may be terminated early and still produce acceptable results
• which of several ready MPC tasks should be scheduled for execution.

[Henriksson et al., 2004]


• Current values of the cost functions act as dynamic task priorities
– Constitutes an on-line QoS measure for the task
– Reflects the relative importance of the tasks
• Feedback scheduler distributes the computing resources
– Schedules MPC task with highest cost
– Invoked after each iteration
– Implemented as a separate task
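
The scheduling rule above can be sketched with toy anytime tasks. Each `AnytimeTask` is a hypothetical stand-in for an MPC task: it minimizes a scalar quadratic cost by gradient steps, so its cost shrinks with every iteration it is granted. The fixed iteration `budget` is likewise an assumption standing in for the available CPU time; this is not the scheduler of [Henriksson et al., 2004].

```python
from dataclasses import dataclass


@dataclass
class AnytimeTask:
    """Toy stand-in for an MPC task: minimizes 0.5*q*(u - target)**2."""
    name: str
    q: float
    target: float
    u: float = 0.0
    step: float = 0.5

    def cost(self) -> float:
        return 0.5 * self.q * (self.u - self.target) ** 2

    def iterate(self) -> None:
        # One gradient step; scaling by 1/q halves the remaining error.
        gradient = self.q * (self.u - self.target)
        self.u -= (self.step / self.q) * gradient


def feedback_schedule(tasks, budget):
    """Grant `budget` iterations one at a time, always to the task whose
    current cost is highest (the cost acts as a dynamic priority)."""
    trace = []
    for _ in range(budget):
        task = max(tasks, key=lambda t: t.cost())
        task.iterate()
        trace.append(task.name)
    return trace


tasks = [AnytimeTask("A", q=1.0, target=4.0), AnytimeTask("B", q=1.0, target=1.5)]
trace = feedback_schedule(tasks, budget=4)
```

Task A starts far from its optimum and therefore receives the first iterations; once its cost falls below that of task B, the scheduler switches.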


• Cooperative robot task under resource constraints

• Master and slave configuration

• Ball and beam application


• Problems:
– MPC tasks exhibit very large variations in execution time
– Traditional scheduling theory not applicable
• Solutions:
– Premature termination of optimization
– Dynamic scheduling based on cost functions


The challenges and research directions for feedback scheduling of control tasks include all the challenges and research directions of control of CPU resources.

Additionally, the following items are important:
• Temporal robustness indices
• Formal performance guarantees
– open question whether it is possible to combine the flexibility implied by feedback scheduling with formal guarantees


Control of Communication Networks

Example:
• Feedback control is embedded in the TCP protocol in the form of a sliding window mechanism.
• Introduced in the 1980s to solve the congestive failure problems that had brought down the network.
• We have not experienced system-wide congestive failures again even though the network has grown by orders of magnitude.
• This is a testament to the effectiveness of feedback control in a highly dynamic, decentralized, and fast-changing environment.
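
The slide names only the sliding window. In TCP congestion avoidance the window size is adjusted by additive increase and multiplicative decrease (AIMD), and the sketch below shows that feedback rule in its barest form. The single flow, the fixed `capacity`, and treating "window above capacity" as the loss signal are simplifying assumptions for illustration.

```python
def aimd(capacity, rounds, increase=1.0, decrease=0.5, cwnd=1.0):
    """Additive-increase/multiplicative-decrease window adaptation.

    capacity: window size above which a loss is assumed to occur (assumption).
    Returns the congestion window after each round trip.
    """
    history = []
    for _ in range(rounds):
        if cwnd > capacity:
            # Congestion feedback (e.g., a lost packet): back off sharply.
            cwnd = max(1.0, cwnd * decrease)
        else:
            # No congestion observed: probe for more bandwidth.
            cwnd += increase
        history.append(cwnd)
    return history


window = aimd(capacity=20.0, rounds=100)
```

The window traces the familiar sawtooth between roughly half the capacity and just above it, without the sender ever being told the capacity explicitly.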

Remark: [9.00] Robust yet Fragile: Intrinsic Tradeoffs in Layered Architectures


Control of Communication Networks

• Feedback control mechanisms are fundamental for the separation of communication layers

• Gives robustness and allows local optimization and refinements

Example
• Reliable data transfer over wireless link through suitable feedback control of
– transmission power
– modulation scheme
– channel coding


Research Challenges in Control of Communication Networks

• Architectures and model abstractions for network control
– Network models suitable for control and observer design
– Robustness of large scale and distributed systems
– Resource management in wireless networks
– Cross-layer adaptation for new services and optimized performance


Cross-layer adaptation for improved performance of cellular and wired networks

• Bandwidth variations in radio link give performance degradations due to large end-to-end delay and improper transport protocol

• Proxy between cellular and wired networks adapts sending rate to bandwidth variations through available radio link state information

[Figure: terminal, BTS, RNC, 3G-SGSN and 3G-GGSN in the 3G cellular network; proxy; Internet; application server; TCP connections; bandwidth (BW) variations]


Proxy hybrid control law

• Controller in proxy regulates sending rate based on
– Events generated by bandwidth changes obtained from RNC
– Sampled measurements of queue length in RNC

[Möller et al., 2005]


Experimental evaluation

• Improved time-to-serve-user and link utilization compared to traditional end-to-end protocol

[Möller et al., 2005]

[Figure: bandwidth utilization, end-to-end protocol vs. new protocol]

• Stability and robustness analysis of new protocol
• Ongoing experimental evaluation and testing with


Network-aware control architecture

• Estimate network state
– Delay
– Data loss probability
– Bandwidth
• Adjust controller accordingly

[Figure: block diagram with wireless network, network observer, plant observer, and control law]


Network-aware controllers

Control algorithms to cope with communication imperfections
– Control under network delay
– Control under data loss
– Control under bandwidth limitation
– Control under topology constraints

Characteristics depend on network technology

[Figure: block diagram with wireless network, network observer, plant observer, and control law]


Delay estimation

• Internet round-trip time (RTT) data are noisy with piecewise constant average
• Complex network dynamics hard to model
• RTT estimation in TCP:
• Improved estimation through Kalman filter with hypothesis test (CUSUM filter)
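
For reference, the smoothed RTT recursion standardized for TCP in RFC 6298 (gains 1/8 and 1/4) can be written as in the first function below. The second function is only an illustrative stand-in for the idea of combining a filter with a change detector: an exponentially weighted average that restarts when a two-sided CUSUM test fires. It is not the Kalman/CUSUM filter of [Jacobsson et al., 2004], and `drift`, `threshold` and the synthetic samples are hypothetical.

```python
def srtt_rfc6298(samples, alpha=1 / 8, beta=1 / 4):
    """Smoothed RTT as in RFC 6298; RTTVAR is updated with the old SRTT."""
    srtt, rttvar = samples[0], samples[0] / 2
    smoothed = [srtt]
    for r in samples[1:]:
        rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
        srtt = (1 - alpha) * srtt + alpha * r
        smoothed.append(srtt)
    return smoothed


def cusum_mean_tracker(samples, gain=1 / 8, drift=5.0, threshold=30.0):
    """Averaging filter that restarts from the current sample when a
    two-sided CUSUM test signals a jump in the mean (illustrative only)."""
    estimate = samples[0]
    g_pos = g_neg = 0.0
    tracked, alarms = [estimate], []
    for k, r in enumerate(samples[1:], start=1):
        residual = r - estimate
        g_pos = max(0.0, g_pos + residual - drift)
        g_neg = max(0.0, g_neg - residual - drift)
        if g_pos > threshold or g_neg > threshold:
            estimate, g_pos, g_neg = r, 0.0, 0.0   # jump detected: restart
            alarms.append(k)
        else:
            estimate += gain * residual
        tracked.append(estimate)
    return tracked, alarms


# Synthetic RTTs (ms): mean 100 for 50 samples, then mean 200, with ±2 ripple.
ripple = [2.0 if k % 2 else -2.0 for k in range(50)]
samples = [100.0 + n for n in ripple] + [200.0 + n for n in ripple]
smoothed = srtt_rfc6298(samples)
tracked, alarms = cusum_mean_tracker(samples)
```

On this piecewise constant data the plain smoother needs many samples to reach the new level, while the change detector raises one alarm at the jump and follows the new mean at once.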

[Jacobsson et al., 2004]


Control middleware

Middleware:

• a software abstraction layer that mediates the interactions between a component or application
• Commonly used in distributed systems to provide communication services.
– Java RMI, Microsoft’s COM, and CORBA…
• Networked embedded system applications, e.g., mobile systems and sensor systems.
– GAIA [Román et al., 2002], WSAMI [Issarny et al., 2005], and AURA


Control middleware

Research Directions
• The most important research item for control middleware is to develop these systems from research prototypes to something that may be used more widely.
• Middleware functionality: still an open question whether the middleware should be
– passive, i.e., provide sensing and actuation services that the application can use to itself implement the feedback control, or
– active, i.e., the middleware should be responsible for the actual control loop.

Both of these approaches have advantages and disadvantages.


Error control of software systems [L.Sha]

• The idea behind error control of software is to use ideas similar to those used in feedback control to detect malfunctioning software components and, in that case, fall back on a well-tested core software component that is able to provide the basic application service with guarantees on performance and safety.

• Provide techniques and tools that support making the semantic assumptions of each software component explicit and machine checkable.


• Simple and reliable core
• System remains in recoverable states

• SIMPLEX architecture [Sha]
– High assurance vs. high performance
– Need to stay in recoverable state
• Runs in parallel (cf. ”bumpless transfer”)

----------------------------------------------------------------------------

• ORTGA [FeBID’06]
– Maximum stability region
– How to detect conditions for switches? (FDI)
• False alarm vs. non-recovery (risk of instability)
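
The fallback idea can be sketched on a toy example. All specifics are assumptions made for illustration: an unstable scalar plant x(k+1) = A·x + B·u with a bounded actuator, a high-performance controller that develops a fault after ten steps, and a monitor that hands the step to the simple controller whenever the predicted next state would leave the region |x| ≤ X_SAFE. The bound is chosen inside the set |x| < B·U_MAX/(A − 1) = 5 from which the simple controller can still recover.

```python
A, B = 1.2, 1.0        # assumed unstable plant: x(k+1) = A*x + B*u
U_MAX = 1.0            # actuator limit
X_SAFE = 4.0           # monitored bound, inside the recoverable set |x| < 5


def saturate(u):
    return max(-U_MAX, min(U_MAX, u))


def safe_controller(x):
    """Simple, well-tested core: saturated deadbeat feedback."""
    return saturate(-A * x / B)


def faulty_controller(x, k):
    """High-performance controller that gets stuck at full output from step 10."""
    return saturate(-A * x / B) if k < 10 else U_MAX


def run(steps=60, x0=2.0, use_monitor=True):
    x, trace = x0, []
    for k in range(steps):
        u = faulty_controller(x, k)
        # Decision module: reject commands that would leave the safe region.
        if use_monitor and abs(A * x + B * u) > X_SAFE:
            u = safe_controller(x)
        x = A * x + B * u
        trace.append(x)
    return trace


protected = run()
unprotected = run(use_monitor=False)
```

With the monitor the state stays within the bound despite the fault; without it the state diverges. A full Simplex design would also have to address the switching conditions raised above, such as false alarms and transfer between controllers.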


Roadmap

Available for download at

http://www.control.lth.se/user/karlerik/roadmap1.pdf


Conclusions

• Thank you for your attention!

• Questions?

• Panel debate


Proposed solutions for wireless TCP

• Split connection
– Destroys end-to-end semantics
• End-to-end protocols
– Deployment issues
• Link-layer improvements
– Performance limitations

E.g., Balakrishnan et al., Ludwig and Katz, Xylomenos et al., Huang et al., Hossain et al., RFC 3135 and 3366, …