
ALEAE : Handling Uncertainties in Large-Scale Distributed Systems

Emmanuel Jeannot

LORIA - INRIA - CNRS

ALEAE Kick-off

April 1st 2009

Managing uncertainties E. Jeannot 2/16

Introduction

What is a grid?

An infrastructure that is:
• Distributed
• Heterogeneous

But also:
• Dynamic
• Shared

Hence, a lot of uncertainties


Uncertainties

Uncertainties:
• Unpredictable behavior
• Behavior not as expected

Where do they come from?
• Infrastructure (hardware)
• Application (software)
• Users


Uncertainty at the infrastructure level

The hardware that composes a grid can:
• Fail
• Be volatile (be removed or added)
• Suffer performance degradation (due to shared usage)


Uncertainty at the application level

It is often assumed that:
• The duration of each component of an application is known
• Its resource usage is known
• It does not fail

However, this is not always the case.


Uncertainties due to the users

Users:
• Submit jobs/requests randomly
• May behave maliciously (intentionally or not)

Examples:
• DoS attack
• Desktop grid: returning wrong answers


Rationale

Just as resource management algorithms cope with heterogeneity and distribution, they must also cope with uncertainty.


Ways to cope with uncertainty

Proactive methods (static):
• Redundancy
• Duplication

Reactive methods (dynamic):
• Checkpoint-restart
• Migration

Mixed (provide a static solution and adapt it dynamically)
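To make the proactive side concrete, here is a minimal sketch of how duplication raises the chance that a task completes. It assumes (this is not stated in the slides) that replicas fail independently with the same probability; the function name `success_probability` and the numbers are hypothetical.

```python
def success_probability(p_fail: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent copies succeeds.

    Assumption: every replica fails independently with probability `p_fail`.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be in [0, 1]")
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    # The task is lost only if all replicas fail.
    return 1.0 - p_fail ** replicas


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, round(success_probability(0.1, k), 4))
```

Under this assumption each extra replica multiplies the failure probability by `p_fail`, at the price of one more resource being used for the whole run.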


Functional Goals

Different kinds of uncertainties call for different desired behaviors

Reliability, fault-tolerance:
• Hardware failure
• Software failure

Robustness:
• Hardware performance degradation
• Software unpredictability

Correctness:
• Bad usage

Etc.


Multi-criteria approach

The good old metrics are still valid:
• Makespan
• Load balance
• Response time
• Lateness
• Etc.

Most of the time, these metrics conflict with the functional goals (reliability, robustness, etc.).

Hence the need for a multi-criteria approach (e.g., makespan/reliability).
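One common way to handle two conflicting criteria is to keep only the non-dominated trade-offs (the Pareto front). The sketch below does this for makespan versus failure probability, both to be minimized. The candidate schedules, their names, and their values are made up for illustration; the slides do not prescribe this particular method.

```python
from typing import List, Tuple

# (name, makespan, failure probability); lower is better for both criteria.
Schedule = Tuple[str, float, float]


def pareto_front(schedules: List[Schedule]) -> List[Schedule]:
    """Return the schedules that no other schedule dominates."""
    front = []
    for name, mk, pf in schedules:
        # A schedule is dominated if another one is at least as good on both
        # criteria and strictly better on at least one.
        dominated = any(
            (m2 <= mk and p2 <= pf) and (m2 < mk or p2 < pf)
            for _, m2, p2 in schedules
        )
        if not dominated:
            front.append((name, mk, pf))
    return front


# Hypothetical candidate schedules.
candidates = [
    ("fast", 100.0, 0.20),
    ("balanced", 130.0, 0.08),
    ("safe", 180.0, 0.02),
    ("poor", 190.0, 0.10),
]

if __name__ == "__main__":
    for schedule in pareto_front(candidates):
        print(schedule)
```

Here "poor" is discarded because "balanced" is both faster and more reliable; the three remaining schedules are genuine trade-offs, and choosing among them is left to the user.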

Open issues (1)

Gather traces:

- What is the behavior of users/programs/infrastructure?

- Ease the extraction of useful information

- Ensure generality


Open issues (2)

Model the uncertainty:
• Trace the behavior
• Analyze it
• Provide a model
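The trace, analyze, model pipeline can be sketched on a toy example: fit an exponential failure model to a trace of failure timestamps. The exponential law is only one possible assumption (real traces may call for other distributions), and the trace values and function names below are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence


def fit_exponential_rate(failure_times: Sequence[float]) -> float:
    """Maximum-likelihood failure rate from sorted failure timestamps.

    Assumption: inter-failure times are i.i.d. exponential, so the
    estimated rate is the inverse of the mean gap between failures.
    """
    gaps = [b - a for a, b in zip(failure_times, failure_times[1:])]
    if not gaps or any(g <= 0 for g in gaps):
        raise ValueError("need at least two strictly increasing timestamps")
    return 1.0 / mean(gaps)


def survival_probability(rate: float, duration: float) -> float:
    """Probability of seeing no failure during `duration` under the model."""
    return math.exp(-rate * duration)


# Hypothetical trace: failure timestamps in hours.
trace = [0.0, 40.0, 100.0, 130.0, 200.0]

if __name__ == "__main__":
    rate = fit_exponential_rate(trace)
    print("estimated rate:", rate)
    print("P(no failure in 10 h):", survival_probability(rate, 10.0))
```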


Carefully define metrics

Mapping a goal into a metric is not trivial.

Example: robustness
• Intuitive notion
• Many metrics (one per paper)
• Question: what is the relation between these metrics?
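The question of how metrics relate can be illustrated with two plausible robustness metrics that rank the same schedules differently. Both metrics (makespan spread and fraction of runs meeting a deadline) and the sample values are chosen for illustration only; they are not taken from the slides.

```python
from statistics import pstdev
from typing import Sequence


def makespan_spread(samples: Sequence[float]) -> float:
    """Metric 1: standard deviation of observed makespans (lower = more robust)."""
    return pstdev(samples)


def on_time_fraction(samples: Sequence[float], deadline: float) -> float:
    """Metric 2: fraction of runs finishing by `deadline` (higher = more robust)."""
    return sum(s <= deadline for s in samples) / len(samples)


# Hypothetical makespans observed over five runs of two schedules.
schedule_a = [100.0, 100.0, 100.0, 100.0, 160.0]  # usually fast, one bad run
schedule_b = [95.0, 105.0, 115.0, 105.0, 115.0]   # stable but often slightly late

if __name__ == "__main__":
    for name, runs in (("A", schedule_a), ("B", schedule_b)):
        print(name, makespan_spread(runs), on_time_fraction(runs, 110.0))
```

With these numbers, schedule B looks more robust under the spread metric, while schedule A looks more robust under the deadline metric, so the choice of metric changes the conclusion.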


Open issues (4)

Provide resource management (scheduling) algorithms:
• Mono-criteria/multi-criteria
• Static/dynamic/mixed
• That work well in the worst case/on average
• Etc.


Open issues (5)

Static vs. Dynamic?

Each approach has advantages and drawbacks.

• Dynamic (e.g., checkpoint-restart): costly in time, but handles almost every case
• Static (e.g., duplication): costly in resources, but can provide some guarantees

The best approach depends on the problem.
• Is the mixed approach always possible/profitable?
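The time-versus-resource trade-off can be sketched with a deliberately simple model. Every parameter below (checkpoint interval, checkpoint cost, restart cost, failure instants) is a hypothetical value, and the model ignores failures during restart and assumes that at least one duplicated replica survives.

```python
from typing import Sequence, Tuple


def checkpointed_runtime(work: float, interval: float, ckpt_cost: float,
                         restart_cost: float, failures: Sequence[float]) -> float:
    """Wall-clock time to finish `work` on one node with periodic checkpoints.

    `failures` holds sorted wall-clock failure instants. A failure discards
    the progress made since the last checkpoint; failures that fall inside a
    restart period are ignored (a simplifying assumption).
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    t, done, i = 0.0, 0.0, 0
    while done < work:
        # Skip failures that occurred while the node was restarting.
        while i < len(failures) and failures[i] < t:
            i += 1
        segment = min(interval, work - done)
        last = done + segment >= work
        # No checkpoint is written after the final segment.
        end = t + segment + (0.0 if last else ckpt_cost)
        if i < len(failures) and failures[i] < end:
            # Failure during this segment: roll back to the last checkpoint.
            t = failures[i] + restart_cost
            i += 1
        else:
            t, done = end, done + segment
    return t


def duplicated_cost(work: float, replicas: int) -> Tuple[float, float]:
    """(wall-clock time, node-time consumed) when running `replicas` copies.

    Assumption: at least one replica finishes without failing.
    """
    return work, work * replicas


if __name__ == "__main__":
    print(checkpointed_runtime(100.0, 25.0, 2.0, 5.0, [60.0]))
    print(duplicated_cost(100.0, 2))
```

In this toy setting, checkpoint-restart needs 106 time units without failure and 117 with one failure at time 60, on a single node. Duplication with two replicas finishes in 100 time units but consumes 200 units of node time.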


Open issues: real-scale experimentation

Provide detection mechanisms for:
• Failure
• Malicious behavior
• Resource usage
• Correctness
• Etc.
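As one example of a correctness-detection mechanism, a desktop grid can send the same task to several workers and accept a result only when enough of them agree. The sketch below shows such majority voting; it is one possible technique, not one named in the slides, and the function name and quorum values are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_result(results: Sequence[Hashable], quorum: int) -> Optional[Hashable]:
    """Return the most common result if at least `quorum` workers agree, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None


if __name__ == "__main__":
    # Two workers agree, one returns a wrong answer: the answer 42 is accepted.
    print(majority_result([42, 42, 7], quorum=2))
    # No agreement: the task would have to be resubmitted.
    print(majority_result([1, 2, 3], quorum=2))
```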

Program and test solutions:
• Real-scale (Grid'5000, DAS-3)
• Simulation
• Emulation?

Validation of the models.

Big picture


Today

Kick-off:
• ALEAE: a two-year INRIA-funded project (20 k€/year)
• Presentation on each item
• Technical presentation on sub-items
• Work plan:
  - Other/next meetings
  - Visit/exchange
  - Mission
  - Post-doc
  - Synergies between teams
• Important: I am moving to INRIA Bordeaux.


Conclusion

Grid environments are full of uncertainties

These uncertainties come from different factors

Handling them is difficult (especially with the traditional criteria)

Finding the best way to tackle this problem (dynamic/static/mixed) is of crucial interest.

The goal of ALEAE is to tackle such issues.