Cloud Auto-Scaling with Deadline and Budget Constraints (Grid 2010)

Post on 27-Oct-2014


http://www.cs.virginia.edu/~mm5bw/papers/CloudAutoScaling.pdf


Ming Mao, Jie Li, Marty Humphrey

eScience Group

CS Department, University of Virginia

Grid 2010 – Oct 27, 2010

A fast-growing computing platform

IDC: cloud spending increases 27.4% a year to $56 billion (compared with 5% a year for traditional IT)

$16.5 billion (2009) -> $55.5 billion (2014). Source: Worldwide and Regional Public IT Cloud Service 2010-2014 Forecast

Two most quoted benefits

Scalable computing and storage

Reduced cost

Concerns

Security, availability, cost management, integration, interoperability, etc.

Q1. Cost – the most important factor in practice?

Q2. Moving into Cloud == Reduced Cost ?

Rate the benefits commonly ascribed to the cloud on-demand model

Pay only for what you use: 77.90%

Easy/fast to deploy to end-users: 77.70%

Monthly payments: 75.30%

Encourages standard systems: 68.50%

Requires less in-house IT staff, costs: 67.00%

Always offers latest functionality: 64.60%

Sharing systems with partners simpler: 63.90%

Seems like the way of the future: 54.00%

Source: IDC Enterprise Panel, 3Q09, n = 263, Sep 2009

How important is it that cloud service providers...

Offer competitive pricing: 91.60%

Offer Service Level Agreements: 88.60%

Option to move cloud offerings back on premise: 87.80%

Provide a complete solution: 86.00%

Understand my business and industry: 84.50%

Allow managing on-premise & cloud together: 82.10%

Support many of my IT needs: 81.00%

Offer both on-premise and public cloud services: 79.20%

Are a technology and business model innovator: 78.30%

Have local presence, can come to my offices: 72.90%

Source: IDC Enterprise Panel, 3Q09, n = 263, Sep 2009

Triggers based on resource utilization information (e.g. AWS Auto Scaling, RightScale, enStratus, Scalr, etc.)

Multiple instance types

Current billing models: full-hour billing

Non-negligible instance acquisition time: 7-15 min in Windows Azure

More specific performance goals

Budget awareness (e.g. dollars/month, dollars/job)

Deadline (job finish time)

Cost
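To illustrate why full-hour billing matters for cost, here is a minimal sketch. It assumes every started hour is charged in full and borrows the $0.12 per computing hour Windows Azure price quoted later in the deck; the function name `billed_cost` is hypothetical.

```python
import math


def billed_cost(runtime_seconds: float, price_per_hour: float) -> float:
    """Cost of one instance under full-hour billing.

    Assumption: any started hour is charged as a whole hour.
    """
    if runtime_seconds <= 0:
        return 0.0
    billed_hours = math.ceil(runtime_seconds / 3600)
    return billed_hours * price_per_hour


if __name__ == "__main__":
    # An instance that runs 61 minutes is billed for 2 full hours.
    print(billed_cost(61 * 60, 0.12))
```

Under this model, releasing an instance a few minutes into a new hour saves nothing, which is why the timing of scale-down decisions affects cost.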

Problem statement: how to enable cloud applications to finish all submitted jobs before a user-specified deadline with as little money as possible, using auto-scaling.

[Figure: Users, Job, Cloud Application, Cloud Server]

The workload consists of independent jobs submitted to the job queue

Jobs are served in FCFS order and fairly distributed

Different classes of jobs

Same performance goal (e.g. 1-hour deadline)

VM instances take time to start up

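Key variables used in the model

Notation: $J_j$ (job class), $n_j$ (number of jobs of class $j$), $V_i$ (VM type), $I_i$ (instance), $d_V$ (startup delay of VM type $V$), $t_{j,V}$ (processing time of a class-$j$ job on VM type $V$), $D$ (deadline), $s_i$ (time instance $I_i$ has already spent starting up)

Workload: $W = \sum_j (J_j, n_j)$

Computing power of a running instance: $P_i = \sum_j \left( J_j, \dfrac{D \times n_j}{\sum_j t_{j,\mathrm{type}(I_i)} \times n_j} \right)$

Computing power of a pending instance: $P_i = \sum_j \left( J_j, \dfrac{(D - (d_{\mathrm{type}(I_i)} - s_i)) \times n_j}{\sum_j t_{j,\mathrm{type}(I_i)} \times n_j} \right)$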
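The computing-power definitions can be sketched in code. This is a minimal illustration, not the authors' implementation: it assumes an instance's power for class j is the usable time before the deadline, times n_j, divided by the time needed for one pass over the whole job mix, and that a pending instance loses its remaining startup time. The function and parameter names are hypothetical, and clamping the remaining startup time at zero is an added safeguard.

```python
def computing_power(deadline, proc_time, n_jobs, startup_delay=0.0, time_pending=0.0):
    """Jobs of each class one instance can finish before the deadline.

    deadline      -- D, in seconds
    proc_time[j]  -- average seconds per class-j job on this instance's VM type
    n_jobs[j]     -- number of queued class-j jobs
    startup_delay -- d, startup delay of the VM type (0 for a running instance)
    time_pending  -- s_i, seconds already spent starting up
    """
    remaining_startup = max(startup_delay - time_pending, 0.0)  # assumption: clamp at 0
    usable = deadline - remaining_startup
    mix_time = sum(proc_time[j] * n_jobs[j] for j in n_jobs)
    if usable <= 0 or mix_time == 0:
        return {j: 0.0 for j in n_jobs}
    return {j: usable * n_jobs[j] / mix_time for j in n_jobs}


if __name__ == "__main__":
    jobs = {"mix": 30, "cpu": 30, "io": 30}
    general = {"mix": 300, "cpu": 300, "io": 300}
    print(computing_power(3600, general, jobs))                     # running instance
    print(computing_power(3600, general, jobs, startup_delay=600))  # pending instance
```

With a 1-hour deadline and 300 s per job, a running instance can finish 4 jobs of each class, while a freshly requested one with a 600 s startup delay can finish fewer.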

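Scale up, sufficient budget: $\min \sum_i c_{\mathrm{type}(I'_i)}$ subject to $\sum_i P'_i \ge W - \sum_i P_i$

Scale up, insufficient budget: $\max \sum_i P'_i$ subject to $\sum_i c_{\mathrm{type}(I'_i)} \le C - \sum_i c_{\mathrm{type}(I_i)}$

Scale down: release instance $I_s$ when $\sum_i P_i - P_s \ge W$

Here $I'_i$ are the new instances to acquire, $P'_i$ their computing power, $c$ the hourly cost of a VM type, and $C$ the budget.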
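The scale-down condition can be checked per job class. The sketch below assumes computing power and workload are stored as per-class dictionaries and that an instance may be released only if the remaining instances still cover every class; `can_release` is a hypothetical helper name.

```python
def can_release(powers, candidate, workload):
    """Return True if the workload is still covered without instance `candidate`.

    powers   -- list of per-instance dicts {job_class: jobs finishable before deadline}
    workload -- dict {job_class: number of queued jobs}
    """
    for job_class, needed in workload.items():
        remaining = sum(
            power.get(job_class, 0.0)
            for index, power in enumerate(powers)
            if index != candidate
        )
        if remaining < needed:
            return False
    return True


if __name__ == "__main__":
    powers = [
        {"j1": 10, "j2": 5},
        {"j1": 10, "j2": 20},
        {"j1": 10, "j2": 10},
    ]
    workload = {"j1": 20, "j2": 25}
    print(can_release(powers, 2, workload))
    print(can_release(powers, 1, workload))
```

In this made-up example the third instance can be released, but the second cannot, because class j2 would no longer be covered.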

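Workload and required computing power (example)

Required computing power with two existing instances, $P' = W - I_1 - I_2$:

$j_1$: $x = 60 - 10 - 10 = 40$

$j_2$: $y = 60 - 5 - 20 = 35$

$j_3$: $z = 60 - 20 - 5 = 35$

New instances ($n'_1$, $n'_2$, $n'_3$ of VM types $V_1$, $V_2$, $V_3$) must supply it, $n'_1 V_1 + n'_2 V_2 + n'_3 V_3 \ge P'$:

$j_1$: $10 n'_1 + 10 n'_2 + 10 n'_3 \ge x = 40$

$j_2$: $5 n'_1 + 20 n'_2 + 10 n'_3 \ge y = 35$

$j_3$: $20 n'_1 + 5 n'_2 + 10 n'_3 \ge z = 35$

Objective: $\min (c_1 n'_1 + c_2 n'_2 + c_3 n'_3)$ where $c_1 n'_1 + c_2 n'_2 + c_3 n'_3 + c_{\mathrm{type}(I_1)} + c_{\mathrm{type}(I_2)} \le C$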
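The example above is a small integer program, so it can be solved by exhaustive search. The sketch below takes the per-type capacities (10, 5, 20), (10, 20, 5), (10, 10, 10) and the required power (40, 35, 35) from the example. The hourly prices are an assumption (0.17, 0.17 and 0.085 dollars, taken from the simulation parameters and mapped to V1, V2, V3 only for illustration), and the budget constraint is left out. This is a brute-force stand-in, not the solver used by the authors.

```python
from itertools import product


def cheapest_plan(capacity, cost, required, max_per_type=10):
    """Find instance counts per VM type that cover `required` at minimum cost.

    capacity[i][j] -- jobs of class j one instance of type i finishes before the deadline
    cost[i]        -- hourly price of type i
    required[j]    -- computing power still needed for class j
    """
    best = None
    for counts in product(range(max_per_type + 1), repeat=len(capacity)):
        supplied = [
            sum(n * cap[j] for n, cap in zip(counts, capacity))
            for j in range(len(required))
        ]
        if all(s >= r for s, r in zip(supplied, required)):
            total = sum(n * c for n, c in zip(counts, cost))
            if best is None or total < best[1]:
                best = (counts, total)
    return best


if __name__ == "__main__":
    capacity = [(10, 5, 20), (10, 20, 5), (10, 10, 10)]  # V1, V2, V3
    cost = [0.17, 0.17, 0.085]                           # assumed prices, $/hour
    required = (40, 35, 35)                              # x, y, z
    print(cheapest_plan(capacity, cost, required))
```

With these assumed prices the search picks four instances of the third type. Exhaustive search is only practical for a handful of VM types and small counts.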

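Cloud Cruise Control

[Figure: architecture diagram. Components: Decider, Monitor, Repository, VM Manager, Config, Historical Data, VM instances. Actors: admin, users. Edge labels: workload, enqueue, dequeue, update, notify, dynamic configuration, vm info, vm plan (+, -)]

Decider: $\min \sum_i c_{\mathrm{type}(I'_i)}$ subject to $\sum_j P'_j \ge W - \sum_j P_j$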

Workload & VM simulation parameters

Job classes: Mix, Computing Intensive, IO Intensive; each arrives at an average of 30 jobs/hour (STD 5 jobs/hour)

Processing time per job (average / STD) on each VM type:

General ($0.085/hour, startup delay 600 s): Mix 300 s / 50 s; Computing Intensive 300 s / 50 s; IO Intensive 300 s / 50 s

High-CPU ($0.17/hour, startup delay 720 s): Mix 210 s / 25 s; Computing Intensive 75 s / 15 s; IO Intensive 300 s / 50 s

High-IO ($0.17/hour, startup delay 720 s): Mix 210 s / 25 s; Computing Intensive 300 s / 50 s; IO Intensive 75 s / 15 s

[Figure: Stable Workload & Changing Deadline. x-axis: Time (hour); y-axes: Utilization (%) and Response (sec); series: utilization, deadline, avg, max, min]

[Figure: Changing Workload & Fixed Deadline. x-axis: Time (hour); y-axes: Workload (job/h) and Response (sec); series: deadline, avg, max, min, workload]

VM types and total cost (% more than optimal)

Choice #1: General, $98.52 (43%)

Choice #2: High-CPU, $128.86 (87%)

Choice #3: High-IO, $129.71 (88%)

Choice #4: General, High-CPU, High-IO, $78.62 (14%)

Optimal: General, High-CPU, High-IO, $68.85
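The "% more than optimal" column follows directly from the totals. A short check, using only the figures listed above (the function name `percent_over_optimal` is hypothetical):

```python
def percent_over_optimal(cost: float, optimal: float) -> int:
    """Extra cost relative to the optimal plan, rounded to a whole percent."""
    return round((cost / optimal - 1) * 100)


if __name__ == "__main__":
    optimal = 68.85
    totals = {
        "Choice #1 (General)": 98.52,
        "Choice #2 (High-CPU)": 128.86,
        "Choice #3 (High-IO)": 129.71,
        "Choice #4 (all three types)": 78.62,
    }
    for name, cost in totals.items():
        print(f"{name}: {percent_over_optimal(cost, optimal)}% more than optimal")
```

The results reproduce the 43%, 87%, 88% and 14% figures in the table.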

MODIS data sets: 200X is the year; Terra & Aqua are the satellites; (X - Y) means day X to day Y; 15 images / day

* C.H.: computing hour; 1 C.H. = $0.12 in Windows Azure

Moderate-scale test (up to 20 instances)

Terra 2004(10-12), 45 jobs in total, theoretical cost 4 C.H.* or $0.48:

1-hour deadline: 18 min late, 9 C.H. or $1.08

2-hour deadline: 8 min early, 6 C.H. or $0.72

3-hour deadline: 20 min early, 5 C.H. or $0.60

Aqua 2008(30-32), 45 jobs in total, theoretical cost 4 C.H. or $0.48:

1-hour deadline: 15 min late, 10 C.H. or $1.20

2-hour deadline: 20 min early, 7 C.H. or $0.84

3-hour deadline: 29 min early, 5 C.H. or $0.60

Large-scale test (up to 90 instances)

Terra & Aqua 2006(1-75), 1125 jobs in total, theoretical cost 93 C.H. or $11.16:

2-hour deadline: 20 min late, 170 C.H. or $20.40

4-hour deadline: 6 min early, 132 C.H. or $15.84

Terra & Aqua 2006(1-150), 2250 jobs in total, theoretical cost 185 C.H. or $22.20:

2-hour deadline: admission denied

4-hour deadline: 22 min early, 243 C.H. or $29.16
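The dollar amounts in these results are computing hours multiplied by the quoted Windows Azure rate of $0.12 per C.H. The sketch below reproduces that conversion and, as an added illustration, the overhead of actual over theoretical cost; the helper names are hypothetical.

```python
PRICE_PER_CH = 0.12  # dollars per computing hour, as quoted for Windows Azure


def ch_to_dollars(computing_hours: int) -> float:
    """Convert billed computing hours to dollars."""
    return round(computing_hours * PRICE_PER_CH, 2)


def overhead_percent(actual_ch: int, theoretical_ch: int) -> int:
    """How much more was billed than the theoretical minimum, in whole percent."""
    return round((actual_ch / theoretical_ch - 1) * 100)


if __name__ == "__main__":
    # Terra & Aqua 2006(1-75), 4-hour deadline
    print(ch_to_dollars(93), ch_to_dollars(132))
    print(overhead_percent(132, 93))
```

For the 1125-job run with a 4-hour deadline, the billed 132 C.H. is about 42% above the 93 C.H. theoretical cost.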

Test: Terra & Aqua 2006(1-75), 1125 jobs in total

Finished 6 min early

Theoretical cost: 93 C.H. or $11.16

Actual cost: 132 C.H. or $15.84

[Figure: Instance Acquisition and Release. x-axis: Time (hour), 0 to 5; y-axis: Instance Number, 0 to 40; series: Released, Acquiring, Ready]

Conclusions

More cost-efficient than fixed-size instance choice

VM startup delay can have a large effect on performance in practice

Future work

More general cloud application model

Multiple job classes

Consider other instance types (e.g. spot instances & reserved instances)

Data transfer performance and storage cost
