Transcript
Page 1: Autonomic SLA-driven Provisioning for Cloud Applications

Autonomic SLA-driven Provisioning for Cloud Applications

Nicolas Bonvin, Thanasis Papaioannou, Karl Aberer

CCGRID 2011, May 23-26 2011, Newport Beach, CA, USA

EPFL

Page 2: Autonomic SLA-driven Provisioning for Cloud Applications

Cloud Apps – Issue #1: Placement

● A distributed, component-based application running on an elastic infrastructure

2 EPFL – LSIR - Nicolas Bonvin

[Diagram: four components, C1, C2, C3 and C4]

Page 5: Autonomic SLA-driven Provisioning for Cloud Applications

Cloud Apps – Issue #1: Placement

● A distributed, component-based application running on an elastic infrastructure

● Performance of C1, C2 and C3 is probably lower than that of C4

● No information on the other VMs colocated on the same server!

No control on placement

[Diagram: C1 and C2 on VM1, C3 on VM2, C4 on VM3; the VMs run on Server 1 and Server 2]

Page 6: Autonomic SLA-driven Provisioning for Cloud Applications

Cloud Apps – Issue #2: Instability

● Load-balanced traffic to 4 identical components on 4 identical VMs

[Diagram: C1 replicated on VM1, VM2, VM3 and VM4; response times: 100 ms, 100 ms, 100 ms, 100 ms]

Page 7: Autonomic SLA-driven Provisioning for Cloud Applications

Cloud Apps – Issue #2: Instability

● Load-balanced traffic to 4 identical components on 4 identical VMs

– VM performance can vary by up to a factor of 4 [Dej2009]

● Physical server, hypervisor, storage, ...

[Diagram: C1 replicated on VM1, VM2, VM3 and VM4; response times: 100 ms, 140 ms, 100 ms, 100 ms]

Page 8: Autonomic SLA-driven Provisioning for Cloud Applications

Cloud Apps – Issue #2: Instability

● Load-balanced traffic to 4 identical components on 4 identical VMs

– VM performance can vary by up to a factor of 4 [Dej2009]

● Physical server, hypervisor, storage, ...

● Component overloaded

[Diagram: C1 replicated on VM1, VM2, VM3 and VM4; response times: 130 ms, 140 ms, 100 ms, 100 ms]

Page 9: Autonomic SLA-driven Provisioning for Cloud Applications

Cloud Apps – Issue #2: Instability

● Load-balanced traffic to 4 identical components on 4 identical VMs

– VM performance can vary by up to a factor of 4 [Dej2009]

● Physical server, hypervisor, storage, ...

● Component overloaded

● Component bug, crash, deadlock, ...

[Diagram: C1 replicated on VM1, VM2, VM3 and VM4; response times: 130 ms, 140 ms, 100 ms, infinity]

Page 11: Autonomic SLA-driven Provisioning for Cloud Applications

Cloud Apps – Issue #2: Instability

● Load-balanced traffic to 4 identical components on 4 identical VMs

– VM performance can vary by up to a factor of 4 [Dej2009]

● Physical server, hypervisor, storage, ...

● Component overloaded

● Component bug, crash, deadlock, ...

● Failure of C1 on VM4 -> load is rebalanced

[Diagram: C1 replicated on VM1, VM2, VM3 and VM4; response times: 140 ms, 150 ms, 130 ms, infinity]

Application should react early!

Page 12: Autonomic SLA-driven Provisioning for Cloud Applications

Cloud Apps – Overview

● Build for failures

– Do not trust the underlying infrastructure

– Do not trust your components either!

● Components should adapt to the changing conditions

– Quickly

– Automatically

– e.g. by replacing a wonky VM with a new one

Page 13: Autonomic SLA-driven Provisioning for Cloud Applications

Scarce: a framework to build scalable cloud applications

Page 14: Autonomic SLA-driven Provisioning for Cloud Applications

Architecture Overview

● An agent on each server / VM

– Starts, stops and monitors the components

– Takes decisions on behalf of the components

● An agent communicates with other agents

– Routing table

– Status of the server (resource usage)

[Diagram: servers, each running an agent alongside its components (labelled A, B, E); agents communicate by gossiping and broadcast]
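
The status exchange between agents can be sketched as a push-style gossip. This is only an illustration under stated assumptions: the `Agent` class, the version counter and the "one random peer per round" rule are hypothetical and are not taken from Scarce, whose actual protocol is not described in the transcript.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical agent: one per server / VM."""
    name: str
    # Known status per server: server name -> (version, resource usage in [0, 1])
    view: dict = field(default_factory=dict)

    def update_own_status(self, usage: float) -> None:
        version = self.view.get(self.name, (0, 0.0))[0] + 1
        self.view[self.name] = (version, usage)

    def gossip_to(self, peer: "Agent") -> None:
        # Push our view; the peer keeps the freshest entry for each server.
        for server, (version, usage) in self.view.items():
            if server not in peer.view or peer.view[server][0] < version:
                peer.view[server] = (version, usage)


def gossip_round(agents: list, rng: random.Random) -> None:
    # Each agent pushes its view to one randomly chosen other agent.
    for agent in agents:
        peer = rng.choice([a for a in agents if a is not agent])
        agent.gossip_to(peer)


agents = [Agent(f"server{i}") for i in range(4)]
for i, agent in enumerate(agents):
    agent.update_own_status(usage=0.1 * (i + 1))

rng = random.Random(0)
rounds = 0
# Gossip until every agent knows every server, with a safety cap on rounds.
while any(len(a.view) < len(agents) for a in agents) and rounds < 100:
    gossip_round(agents, rng)
    rounds += 1
```

No agent plays a central role here: each one only talks to a peer and merges what it hears, which is the property the slide relies on.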

Page 16: Autonomic SLA-driven Provisioning for Cloud Applications

An economic approach

● Time is split into epochs (no synchronization between servers)

● Servers charge a virtual rent for hosting a component according to

– Current resource usage (I/O, CPU, ...) of the server

– Technical factors (hardware, connectivity, ...)

– Non-technical factors (country stability, ...)

● Components

– Pay virtual rent at each epoch

– Gain virtual money by processing requests

– Take decisions based on balance (= gain – rent)

● Replicate, migrate, suicide, stay

● Virtual rents are updated by gossiping (no centralized board)
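
The slide names the four possible decisions but not the rule that selects among them. The sketch below shows one way a component could choose, assuming a hypothetical policy: a run of `window` profitable epochs triggers replication, and a run of loss-making epochs triggers migration if a cheaper server is known, suicide otherwise. The `Component` class, the `window` parameter and the thresholds are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    balance_history: list  # balance (gain - rent) per epoch, newest last


def balance(gain, rent):
    """Balance of one epoch, as defined on the slide."""
    return gain - rent


def decide(component, cheaper_server_known, window=3):
    """Hypothetical policy returning replicate, migrate, suicide or stay."""
    recent = component.balance_history[-window:]
    if len(recent) < window:
        return "stay"  # not enough history yet
    if all(b > 0 for b in recent):
        return "replicate"  # consistently profitable: demand is high
    if all(b < 0 for b in recent):
        # Consistently losing money: move somewhere cheaper, or give up.
        return "migrate" if cheaper_server_known else "suicide"
    return "stay"


c1 = Component("C1", [balance(10, 4), balance(9, 5), balance(12, 6)])
action = decide(c1, cheaper_server_known=False)  # three profitable epochs
```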

Page 17: Autonomic SLA-driven Provisioning for Cloud Applications

Economic model (i)

● The rent of a server is different for each component!

Page 18: Autonomic SLA-driven Provisioning for Cloud Applications

Economic model (ii)

● VM1 and VM2 have an « identical » resource usage: 45%

● Server rent = the server's resource usage weighted by the component's weights

– Rent for C1 @ VM1 > rent for C1 @ VM2

[Diagram: Multiplexing of server resources. C1: CPU 30%, I/O 5%. VM1: CPU 70%, I/O 20%. VM2: CPU 25%, I/O 65%.]
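
The numbers on this slide can be checked with a small computation. The sketch assumes that the figures attached to C1 (CPU 30%, I/O 5%) are its weights and that the rent is a plain weighted sum of the server's usage; the exact rent formula is not in the transcript, so `rent` below is an illustrative stand-in.

```python
def rent(component_weights, server_usage):
    """Server usage weighted by how much the component relies on each resource."""
    return sum(w * server_usage[r] for r, w in component_weights.items())


def mean_usage(server_usage):
    """Unweighted average usage, the « identical » 45% of the slide."""
    return sum(server_usage.values()) / len(server_usage)


# Values read from the slide; treating C1's figures as weights is an assumption.
c1_weights = {"cpu": 0.30, "io": 0.05}
vm1_usage = {"cpu": 0.70, "io": 0.20}
vm2_usage = {"cpu": 0.25, "io": 0.65}

rent_vm1 = rent(c1_weights, vm1_usage)  # 0.30 * 0.70 + 0.05 * 0.20
rent_vm2 = rent(c1_weights, vm2_usage)  # 0.30 * 0.25 + 0.05 * 0.65
```

Both VMs average 45% usage, yet the CPU-heavy C1 is charged roughly twice as much on the CPU-loaded VM1 (about 0.22) as on VM2 (about 0.11), which matches the inequality on the slide.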

Page 19: Autonomic SLA-driven Provisioning for Cloud Applications

Economic model (iii)

● Choosing a candidate server j during replication/migration of a component i

– Net benefit maximization

● 2 optimization goals:

– High availability by geographical diversity of replicas

– Low latency by grouping related components

● g_j: weight related to the proximity of the server location to the geographical distribution of the client requests to the component

● S_i is the set of servers hosting a replica of component i
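
The net benefit formula itself did not survive in the transcript. The sketch below is therefore not the authors' formula: it is a hypothetical score that rewards diversity with respect to the servers in S_i, scales it by g_j and subtracts the rent of server j. It covers only the availability goal; grouping related components for low latency is not modelled. All names (`net_benefit`, `choose_server`, `region`) and all numbers are assumptions.

```python
def net_benefit(candidate, replica_servers, g, rent, diversity):
    """Hypothetical score: proximity-weighted diversity gain minus rent."""
    gain = sum(diversity(candidate, s) for s in replica_servers)
    return g[candidate] * gain - rent[candidate]


def choose_server(candidates, replica_servers, g, rent, diversity):
    """Pick the candidate server j with the highest hypothetical net benefit."""
    return max(
        candidates,
        key=lambda j: net_benefit(j, replica_servers, g, rent, diversity),
    )


# Illustrative data only.
region = {"eu-1": "eu", "eu-2": "eu", "us-1": "us", "asia-1": "asia"}


def diversity(a, b):
    # 1 if the two servers sit in different regions, else 0.
    return 1.0 if region[a] != region[b] else 0.0


S_i = ["eu-1"]                                   # servers already hosting component i
g = {"eu-2": 0.9, "us-1": 0.6, "asia-1": 0.2}    # proximity to client requests
rent = {"eu-2": 0.3, "us-1": 0.2, "asia-1": 0.1}

best = choose_server(["eu-2", "us-1", "asia-1"], S_i, g, rent, diversity)
```

With these numbers the scores are -0.3 for eu-2, 0.4 for us-1 and 0.1 for asia-1, so the replica goes to us-1: it adds a new region while staying reasonably close to the clients.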

Page 20: Autonomic SLA-driven Provisioning for Cloud Applications

SLA Performance Guarantees (i)

● Each component has its own SLA constraints

● SLA derived directly from entry components

● Resp. Time = Service Time + max (Resp. Time of Dependencies)

[Diagram: dependency graph of components C1 to C5; C1 carries an SLA of 500 ms]
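
The response time relation above is recursive and can be evaluated directly on a dependency graph. In this sketch the edges between C1 to C5 and the service times are hypothetical (the diagram's edges are not recoverable from the transcript), and the graph is assumed to be acyclic.

```python
def response_time(component, service_time, dependencies):
    """Resp. Time = Service Time + max(Resp. Time of Dependencies)."""
    children = dependencies.get(component, [])
    slowest_child = max(
        (response_time(c, service_time, dependencies) for c in children),
        default=0.0,  # a component without dependencies only has its service time
    )
    return service_time[component] + slowest_child


# Hypothetical dependency graph and service times (ms).
dependencies = {"C1": ["C2", "C3"], "C2": ["C4"], "C3": ["C4", "C5"]}
service_time = {"C1": 100, "C2": 80, "C3": 120, "C4": 60, "C5": 90}

total = response_time("C1", service_time, dependencies)
meets_sla = total <= 500  # SLA of the entry component, from the slide
```

Here the slowest path is C1 -> C3 -> C5, giving 100 + 120 + 90 = 310 ms, which is within the 500 ms SLA.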

Page 21: Autonomic SLA-driven Provisioning for Cloud Applications

SLA Performance Guarantees (ii)

● SLA propagation from parents to children

● Parent j sends its performance constraints (e.g. response time upper bound) to its dependencies D(j):

● Child i computes its own performance constraints:

● : group of constraints sent by the replicas of the parent g
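
The propagation formulas are missing from the transcript, so the following is only a sketch of one plausible scheme. It assumes that a parent hands down whatever remains of its own bound after its service time (which follows from Resp. Time = Service Time + max of the dependencies), and that a child keeps the tightest bound it receives. Both function names, the `min` rule and the service times are assumptions.

```python
def constraint_for_children(parent_bound_ms, parent_service_time_ms):
    """Budget left for the dependencies once the parent's own work is done."""
    return parent_bound_ms - parent_service_time_ms


def child_constraint(received_bounds_ms):
    """Assumed rule: satisfy every parent replica, so keep the tightest bound."""
    return min(received_bounds_ms)


# Entry component with a 500 ms SLA (from the slides) and two replicas whose
# service times (hypothetical) are 100 ms and 150 ms.
sent = [constraint_for_children(500, st) for st in (100, 150)]
bound_for_child = child_constraint(sent)
```

The two replicas send 400 ms and 350 ms, and the child adopts 350 ms as its own upper bound.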

Page 22: Autonomic SLA-driven Provisioning for Cloud Applications

SLA Performance Guarantees (iii)

● SLA propagation from parents to children

Page 23: Autonomic SLA-driven Provisioning for Cloud Applications

Automatic Provisioning

● Usage of allocated resources is maximized:

– Autonomic migration / replication / suicide of components

– Not enough to ensure end-to-end response time

● Cloud resources managed by framework via cloud API

● Each individual component has to satisfy its own SLA

– SLA easily met -> decrease resources (scale down)

– SLA not met -> increase resources (scale up, scale out)
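
The two rules above can be sketched as a per-component check. What counts as "easily met" is not defined in the transcript, so the `slack` parameter (here: response time below half of the SLA) is a hypothetical threshold, as is the function name.

```python
def provisioning_action(observed_ms, sla_ms, slack=0.5):
    """Hypothetical rule; `slack` defines what 'easily met' means."""
    if observed_ms > sla_ms:
        return "scale up / scale out"  # SLA not met: increase resources
    if observed_ms < slack * sla_ms:
        return "scale down"            # SLA easily met: decrease resources
    return "keep"                      # SLA met without much margin


actions = [provisioning_action(ms, sla_ms=500) for ms in (620, 180, 400)]
```

A band between the two thresholds avoids releasing resources the moment the SLA is barely satisfied, which would otherwise cause oscillation between scaling up and down.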

Page 24: Autonomic SLA-driven Provisioning for Cloud Applications

Adaptivity to slow servers

● Each component keeps statistics about its children

– e.g. 95th percentile response time

● A routing coefficient is computed for each child at each epoch

– Send more requests to better-performing children
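
How the routing coefficient is computed is not given in the transcript. A simple possibility, shown here as an assumption, is to make each child's coefficient inversely proportional to its observed response time and to normalise the coefficients so they sum to 1. The input values reuse the response times of the instability slides (130 ms, 140 ms, 100 ms, infinity), treated here as if they were 95th percentile measurements.

```python
import random


def routing_coefficients(p95_ms):
    """Coefficients inversely proportional to each child's response time."""
    inverse = {child: 1.0 / t for child, t in p95_ms.items()}
    total = sum(inverse.values())
    return {child: v / total for child, v in inverse.items()}


def pick_child(coefficients, rng):
    """Route one request: children are drawn according to their coefficient."""
    children = list(coefficients)
    weights = [coefficients[c] for c in children]
    return rng.choices(children, weights=weights, k=1)[0]


p95 = {"VM1": 130.0, "VM2": 140.0, "VM3": 100.0, "VM4": float("inf")}
coeffs = routing_coefficients(p95)
target = pick_child(coeffs, random.Random(0))
```

The crashed replica on VM4 gets a coefficient of 0 and receives no traffic, while VM3, the fastest child, receives the largest share.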

Page 25: Autonomic SLA-driven Provisioning for Cloud Applications

Evaluation

Page 26: Autonomic SLA-driven Provisioning for Cloud Applications

Evaluation: Setup

● 5 components, mostly CPU-intensive (w_c >> w_m, w_n, w_d)

● 8 eight-core servers (Intel Core i7 920, 2.67 GHz, 8 GB, Linux 2.6.32-trunk-amd64)

● d = 0, C = 110, k = 10000, xs* = 25%

[Diagram: dependency graph of components C1 to C5; C1 carries an SLA of 500 ms]

Page 27: Autonomic SLA-driven Provisioning for Cloud Applications

Adaptation to Varying Load (i)

● 5 rps to 60 rps at minute 8, step 5 rps/min

● Static setup: 2 servers with 2 cores

Page 28: Autonomic SLA-driven Provisioning for Cloud Applications

Adaptation to Varying Load (ii)

● 5 rps to 60 rps at minute 8, step 5 rps/min

● Static setup: 2 servers with 2 cores

Page 29: Autonomic SLA-driven Provisioning for Cloud Applications

Adaptation to Slow Server

● Max 2 cores/server, 25 rps

● At minute 4, a server gets slower (200 ms delay)

Page 30: Autonomic SLA-driven Provisioning for Cloud Applications

Scalability

● Add 5 rps per minute until 150 rps

● Max 6 cores/server

Page 31: Autonomic SLA-driven Provisioning for Cloud Applications

Conclusion

Page 32: Autonomic SLA-driven Provisioning for Cloud Applications

Conclusion

● Framework for building cloud applications

● Elasticity: add/remove resources

● High availability: software, hardware, network failures

● Scalability: growing load, peaks, scaling down, ...

– Quick replication of busy components

● Load balancing: load has to be shared by all available servers

– Replication of busy components

– Migration of less busy components

– Reach equilibrium when load is stable

● SLA performance guarantees

– Automatic provisioning

● No synchronization, fully decentralized

Page 33: Autonomic SLA-driven Provisioning for Cloud Applications

Thank you !

