Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Architectures
Pooyan Jamshidi, Imperial College London ([email protected])
Invited talk at Sharif University of Technology, 12 April 2016
Motivation
~50% = wasted hardware
Actual traffic
Typical weekly traffic to Web-based applications (e.g., Amazon.com)
Motivation
Problem 1: ~75% wasted capacity vs. actual demand
Problem 2: customers lost
Traffic with an unexpected burst in requests (e.g., end-of-year traffic to Amazon.com)
Motivation
Is it really like this?
Auto-scaling enables you to realize this ideal of on-demand provisioning (demand over time).
However, enacting changes to cloud resources is not real-time.
Motivation
Capacity we can provision with Auto-Scaling
A realistic figure of dynamic provisioning
Research Challenges
• Challenge 1. Parameters' value prediction ahead of time.
• Challenge 2. Qualitative specification of thresholds.
• Challenge 3. Robust control of uncertainty in measurement data.
Predictable vs. Unpredictable Demand
[Figure: four sample workload traces (requests over time), contrasting predictable and unpredictable demand]
Research Challenges
• Challenge 1. Parameters' value prediction ahead of time.
• Challenge 2. Qualitative specification of thresholds.
• Challenge 3. Robust control of uncertainty in measurement data.
An Example of an Auto-scaling Rule
These quantitative values must be determined by the user:
⇒ requires deep knowledge of the application (CPU, memory, thresholds)
⇒ requires performance modeling expertise (when and how to scale)
⇒ a unified opinion of the user(s) is required
Amazon Auto Scaling | Microsoft Azure Watch
Microsoft Azure Auto-scaling Application Block
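To make concrete what such a rule looks like, here is a minimal sketch in Python of the kind of quantitative rule these services expect the user to write. Every name and number here (metric names, thresholds, step sizes, bounds) is hypothetical and chosen purely for illustration:

```python
# A minimal sketch of a threshold-based auto-scaling rule of the kind
# Amazon Auto Scaling or the Azure Auto-scaling Application Block expects.
# All metric names and constants are hypothetical, for illustration only.

def scaling_decision(avg_cpu: float, queue_length: int, instances: int) -> int:
    """Return the number of instances to add (+) or remove (-)."""
    MAX_INSTANCES, MIN_INSTANCES = 8, 2       # capacity bounds
    if avg_cpu > 0.70 or queue_length > 100:  # scale-out thresholds
        return min(2, MAX_INSTANCES - instances)
    if avg_cpu < 0.30 and queue_length < 10:  # scale-in thresholds
        return max(-1, MIN_INSTANCES - instances)
    return 0                                  # stay inside the deadband

# e.g., scaling_decision(avg_cpu=0.82, queue_length=40, instances=4) -> +2
```

Every constant in this rule (CPU thresholds, queue limits, step sizes) is exactly the kind of quantitative value the slide argues the user should not have to hand-tune.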
Research Challenges
• Challenge 1. Parameters' value prediction ahead of time.
• Challenge 2. Qualitative specification of thresholds.
• Challenge 3. Robust control of uncertainty in measurement data.
Sources of Uncertainty in Elastic Software
P. Jamshidi, C. Pahl, N. Mendonca, "Managing Uncertainty in Autonomic Cloud Elasticity Controllers", IEEE Cloud Computing, 2016.
P. Jamshidi, C. Pahl, "Software Architecture for the Cloud: a Roadmap towards Control-Theoretic, Model-Based Cloud Architecture", LNCS, 2015.
A concrete example of uncertainty in the cloud
Uncertainty related to enactment latency: the same scaling action (adding or removing a VM of precisely the same size) took different amounts of time to be enacted on the cloud platform (here, Microsoft Azure) at different points in time, and the difference was significant (up to a couple of minutes). The enactment latency also differs across cloud platforms.
Goal!
• Take the burden away from the user
─ users specify the thresholds through qualitative linguistic terms
─ the auto-scaling controller should be fully responsible for scaling decisions
─ the auto-scaling should be robust against uncertainty
• Offline benchmarking
• Trial-and-error
• Expert knowledge
⇒ Costly and not systematic
A. Gandhi, P. Dube, A. Karve, A. Kochut, L. Zhang, "Adaptive, Model-driven Autoscaling for Cloud Applications", ICAC '14.
[Figure: 95th-percentile response time (ms) vs. arrival rate (req/s), with reference lines at 400 ms and 60 req/s]
RobusT2Scale: Architectural Overview
[Diagram: users provide the initial setting, elasticity rules, and a response-time SLA; RobusT2Scale combines environment monitoring, application monitoring, and prediction/smoothing with fuzzy reasoning to issue scaling actions]
Internal Details of RobusT2Scale
Code: https://github.com/pooyanjamshidi/RobusT2Scale
Why did we decide to use type-2 fuzzy logic?
[Figure: possibility vs. performance index for a type-2 membership function, with a region of definite satisfaction, a region of definite dissatisfaction, and a region of uncertain satisfaction in between]
Words can mean different things to different people.
Different users often recommend different elasticity policies.
[Figure: a type-2 membership function compared with a type-1 membership function]
How did we design the fuzzy controller?
- The fuzzy logic controller is completely defined by its "membership functions" and "fuzzy rules".
- Knowledge elicitation through a survey of 10 experts.
Survey Processing and Fuzzy MF Construction
[Figure: survey answers for the Workload and Response time variables are summarized by their mean and standard deviation and turned into interval type-2 membership functions, each characterized by an upper MF (UMF), a lower MF (LMF), a family of embedded MFs, and the footprint of uncertainty (FOU)]
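As a rough illustration of this construction, the sketch below builds an interval type-2 Gaussian MF whose uncertain mean spans the spread of the experts' answers, and evaluates the resulting membership interval [LMF, UMF]. This is my own simplification of the idea, not the exact procedure from the paper; the sigma, mean, and sd values are placeholders:

```python
import math

# Simplified sketch: an interval type-2 Gaussian MF whose footprint of
# uncertainty (FOU) comes from the spread (mean +/- sd) of expert answers.

def gaussian(x: float, mean: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)

def it2_membership(x: float, mean: float, sd: float, sigma: float = 15.0):
    """Membership interval [LMF(x), UMF(x)] for a Gaussian MF with
    uncertain mean in [mean - sd, mean + sd] (the embedded MFs)."""
    m1, m2 = mean - sd, mean + sd
    # Upper MF: the envelope over all embedded means (1.0 between them).
    umf = 1.0 if m1 <= x <= m2 else max(gaussian(x, m1, sigma),
                                        gaussian(x, m2, sigma))
    # Lower MF: the minimum over the two extreme embedded means.
    lmf = min(gaussian(x, m1, sigma), gaussian(x, m2, sigma))
    return lmf, umf

# e.g., a hypothetical "Medium workload" MF elicited with mean 50, sd 8:
lo, hi = it2_membership(x=42.0, mean=50.0, sd=8.0)   # interval grade
```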
Fuzzy Rule Elicitation

Each rule maps the antecedents (Workload, Response time) to a consequent scaling action; the columns −2 to +2 count how many of the 10 experts voted for each action, and $c_{avg}^{l}$ is the vote-weighted average.

| Rule ($l$) | Workload | Response time | −2 | −1 | 0 | +1 | +2 | $c_{avg}^{l}$ |
|---|---|---|---|---|---|---|---|---|
| 1 | Very low | Instantaneous | 7 | 2 | 1 | 0 | 0 | −1.6 |
| 2 | Very low | Fast | 5 | 4 | 1 | 0 | 0 | −1.4 |
| 3 | Very low | Medium | 0 | 2 | 6 | 2 | 0 | 0 |
| 4 | Very low | Slow | 0 | 0 | 4 | 6 | 0 | 0.6 |
| 5 | Very low | Very slow | 0 | 0 | 0 | 6 | 4 | 1.4 |
| 6 | Low | Instantaneous | 5 | 3 | 2 | 0 | 0 | −1.3 |
| 7 | Low | Fast | 2 | 7 | 1 | 0 | 0 | −1.1 |
| 8 | Low | Medium | 0 | 1 | 5 | 3 | 1 | 0.4 |
| 9 | Low | Slow | 0 | 0 | 1 | 8 | 1 | 1 |
| 10 | Low | Very slow | 0 | 0 | 0 | 4 | 6 | 1.6 |
| 11 | Medium | Instantaneous | 6 | 4 | 0 | 0 | 0 | −1.6 |
| 12 | Medium | Fast | 2 | 5 | 3 | 0 | 0 | −0.9 |
| 13 | Medium | Medium | 0 | 0 | 5 | 4 | 1 | 0.6 |
| 14 | Medium | Slow | 0 | 0 | 1 | 7 | 2 | 1.1 |
| 15 | Medium | Very slow | 0 | 0 | 1 | 3 | 6 | 1.5 |
| 16 | High | Instantaneous | 8 | 2 | 0 | 0 | 0 | −1.8 |
| 17 | High | Fast | 4 | 6 | 0 | 0 | 0 | −1.4 |
| 18 | High | Medium | 0 | 1 | 5 | 3 | 1 | 0.4 |
| 19 | High | Slow | 0 | 0 | 1 | 7 | 2 | 1.1 |
| 20 | High | Very slow | 0 | 0 | 0 | 6 | 4 | 1.4 |
| 21 | Very high | Instantaneous | 9 | 1 | 0 | 0 | 0 | −1.9 |
| 22 | Very high | Fast | 3 | 6 | 1 | 0 | 0 | −1.2 |
| 23 | Very high | Medium | 0 | 1 | 4 | 4 | 1 | 0.5 |
| 24 | Very high | Slow | 0 | 0 | 1 | 8 | 1 | 1 |
| 25 | Very high | Very slow | 0 | 0 | 0 | 4 | 6 | 1.6 |
Example (10 experts' responses): Rule 12, Workload = Medium, Response time = Fast; votes (−2, −1, 0, +1, +2) = (2, 5, 3, 0, 0), giving $c_{avg}^{12} = -0.9$.
𝑅": IF (the workload (𝑥%) is 𝐹'(), AND the response-time (𝑥*) is 𝐺'(,), THEN (add/remove 𝑐./0" instances).
𝑐./0" =∑ 𝑤4"×𝐶7849%
∑ 𝑤4"7849%
Goal: pre-compute the costly calculations so that elasticity reasoning at runtime, based on fuzzy inference, is efficient.
Elasticity Reasoning @ Runtime
Liang, Q., Mendel, J. M. (2000). Interval type-2 fuzzy logic systems: theory and design. IEEE Transactions on Fuzzy Systems, 8(5), 535–550.
[Pipeline: monitoring data → fuzzification → inference → output processing → scaling actions]

Fuzzification
[Figure: the monitored workload and response time are fuzzified against the interval type-2 MFs; in the example shown, the workload sample fires a membership interval of [0.3797, 0.5954] and the response-time sample an interval of [0.0000, 0.2212]]
Inference Mechanism
[Figure: the rule firing intervals, e.g., [0.3797, 0.5954] and [0.9377, 0.9568], are combined by the inference mechanism]
Output Processing
[Figure: type reduction produces the interval endpoints $y_l$ and $y_r$, which are defuzzified into the final scaling action]

Control Surface
[Figure: the resulting control surface maps workload and response time to the scaling action]
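Putting the runtime steps together, here is a deliberately simplified sketch of one reasoning cycle. It collapses each type-2 firing interval to its midpoint instead of performing proper type reduction (e.g., Karnik-Mendel), so it only approximates what RobusT2Scale actually does; the `rules` and `mfs` structures are assumed interfaces:

```python
# One simplified elasticity-reasoning cycle: fuzzify the monitored values,
# fire the rules, and defuzzify via a weighted average of consequents.
# Midpoints of firing intervals stand in for full type reduction.

def reasoning_cycle(workload, resp_time, rules, mfs):
    """
    rules: list of (workload_term, rt_term, c_avg) from the elicited table.
    mfs:   mfs[var][term](x) -> (lmf, umf) membership interval.
    Returns the (rounded) number of instances to add or remove.
    """
    num = den = 0.0
    for w_term, rt_term, c_avg in rules:
        lo_w, hi_w = mfs["workload"][w_term](workload)
        lo_r, hi_r = mfs["resptime"][rt_term](resp_time)
        # Firing interval via the min t-norm; midpoint as crude reduction.
        firing = (min(lo_w, lo_r) + min(hi_w, hi_r)) / 2.0
        num += firing * c_avg
        den += firing
    return round(num / den) if den > 0 else 0
```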
Tool Chain Architecture
Experimental Setting: Process View
[Figure: the six workload patterns used in the experiments (requests over time)]
Estimation Errors w.r.t. Workload Patterns
[Figure: number of hits over time for the original data and three smoothing configurations: beta=0.10, gamma=0.94 (RMSE=308.16, RRSE=0.797); beta=0.27, gamma=0.94 (RMSE=209.79, RRSE=0.545); beta=0.80, gamma=0.94 (RMSE=272.63, RRSE=0.709). Bar chart: root relative squared error per workload pattern (Big spike, Dual phase, Large variations, Quickly varying, Slowly varying, Steep tri phase)]
Workload Prediction and Its Accuracy
[Figure: forecasting the number of hits with double exponential smoothing (observed data, smoothed data, forecast), and the root mean squared error as a function of the smoothing parameters alpha and gamma]
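The forecasts above come from double exponential smoothing. The sketch below is a textbook (Holt) formulation, with alpha smoothing the level and gamma smoothing the trend, matching the parameter axes of the plot; the defaults echo the 0.27/0.94 configuration reported in the estimation-error figure, but if the naming maps differently in the original experiments, treat them as placeholders:

```python
# Double exponential (Holt) smoothing: maintain a level and a trend,
# then forecast h steps ahead.

def holt_forecast(series, alpha=0.27, gamma=0.94, horizon=1):
    level, trend = series[0], series[1] - series[0]
    smoothed = [level]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)   # level update
        trend = gamma * (level - prev_level) + (1 - gamma) * trend
        smoothed.append(level)
    return smoothed, level + horizon * trend  # smoothed data, h-step forecast

hits = [150, 180, 210, 260, 240, 300, 340, 330, 390, 420]  # sample trace
smoothed, next_hits = holt_forecast(hits, horizon=1)
```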
Effectiveness of RobusT2Scale

SLA: $rt_{95} \le 600\,ms$ for every 10 s control interval.

| SUT | Criteria | Big spike | Dual phase | Large variations | Quickly varying | Slowly varying | Steep tri phase |
|---|---|---|---|---|---|---|---|
| RobusT2Scale | $rt_{95}$ | 973 ms | 537 ms | 509 ms | 451 ms | 423 ms | 498 ms |
| | vm | 3.2 | 3.8 | 5.1 | 5.3 | 3.7 | 3.9 |
| Over-provisioning | $rt_{95}$ | 354 ms | 411 ms | 395 ms | 446 ms | 371 ms | 491 ms |
| | vm | 6 | 6 | 6 | 6 | 6 | 6 |
| Under-provisioning | $rt_{95}$ | 1465 ms | 1832 ms | 1789 ms | 1594 ms | 1898 ms | 2194 ms |
| | vm | 2 | 2 | 2 | 2 | 2 | 2 |
• RobusT2Scale is superior to under-provisioning: it guarantees the SLA without requiring excessive resources.
• RobusT2Scale is superior to over-provisioning: it guarantees the SLA while using fewer resources.
Robustness of RobusT2Scale
[Figure: root mean squared error under a 10% noise level for alpha = 0.1, 0.5, 0.9, 1.0]
Self-Learning Controller
- Initial work (SEAMS paper)
- Current work: updating K in MAPE-K @ runtime (QoSA 2016, IEEE Cloud), design-time assistance, multi-cloud
A Model-Free Reinforcement Learning Approach
[Diagram: an RL agent interacts with its environment (states S1–S5, actions a1–a6); at each step it observes the state $s_t$ and a reward/punishment $r_{t+1}$, and uses a value function to derive, establish, and calibrate the policy that determines the next action]
FQL4KE: Logical Architecture
[Diagram: an autonomic controller combines a fuzzy logic controller (fuzzifier, rule base, inference engine, defuzzifier) with a fuzzy Q-learning component that learns the knowledge; it monitors the cloud application ($w$, $rt$, $th$, $vm$), takes the system state ($w$, $rt$) and system goal as input, and sends scaling actions $sa$ to the cloud platform]
Fuzzy Q-Learning

Algorithm 1: Fuzzy Q-Learning
Require: $\gamma$, $\eta$, $\epsilon$
1: Initialize q-values: $q[i, j] = 0$, $1 \le i \le N$, $1 \le j \le J$
2: Select an action for each fired rule:
   $a_i = \arg\max_k q[i, k]$ with probability $1 - \epsilon$ (Eq. 5)
   $a_i = \text{random}\{a_k, k = 1, 2, \dots, J\}$ with probability $\epsilon$
3: Calculate the control action by the fuzzy controller:
   $a = \sum_{i=1}^{N} \mu_i(x) \times a_i$ (Eq. 1), where $\alpha_i(s)$ is the firing level of rule $i$
4: Approximate the Q function from the current q-values and the firing levels of the rules:
   $Q(s(t), a) = \sum_{i=1}^{N} \alpha_i(s) \times q[i, a_i]$,
   where $Q(s(t), a)$ is the value of the Q function for the current state $s(t)$ in iteration $t$ and the action $a$
5: Take action $a$ and let the system go to the next state $s(t+1)$.
6: Observe the reinforcement signal $r(t+1)$ and compute the value of the new state:
   $V(s(t+1)) = \sum_{i=1}^{N} \alpha_i(s(t+1)) \cdot \max_k q[i, k]$
7: Calculate the error signal:
   $\Delta Q = r(t+1) + \gamma \times V(s(t+1)) - Q(s(t), a)$ (Eq. 4), where $\gamma$ is a discount factor
8: Update q-values:
   $q[i, a_i] = q[i, a_i] + \eta \cdot \Delta Q \cdot \alpha_i(s(t))$ (Eq. 4), where $\eta$ is a learning rate
9: Repeat the process for the new state until it converges
The Q function under policy $\pi$ is formally defined as:

$$Q^{\pi}(s, a) = E_{\pi}\Big\{\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1}\Big\}, \qquad (3)$$

where $E_{\pi}\{\cdot\}$ is the expectation under policy $\pi$. When an appropriate policy is found, the RL problem at hand is solved. Q-learning is a technique that does not require any specific policy in order to evaluate $Q(s, a)$; therefore:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \eta\,[r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)], \qquad (4)$$

where $\eta$ is the learning rate and $\gamma$ the discount factor. In this case, policy adaptation can be achieved by selecting a random action with probability $\epsilon$ and an action that maximizes the Q function in the current state with probability $1 - \epsilon$; the value of $\epsilon$ is determined by the exploitation/exploration strategy (cf. V-A):

$$a(s) = \arg\max_{k} Q(s, k) \qquad (5)$$

The fuzzy Q-learning algorithm is summarized in Algorithm 1. In the case of our running example, the state space is finite (i.e., 9 states as the full combination of $3 \times 3$ membership functions for the fuzzy variables $w$ and $rt$), and our controller has to choose a scaling action among 5 possible actions $\{-2, -1, 0, +1, +2\}$. However, the design methodology that we describe is general. Note that convergence is detected when the change in the consequent functions is negligible in each learning loop.
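A compact Python sketch of Algorithm 1 follows. The `env` and `firing` interfaces are hypothetical stand-ins (an environment exposing `state()` and `step(action) -> (next_state, reward)`, and a function returning the normalized firing levels of the N rules); the authors' Matlab implementation, linked later in the deck, is the reference:

```python
import random

# Sketch of Algorithm 1 (fuzzy Q-learning) with N rules and J actions.
# firing(s) must return the normalized firing levels alpha_i(s) of all rules.

def fuzzy_q_learning(env, firing, actions, N, gamma=0.9, eta=0.1, eps=0.1,
                     episodes=10000):
    q = [[0.0] * len(actions) for _ in range(N)]           # step 1
    s = env.state()
    for _ in range(episodes):
        alpha = firing(s)
        # Step 2: eps-greedy action choice per fired rule.
        chosen = [random.randrange(len(actions)) if random.random() < eps
                  else max(range(len(actions)), key=lambda k: q[i][k])
                  for i in range(N)]
        # Step 3: blend the per-rule actions into one control action.
        a = sum(alpha[i] * actions[chosen[i]] for i in range(N))
        # Step 4: approximate Q(s, a) from the fired rules.
        q_sa = sum(alpha[i] * q[i][chosen[i]] for i in range(N))
        # Steps 5-6: act, observe reward and the value of the next state.
        s_next, r = env.step(a)
        alpha_next = firing(s_next)
        v_next = sum(alpha_next[i] * max(q[i]) for i in range(N))
        # Step 7: temporal-difference error signal.
        dq = r + gamma * v_next - q_sa
        # Step 8: credit each rule proportionally to its firing level.
        for i in range(N):
            q[i][chosen[i]] += eta * dq * alpha[i]
        s = s_next                                          # step 9: repeat
    return q
```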
D. FQL4KE for Dynamic Resource Allocation

In this work, for simplicity, only one instance of the fuzzy controller is integrated. Using multiple controllers is also possible, but due to the intricacies of updating a central Q-table, we consider this a natural future extension for problem areas that require coordination between several controllers; see [9].
Reward function. The controller receives the current values of $w$ and $rt$, which correspond to the state of the system $s(t)$ (cf. step 4 in Algorithm 1). The control signal $sa$ represents the action $a$ that the controller takes at each loop. We define the reward signal $r(t)$ based on three criteria: (i) the number of desired-response-time violations, (ii) the amount of resources acquired, and (iii) throughput, as follows:

$$r(t) = U(t) - U(t-1), \qquad (6)$$

where $U(t)$ is the utility value of the system at time $t$. Hence, if a control action leads to increased utility, the action is appropriate. If the reward is close to zero, the action was not effective, and a negative reward (punishment) warns that the situation became worse after taking the action. The utility function is defined as:

$$U(t) = w_1 \cdot \frac{th(t)}{th_{max}} + w_2 \cdot \Big(1 - \frac{vm(t)}{vm_{max}}\Big) + w_3 \cdot \big(1 - H(t)\big) \qquad (7)$$

$$H(t) = \begin{cases} \dfrac{rt(t) - rt_{des}}{rt_{des}} & rt_{des} \le rt(t) \le 2 \cdot rt_{des} \\[4pt] 1 & rt(t) \ge 2 \cdot rt_{des} \\[2pt] 0 & rt(t) \le rt_{des} \end{cases}$$

where $th(t)$, $vm(t)$ and $rt(t)$ are the throughput, number of worker roles, and response time of the system, respectively, and $w_1$, $w_2$ and $w_3$ are the corresponding weights determining their relative importance in the utility function. To aggregate the individual criteria, we normalize them depending on whether they should be maximized or minimized.
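A direct transcription of Eqs. (6)-(7) in Python; the weights and the normalizing maxima here are placeholder values, not the ones used in the experiments:

```python
# Reward r(t) = U(t) - U(t-1), with U(t) from Eq. (7) and the
# response-time penalty H(t) as defined above.

def penalty_H(rt, rt_des):
    if rt <= rt_des:
        return 0.0
    if rt >= 2 * rt_des:
        return 1.0
    return (rt - rt_des) / rt_des          # linear in between

def utility(th, vm, rt, th_max=1000.0, vm_max=8, rt_des=600.0,
            w1=0.4, w2=0.3, w3=0.3):       # placeholder weights/maxima
    return (w1 * th / th_max
            + w2 * (1 - vm / vm_max)
            + w3 * (1 - penalty_H(rt, rt_des)))

def reward(curr, prev):
    """curr/prev are (throughput, #VMs, response time) tuples."""
    return utility(*curr) - utility(*prev)
```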
Knowledge base update. FQL4KE starts controlling the allocation of resources with no a priori knowledge. After enough exploration, the consequents of the rules can be determined by selecting the actions that correspond to the highest q-value in each row of the Q-table. Although FQL4KE does not rely on design-time knowledge, if even partial knowledge is available (i.e., the operator of the system is confident enough to provide some of the elasticity policies), or if there exists data regarding the performance of the application, FQL4KE can exploit such knowledge by initializing the q-values (cf. step 1 in Algorithm 1; see the sketch below). This implies quicker learning convergence.
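Seeding the Q-table from partial operator knowledge might look like this entirely illustrative sketch:

```python
# Illustrative only: warm-start the fuzzy Q-learning Q-table with partial
# elasticity policies (e.g., rules elicited for RobusT2Scale), so that
# exploration starts from expert behavior instead of all zeros.

ACTIONS = [-2, -1, 0, +1, +2]

def warm_start(q, known_policies, prior=0.5):
    """known_policies: {rule_index: preferred_action} from the operator."""
    for i, action in known_policies.items():
        q[i][ACTIONS.index(action)] = prior   # bias toward the known action
    return q

q = [[0.0] * len(ACTIONS) for _ in range(9)]  # 9 rules (3x3 MFs)
q = warm_start(q, {0: -2, 8: +2})             # e.g., two known rules
```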
IV. IMPLEMENTATION

We implemented prototypes of FQL4KE on Microsoft Azure and OpenStack. As illustrated in Figure 4, the Azure prototype comprises three components integrated according to Figure 6:
(i) a learning component (FQL) implemented in Matlab¹

¹ Code is available at https://github.com/pooyanjamshidi/Fuzzy-Q-Learning
[Figure: triangular membership functions Low/Medium/High over Workload, with breakpoints $\alpha$, $\beta$, $\gamma$, $\delta$, and Bad/OK/Good over Response Time, with breakpoints $\lambda$, $\mu$, $\nu$]
Code: https://github.com/pooyanjamshidi/Fuzzy-Q-Learning
FQL4KE: Implementation Architecture
[Diagram: the FQL learning component (with parameters $\gamma$, $\eta$, $\varepsilon$ and reward $r$) produces learned rules, exported as a .fis file to RobusT2Scale; monitoring supplies the system state ($w$, $rt$, $th$, $vm$) from ElasticBench on the cloud platform, the actuator enacts scaling actions $sa$, and a load generator drives the application through WCF/REST endpoints]
ElasticBench: The Experimental Platform
[Diagram: a PaaS deployment comprising a load balancer (LB), a web role (L), processing worker roles (P), a monitoring worker role (M), a cache, a queue, and results/blackboard storage; on-premise, a load generator console (LG) plus the auto-scaling logic (controller), policy enforcer, actuator, and monitoring]
Code: https://github.com/pooyanjamshidi/ElasticBench
Learning Strategies
[Figure: exploration probability vs. learning epochs for five strategies S1–S5]
Q-value Evolution
[Figure: evolution of q(9,3) over about 350 iterations, with episodes of reward, punishment, and no change]
Temporal Evolution of Acquired Nodes
[Figure: number of VMs over 10,000 experiment epochs, progressing from high exploration, to less frequent exploration, to mostly exploitation]
Control Surface Evolution
[Figure: four snapshots of the learned control surface over time]
Experimental Results
Workload Injected into the System
[Figure: the six synthetic workload patterns (Big spike, Dual phase, Large variations, Quickly varying, Slowly varying, Steep tri phase), plus two real traces: user requests over 24 hours, and user requests over one and a half months (31-May-1998 to 12-Jul-1998)]
Results
III. FQL4KE is capable of automatically constructing the control rules and keeping control parameters updated through fast online learning. It executes resource allocation and learns to improve its performance simultaneously.
IV. Unlike supervised techniques that learn from training data, FQL4KE does not require offline training, which saves a significant amount of time and effort.
Table 3. Comparison of the effectiveness of FQL4KE, RobusT2Scale, and Azure auto-scaling under different workload patterns.

| Approach | Criteria | Big spike | Dual phase | Large variations | Quickly varying | Slowly varying | Steep tri phase |
|---|---|---|---|---|---|---|---|
| FQL4KE | $rt_{95}$ | 1212 ms | 548 ms | 991 ms | 1319 ms | 512 ms | 561 ms |
| | vm | 2.2 | 3.6 | 4.3 | 4.4 | 3.6 | 3.4 |
| RobusT2Scale | $rt_{95}$ | 1339 ms | 729 ms | 1233 ms | 1341 ms | 567 ms | 512 ms |
| | vm | 3.2 | 3.8 | 5.1 | 5.3 | 3.7 | 3.9 |
| Azure auto-scaling | $rt_{95}$ | 1409 ms | 712 ms | 1341 ms | 1431 ms | 1101 ms | 1412 ms |
| | vm | 3.3 | 4 | 5.5 | 5.4 | 3.7 | 4 |
5. Conclusions
We propose a new learning-based self-adaptation framework, called MAPE-KE, which is particularly suited for engineering elastic systems that need to cope with uncertain environments, such as cloud and big data, and need to be robust by taking the human out of the loop. We implemented the FQL4KE cloud controller according to this framework to adjust the required resources while the application is running, addressing the key problems of uncertainty, latency and noise.

Fuzzy rules are at the core of the proposed solutions. Our approach enables qualitative specification of elasticity rules and allows the elasticity controller to handle even conflicting rules. The proposed controller is also robust against noisy measurements that may arise from the prediction added to the control loop or from monitoring uncertainties. The online learning of rules is a complementary solution that provides a completely user-independent cloud controller design.

What emerges from the discussion is an understanding of cloud controllers as control-theory-based tools. Our solutions use fuzzy control theory combined with machine learning to specifically deal with uncertainty in its various forms.
- FQL4KE performs better than Azure's native auto-scaling service, and better than or similarly to RobusT2Scale.
- FQL4KE can learn to acquire resources for dynamic cloud systems.
- FQL4KE is flexible enough to allow the operator to set different elasticity strategies.
Runtime Overhead
[Figure: overhead of the monitoring, learning, and actuation phases (scale ×10⁴)]
Insight: MAPE-K → MAPE-KE
[Diagram: base level: the elastic system and its environment (cloud, sensors, actuators); meta level: the MAPE-K loop (monitoring, analysis, planning, execution over shared knowledge); meta-meta level: KE, which updates the knowledge through offline training and online learning, with input from users]
Current work
- Implementation on OpenStack with Intel, Ireland
- Policy learning through other ML techniques (e.g., GP, BO)
Summary
[Recap: the motivating challenges (~75% wasted capacity vs. actual demand; customers lost during bursts), the FQL4KE logical architecture, and the FQL4KE implementation architecture shown earlier]
Thank you!
Slides? => http://www.slideshare.net/pooyanjamshidi/
More details? => http://www.doc.ic.ac.uk/~pjamshid/PDF/qosa16.pdf
Code? => https://github.com/pooyanjamshidi
Submit to CloudWays 2016!
Paper submission deadline: July 1st, 2016
Decision notification: August 1st, 2016
Final version due: August 8th, 2016
Workshop date: September 5th, 2016
Co-located with ESOCC, Vienna. Topics: Cloud Architecture, Big Data, DevOps.
- Details: https://sites.google.com/site/cloudways2016/call-for-papers