
Post on 20-Jun-2020


RESISTing Reliability Degradation through Proactive Reconfiguration
D. Cooray, S. Malek, R. Roshandel, and D. Kilgore
Summarized by Haoliang Wang

September 28, 2015

Motivation
An emerging class of systems: situated software systems
◦ Predominantly pervasive, embedded, and mobile
◦ The software system is subject to dynamic contextual changes
◦ Many applications, such as emergency response, are mission-critical, so reliability matters

Reliability analysis at design time is insufficient
◦ System reliability (and other QoS) depends on the system's runtime characteristics
◦ Adaptation at runtime is necessary

Adaptation using a reactive approach is not good enough
◦ It adapts to changes only after reliability has already degraded
◦ Prediction-based proactive adaptation is preferred

Challenges
§ Proactively reconfigure the system before performance degrades
§ Effectively estimate the reliability of a complex system at runtime
§ Determine the optimal system architecture at runtime

RESIST Framework
Resilient Situated Software System (RESIST)
◦ Component-level Reliability Analyzer
◦ Configuration Reliability Analyzer
◦ Configuration Selector

Context-Aware Middleware
◦ Provides support for the execution, monitoring, and adaptation of a software system

RESIST Framework (Cont.)
RESIST is a Goal Management layer solution in the three-layer architectural model for self-managed systems

RESIST Framework (Cont.)
System Model
◦ The system is divided into several functional components, each with its own reliability
◦ Each component is allocated to a process
◦ System reliability is determined by the architecture, the individual components, and the context

Failure Model
◦ Fail-stop: failures are detectable by middleware facilities
◦ Component failure: its effects are contained within the boundary of the component
◦ Process failure: occurs when one of the process's components exits prematurely; the other components running on that process also fail

Component-level Analysis
Discrete Time Markov Chain (DTMC)
◦ Used to estimate component reliability
◦ A stochastic process with a set of states $S = \{S_1, S_2, S_3, \dots, S_N\}$
◦ Transition matrix $A = \{a_{ij}\}$, where $a_{ij}$ is the probability of transitioning from $S_i$ to $S_j$
◦ The reliability of the component is computed by solving for the steady-state probability of not being in any failure state
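A minimal sketch of that steady-state computation with NumPy. The 5-state matrix below (idle, estimating, planning, moving, failed) is hypothetical, not the matrix learned in the paper, and it assumes the failed state recovers to idle so that the chain is ergodic and has a unique stationary distribution; the helper names are illustrative.

```python
import numpy as np

def steady_state(A):
    """Stationary distribution pi of an ergodic DTMC: pi @ A = pi, sum(pi) = 1."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    # Stack the balance equations (A^T - I) pi = 0 with the normalization 1^T pi = 1
    lhs = np.vstack([A.T - np.eye(n), np.ones(n)])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return pi

def component_reliability(A, failure_states):
    """Steady-state probability of NOT being in any failure state."""
    pi = steady_state(A)
    return 1.0 - pi[list(failure_states)].sum()

# Hypothetical transition matrix; states: idle, estimating, planning, moving, failed
A = np.array([
    [0.50, 0.50, 0.00, 0.00, 0.00],
    [0.00, 0.30, 0.69, 0.00, 0.01],
    [0.00, 0.00, 0.40, 0.60, 0.00],
    [0.30, 0.20, 0.00, 0.49, 0.01],
    [1.00, 0.00, 0.00, 0.00, 0.00],  # assumption: a failed controller restarts in idle
])

print(component_reliability(A, failure_states=[4]))
```

For this made-up matrix the result is about 0.995. If the failure state were absorbing instead, all steady-state mass would end up in it, so some recovery transition is needed for this reliability measure to be meaningful.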

How can the transition matrix $A$ be derived?

Component-level Analysis (Cont.)
Hidden Markov Models (HMMs)
◦ Learn from runtime data and estimate the transition probability matrix
◦ A stochastic process with a set of states $S = \{S_1, S_2, S_3, \dots, S_N\}$
◦ Transition matrix $A = \{a_{ij}\}$, where $a_{ij}$ is the probability of transitioning from $S_i$ to $S_j$
◦ A set of observations $O = \{O_1, O_2, O_3, \dots, O_M\}$
◦ Observation matrix $E = \{e_{ik}\}$, where $e_{ik}$ is the probability of observing event $O_k$ in state $S_i$

The Baum-Welch algorithm is used to train the HMM and obtain the converged transition matrix $A$
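The slides do not show the training step, and the paper's implementation used MATLAB. As an illustration only, here is a minimal textbook Baum-Welch (EM) for a discrete-observation HMM in NumPy, with per-step scaling to avoid numerical underflow. It assumes observations are integer-coded runtime events 0..M-1; the random initialization, iteration count, function names, and the stand-in observation sequence are all illustrative choices.

```python
import numpy as np

def forward_backward(A, E, pi0, obs):
    """Scaled forward-backward pass; c[t] are the per-step normalizers."""
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    c = np.zeros(T)
    alpha[0] = pi0 * E[:, obs[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * E[:, obs[t]]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (E[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]
    return alpha, beta, c

def baum_welch(obs, N, M, n_iter=30, seed=0):
    """EM estimate of (A, E, pi0) for an N-state HMM with M observation symbols."""
    obs = np.asarray(obs)
    rng = np.random.default_rng(seed)
    A = rng.random((N, N))
    A /= A.sum(axis=1, keepdims=True)
    E = rng.random((N, M))
    E /= E.sum(axis=1, keepdims=True)
    pi0 = np.full(N, 1.0 / N)
    log_lik = []
    for _ in range(n_iter):
        # E-step: posterior state (gamma) and transition (xi) probabilities
        alpha, beta, c = forward_backward(A, E, pi0, obs)
        log_lik.append(np.log(c).sum())            # log P(O | current parameters)
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)
        xi = (alpha[:-1, :, None] * A[None, :, :]
              * (E[:, obs[1:]].T * beta[1:])[:, None, :]) / c[1:, None, None]
        # M-step: re-estimate parameters from expected counts
        pi0 = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        E = np.stack([gamma[obs == k].sum(axis=0) for k in range(M)], axis=1)
        E /= gamma.sum(axis=0)[:, None]
    return A, E, pi0, log_lik

# Usage with a stand-in sequence of integer-coded runtime events (0, 1, 2)
rng = np.random.default_rng(42)
obs = rng.integers(0, 3, size=300)
A_hat, E_hat, pi_hat, ll = baum_welch(obs, N=3, M=3)
print(np.round(A_hat, 3))
```

Baum-Welch only reaches a local optimum, and the learned hidden states are identified only up to a permutation, so mapping them onto named states such as idle or failed needs additional domain knowledge.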

Component-level Analysis (Cont.)
An example of estimating component reliability
◦ A robot controller behavior model
◦ States $S = \{\text{idle}, \text{estimating}, \text{planning}, \text{moving}, \text{failed}\}$
◦ Running the Baum-Welch algorithm on the observation sequence yields the transition matrix $A$
◦ Solving for the steady-state probability vector gives [0.1966, 0.2238, 0.3849, 0.1914, 0.0033]
◦ The controller component's reliability is therefore 1 − 0.0033 = 99.67%

Component-level Analysis (Cont.)
Estimating the near future by incorporating context
◦ Define a set of contextual parameters $C = \{C_1, C_2, \dots, C_x\}$
◦ If $a_{kj}$ is the transition probability from state $S_k$ to state $S_j$ in matrix $A$ and it is affected by changes in a specific contextual parameter $C_n$, then
$$a'_{kj} = \mu(a_{kj}, \Delta C_n)$$
where $\mu$ is a context-specific function quantifying the impact of the contextual change on the transition probability.
◦ The remaining transition probabilities in row $k$ are adjusted proportionately such that
$$a'_{kj} + a_{kf} + \sum_{m} a'_{km} = 1$$
where $a_{kf}$, the transition probability to the failure state $S_f$, is left unchanged and $m$ ranges over the remaining states.
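A minimal sketch of this row adjustment, assuming $\mu$ has already produced the new value $a'_{kj}$ and reading the constraint as: keep $a_{kf}$ fixed and rescale the other entries of row $k$ proportionally. The slides leave $\mu$ context-specific, so `mu_linear` is purely hypothetical, as are all numbers; the moving → estimating change mirrors the Bump Probability parameter used in the evaluation.

```python
import numpy as np

def mu_linear(a_kj, delta_c, sensitivity=0.5):
    """Hypothetical context function mu: shift a_kj linearly with the context change."""
    return float(np.clip(a_kj + sensitivity * delta_c, 0.0, 1.0))

def adjust_row(A, k, j, f, new_akj):
    """Copy of A with a'_kj = new_akj, a_kf unchanged, and the remaining
    entries of row k rescaled proportionally so that the row sums to 1."""
    A = np.array(A, dtype=float)
    row = A[k]                                   # view into the copy
    others = [m for m in range(len(row)) if m not in (j, f)]
    remaining = 1.0 - new_akj - row[f]
    if remaining < 0:
        raise ValueError("a'_kj + a_kf must not exceed 1")
    old = row[others].sum()
    if old == 0:
        if remaining > 1e-12:
            raise ValueError("no other transitions can absorb the change")
    else:
        row[others] *= remaining / old           # preserves their relative ratios
    row[j] = new_akj
    return A

IDLE, EST, PLAN, MOVE, FAIL = range(5)
A = np.array([
    [0.50, 0.50, 0.00, 0.00, 0.00],
    [0.00, 0.30, 0.69, 0.00, 0.01],
    [0.00, 0.00, 0.40, 0.60, 0.00],
    [0.30, 0.20, 0.00, 0.49, 0.01],
    [1.00, 0.00, 0.00, 0.00, 0.00],
])
# Hypothetical: bump probability rises by 0.3, so moving -> estimating becomes likelier
new_val = mu_linear(A[MOVE, EST], delta_c=0.3)
A_ctx = adjust_row(A, k=MOVE, j=EST, f=FAIL, new_akj=new_val)
print(np.round(A_ctx[MOVE], 4))
```

The adjusted matrix can then be fed to the same steady-state computation to predict the component's near-future reliability.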

Configuration-level Analysis
Markov-based system-level reliability estimation
◦ System reliability is estimated compositionally from the reliability of individual components
◦ The components and the interactions between them are mapped into a DTMC, where a state is one or more components in concurrent execution

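◦ System reliability is computed as
$$R = (-1)^{k+1} R_k \frac{|E|}{|I - M|}$$
where $M$ is a $k \times k$ matrix whose elements are
$$M(i, j) = \begin{cases} R_i P_{ij}, & \text{if } s_i \text{ reaches } s_j \text{ and } i \neq k \\ 0, & \text{otherwise} \end{cases}$$
Here $R_i$ is the reliability of state $s_i$, $P_{ij}$ is the transition probability from $s_i$ to $s_j$, $s_1$ is the initial state, $s_k$ is the final state, and $|E|$ is the determinant of the matrix that remains after excluding the last row and the first column of $(I - M)$.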
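The determinant form can be checked numerically. The sketch below builds $M$ from state reliabilities $R_i$ and transition probabilities $P_{ij}$, evaluates the formula, and compares it with $R_k \cdot [(I - M)^{-1}]_{1,k}$, which is the same quantity written via the adjugate (cofactor) formula for the inverse. It assumes states are ordered with $s_1$ first and $s_k$ last; the three-state chain is hypothetical.

```python
import numpy as np

def system_reliability(R, P):
    """Determinant form: R = (-1)^(k+1) * R_k * |E| / |I - M|."""
    R = np.asarray(R, dtype=float)
    P = np.asarray(P, dtype=float)
    k = len(R)
    M = R[:, None] * P          # M(i, j) = R_i * P_ij (zero where s_i does not reach s_j)
    M[k - 1, :] = 0.0           # condition i != k: the final state has no outgoing entries
    I_minus_M = np.eye(k) - M
    E = I_minus_M[:-1, 1:]      # drop the last row and the first column
    return (-1) ** (k + 1) * R[-1] * np.linalg.det(E) / np.linalg.det(I_minus_M)

def system_reliability_inv(R, P):
    """Equivalent form: R_k * [(I - M)^-1] entry (1, k)."""
    R = np.asarray(R, dtype=float)
    k = len(R)
    M = R[:, None] * np.asarray(P, dtype=float)
    M[k - 1, :] = 0.0
    return R[-1] * np.linalg.inv(np.eye(k) - M)[0, k - 1]

# Hypothetical 3-state architecture: s1 -> s2 (0.6) or s1 -> s3 (0.4); s2 -> s3 (1.0)
R = [0.99, 0.95, 0.98]
P = [[0.0, 0.6, 0.4],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]
# analytically 0.99 * (0.6 * 0.95 + 0.4) * 0.98 = 0.941094
print(system_reliability(R, P))
```

For a purely sequential chain the formula reduces to the product of the state reliabilities, which is a quick way to sanity-check an implementation.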

Configuration-level Analysis (Cont.)
An example of estimating system reliability
◦ Suppose the initial component reliabilities obtained for the Controller and the Navigator are $C = 0.9967$ and $N = 0.9751$, and assume all other components are 100% reliable
◦ Based on the observed data, we can obtain the transition probability for each state and therefore $M$
◦ Solving the model yields a system reliability of 93.85%

Configuration-level Analysis (Cont.)
Impact of architectural style
◦ E.g., replicating components to improve system reliability

Configuration-level Analysis (Cont.)
Impact of deployment architecture
◦ E.g., reallocating components to different processes to improve system reliability

Configuration Selection
Configuration selection as an optimization problem
◦ The optimal configuration in RESIST is defined as one that satisfies the system's reliability requirement while improving other quality attributes of concern

◦ Formally, given the decision variables
  $p_i \in \mathbb{Z}^+$: the number of replicas of component $i$
  $x_{ij} \in \{0, 1\}$: whether component $i$ is placed on process $j$

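the objective is to find an architectural configuration $C^*$ such that
$$
\begin{aligned}
C^* = {} & \arg\max_{C} \sum_{\forall q \in \text{QualityObjectives}} U_q(C) \\
\text{s.t.} \quad & p_i \le w_i,\ w_i \in \mathbb{Z}^+, && \forall i \in \{1, \dots, t\} \\
& \textstyle\sum_{j} x_{ij} = 1, && \forall i \in \{1, \dots, t\} \\
& R(C) \ge \delta,\ \delta \in \mathbb{R},\ 0 < \delta \le 1
\end{aligned}
$$
where $U_q$ is a utility function indicating the preference for quality attribute $q$, $R(C)$ is the expected reliability of a given architecture $C$, $t$ is the number of components, $w_i$ bounds the number of replicas of component $i$, and the sum over $j$ (all processes) forces each component onto exactly one process.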

Configuration Selection (Cont.)
Configuration reliability $R(C)$
◦ Assume each component may either be replicated or share a process with other components. This is expressed with a binary variable $q_i$, which is 1 if the $i$-th component shares a process and 0 otherwise (sums over $j$ range over all processes):
$$q_i = 1 - \sum_{j} x_{ij} \prod_{k=1,\, k \neq i}^{t} (1 - x_{kj})$$
◦ Thus, the effective reliability of component $i$ is
$$r_i^{\text{eff}} = q_i\, r_i^{\text{share}} + (1 - q_i)\, r_i^{\text{rep}}$$
where
$$r_i^{\text{share}} = \sum_{j} r_i x_{ij} \prod_{k=1,\, k \neq i}^{t} \left[ r_k x_{kj} + (1 - x_{kj}) \right]$$
$$r_i^{\text{rep}} = 1 - (1 - r_i)^{1 + p_i}$$
◦ Finally, the system reliability can be computed as specified in the configuration-level analysis
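A direct transcription of these formulas, offered as a sketch: `x` is the binary component-by-process matrix, `p` holds the replica counts, and `r` the component reliabilities; the function name and the example values are hypothetical. Following the slides' either/or assumption, the replicas of a component that shares a process are ignored.

```python
import numpy as np

def effective_reliability(r, p, x):
    """r_i^eff = q_i * r_i^share + (1 - q_i) * r_i^rep for every component i.

    r: component reliabilities (length t); p: replica counts p_i;
    x: binary t-by-processes matrix, x[i, j] = 1 iff component i runs on process j.
    """
    r = np.asarray(r, dtype=float)
    x = np.asarray(x)
    t = len(r)
    r_eff = np.empty(t)
    for i in range(t):
        others = [k for k in range(t) if k != i]
        x_o = x[others]                              # rows of the other components
        # entry j is 1 iff component i is on process j and no other component is
        alone = x[i] * np.prod(1 - x_o, axis=0)
        q_i = 1 - alone.sum()                        # 1 if i shares its process
        # per process: product of r_k over components k co-located there
        co_located = np.prod(r[others][:, None] * x_o + (1 - x_o), axis=0)
        r_share = (r[i] * x[i] * co_located).sum()
        r_rep = 1 - (1 - r[i]) ** (1 + p[i])         # 1 + p_i independent copies
        r_eff[i] = q_i * r_share + (1 - q_i) * r_rep
    return r_eff

# Hypothetical example: components 0 and 1 share process 0;
# component 2 runs alone on process 1 with two extra replicas.
r = [0.99, 0.95, 0.90]
p = [0, 0, 2]
x = [[1, 0],
     [1, 0],
     [0, 1]]
r_eff = effective_reliability(r, p, x)
print(r_eff)   # approximately [0.9405, 0.9405, 0.999]
```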

Configuration Selection (Cont.)
Time-complexity analysis
◦ Suppose we have
  P = number of processes
  C = number of components
  N = maximum number of replicas
◦ This implies that there are $O(P^C)$ ways of allocating components to processes and $O(N^C)$ ways of replicating components
◦ Therefore, the total number of possible configurations is $O((NP)^C)$, exponential in the number of components, which makes the selection problem NP-hard

However, the solution space may be significantly pruned by imposing architectural constraints
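To make the size of the search space concrete, here is a brute-force sketch of configuration selection on a toy instance. It enumerates every allocation (a process index per component, equivalent to the one-hot $x_{ij}$) and every replica vector with $p_i \in \{0, \dots, N\}$, discards configurations with $R(C) < \delta$, and maximizes a utility. Both the utility (fewer processes and replicas are better) and the use of a plain product of effective reliabilities as $R(C)$ are hypothetical stand-ins: the slides compute $R(C)$ with the configuration-level DTMC model, and the product counts a co-located failure once per affected component, so it is pessimistic.

```python
import itertools
import numpy as np

def effective_reliability(r, p, alloc):
    """Per-component r_i^eff; alloc[i] is the process of component i."""
    r_eff = []
    for i, r_i in enumerate(r):
        mates = [k for k in range(len(r)) if k != i and alloc[k] == alloc[i]]
        if mates:   # q_i = 1: shares a process, so co-located failures propagate
            r_eff.append(r_i * float(np.prod([r[k] for k in mates])))
        else:       # q_i = 0: runs alone with 1 + p_i copies
            r_eff.append(1 - (1 - r_i) ** (1 + p[i]))
    return r_eff

def utility(alloc, p):
    """Hypothetical utility: prefer fewer processes and fewer replicas."""
    return -(len(set(alloc)) + sum(p))

def select_configuration(r, P, N, delta, utility_fn):
    """Exhaustive search over all P**C allocations and (N + 1)**C replica vectors."""
    C = len(r)
    best, best_u, count = None, -np.inf, 0
    for alloc in itertools.product(range(P), repeat=C):
        for p in itertools.product(range(N + 1), repeat=C):
            count += 1
            # Hypothetical stand-in for R(C): product of effective reliabilities
            R = float(np.prod(effective_reliability(r, p, alloc)))
            if R < delta:
                continue          # violates the constraint R(C) >= delta
            u = utility_fn(alloc, p)
            if u > best_u:
                best, best_u = (alloc, p, R), u
    return best, count

r = [0.99, 0.95, 0.90]
P, N, delta = 3, 2, 0.9
best, count = select_configuration(r, P, N, delta, utility)
print(count)   # (3 * 3) ** 3 = 729 configurations examined
print(best)
```

With these made-up numbers the winner places each component on its own process and adds a single replica of the least reliable component. Even with three components the loop already examines 729 configurations, which is why pruning by architectural constraints matters.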

Evaluation
Implementation
◦ Mobile emergency response system prototype
◦ XTEAM is used to control the system's operational profile
◦ Prism-MW is used to gather the runtime data
◦ MATLAB is used to generate and solve the HMM

Evaluation Criteria
◦ Validity of reliability predictions
◦ Effectiveness of proactive reconfiguration
◦ Performance overhead

Evaluation (Cont.)
Validity of Reliability Prediction
◦ Bump Probability is used as the contextual parameter; it affects the transition probability from the moving state to the estimating state

Evaluation (Cont.)
Proactive Reconfiguration

Evaluation (Cont.)
Overhead of Component Reliability Analysis

Summary
RESIST is a framework that maintains the reliability of situated software systems through proactive reconfiguration of the software architecture

Three major components
◦ Component reliability analysis
◦ Configuration reliability analysis
◦ Configuration selector

Three key contributions
◦ Incorporation of multiple sources of information, particularly contextual information
◦ Automatic identification of the optimal architectural configuration
◦ Proactive adaptation of the system before its reliability degrades
