Availability Analysis for Deployment of In-Cloud Applications
Xiwei Xu, Qinghua Lu, Liming Zhu, Jim (Zhanwen) Li,
Sherif Sakr, Hiroshi Wada, Ingo Weber
Software Systems Research Group, NICTA
ISARCS13, Vancouver
Slides at: http://www.slideshare.net/LimingZhu/
NICTA Copyright 2010 From imagination to impact 2
Motivation
• Uncertainties in the cloud make it challenging to architect critical applications and to understand their availability:
  – Shared resources, weak SLA guarantees and limited visibility
  – Rare but high-consequence events
  – Sporadic activities: upgrade, backup, recovery…
  – Subjective uncertainties: impact of configuration choices
• We want to model these uncertainties explicitly when analysing the availability of applications deployed in the cloud:
  – from a cloud consumer perspective
  – focusing on the mechanisms most relevant to critical applications: auto-scaling, over-provisioning, backup, recovery and maintenance.
Contributions
• SRN (Stochastic Reward Net)-based availability models, which allow you to specify:
  – Deployment architecture (application placement in VMs)
  – Node/aggregation-level SLAs from infrastructure providers
  – Auto-scaling policies and recovery strategies
  – Rare events: an availability zone or region going down
• The models give you the application availability of different options under different scenarios
• Model evaluation by analysing existing industry best practices in cloud application deployment
  – Quantifying rule-of-thumb best practices
  – Comparing different (best) practices
Deployment Architecture Assumption
– Stateless VMs: auto-scaling groups
– Stateful VMs: hot standbys
– Backup in a separate region for recovery
Availability Analysis Overview
• SRN-based models
• Architecture model and recovery model in this paper
• One SRN architecture model per availability zone
Availability Analysis Overview
• Deployment decisions and patterns
  – Stateless/stateful application placement within VMs
  – Auto-scaling policies
  – Multi-zone configurations
Availability Analysis Overview
• SLAs from the cloud providers
• Node level (Rackspace) or zone level (Amazon)
Availability Analysis Overview
• Recovery strategy
• Auto-regeneration of stateless VMs and different recovery mechanisms for stateful VMs
• Different Recovery Time/Point Objectives (RTO/RPO)
Availability Analysis Overview
• Application-specific data
  – Stateless VM start-up time…
  – Stateful VM replication…
Stochastic Reward Net
• Stochastic Reward Net (SRN)
  – Stochastic Petri Net variant
  – Firing delays
  – Reward function
• Constructs
  – Places: VM states (Full, Running, Stopped, Failed)
  – Tokens: VMs
  – Transitions
    – Guard function
    – Transition rate: 1) frequency of events, 2) delay before the transition fires
• Reward function: if(#Running1>0) 1 else 0
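To make the constructs concrete, here is a minimal Python sketch of how a reward function turns the steady-state markings of a net into an availability figure. It is not the authors' model or tool: it assumes a single zone whose Running place holds `n_vms` interchangeable tokens, each VM failing independently at a hypothetical rate `fail_rate` and being regenerated independently at a hypothetical rate `repair_rate`. Under those assumptions the reachable markings form a birth-death Markov chain that can be solved directly, and availability is the expected value of the slide's reward function.

```python
def steady_state_birth_death(n_max, up_rate, down_rate):
    """Steady-state probabilities of a birth-death CTMC on states 0..n_max.

    up_rate(k) is the rate of k -> k+1, down_rate(k) the rate of k -> k-1
    (down_rate must be positive for k >= 1).
    Detailed balance gives pi[k+1] * down_rate(k+1) == pi[k] * up_rate(k).
    """
    weights = [1.0]
    for k in range(n_max):
        weights.append(weights[-1] * up_rate(k) / down_rate(k + 1))
    total = sum(weights)
    return [w / total for w in weights]


def reward(running):
    # Reward function from the slide: if(#Running1>0) 1 else 0
    return 1 if running > 0 else 0


def availability(n_vms, fail_rate, repair_rate):
    """Expected steady-state reward for one Running place with n_vms tokens.

    Assumption (hypothetical parameters): each running VM fails independently
    at fail_rate and each failed VM is regenerated independently at repair_rate.
    """
    pi = steady_state_birth_death(
        n_vms,
        up_rate=lambda k: (n_vms - k) * repair_rate,  # failed VMs being regenerated
        down_rate=lambda k: k * fail_rate,            # marking-dependent failure rate
    )
    # pi[k] is the probability that the marking has k tokens in Running
    return sum(p * reward(k) for k, p in enumerate(pi))


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, availability(n, fail_rate=0.001, repair_rate=0.25))
```

Because failures are assumed independent here, the unavailability shrinks geometrically as (fail_rate / (fail_rate + repair_rate)) ** n_vms; the sketch therefore says nothing about correlated events such as a whole availability zone going down.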
SRN-based Availability Models
Availability Models: Auto-scaling
Availability Models: Auto-scaling
gScaleSelf1: if(#Running1<=#Running2 && #Stopped1>0) 1 else 0
gScaleOther1: if(#Running1>#Running2 && #Stopped2>0) 1 else 0
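The two guard functions can be read as plain predicates over the marking. The Python sketch below assumes that `#RunningN` and `#StoppedN` count the tokens in the Running and Stopped places of zone N; the `Marking` class and the `scale_target` helper are hypothetical names introduced only for illustration. Read this way, a scale-out event in zone 1 appears to start a VM in whichever zone has fewer running VMs (ties go to zone 1), provided that zone still has a stopped VM available.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Marking:
    """Token counts per place for a two-zone model (hypothetical layout)."""
    running1: int  # #Running1
    running2: int  # #Running2
    stopped1: int  # #Stopped1
    stopped2: int  # #Stopped2


def g_scale_self_1(m: Marking) -> int:
    # gScaleSelf1: if(#Running1<=#Running2 && #Stopped1>0) 1 else 0
    return 1 if m.running1 <= m.running2 and m.stopped1 > 0 else 0


def g_scale_other_1(m: Marking) -> int:
    # gScaleOther1: if(#Running1>#Running2 && #Stopped2>0) 1 else 0
    return 1 if m.running1 > m.running2 and m.stopped2 > 0 else 0


def scale_target(m: Marking) -> Optional[str]:
    """Hypothetical helper: the zone in which a zone-1 scale-out starts a VM."""
    if g_scale_self_1(m):
        return "zone1"
    if g_scale_other_1(m):
        return "zone2"
    return None  # neither transition is enabled in this marking


print(scale_target(Marking(running1=2, running2=3, stopped1=1, stopped2=1)))  # zone1
print(scale_target(Marking(running1=3, running2=2, stopped1=1, stopped2=1)))  # zone2
```

Since the two guards test complementary conditions on #Running1 and #Running2, at most one of the two transitions is enabled in any marking.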
Availability Models: Stateful VM
Availability Models: Disaster Recovery
• Availability zone life cycle
  – Interacts with the overall architecture model
• Stateless VM recovery
  – Backup/AMI
• Stateful VM recovery
  – Backup
  – Replica
  – Hot standby
Case 1: Multi-zone Deployment
• Parameters
  – Amazon EC2 SLA of 99.95% availability
  – Zone fail rate: 0.00011, MTTR: 4.38 hours per year
  – Application-specific measurements of transitions
0.01% of availability = 52.56 minutes of downtime per year
0.4% difference = 35 hours of downtime per year
0.76% difference = 66 hours of downtime per year
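The downtime figures on this slide come from converting availability percentages into hours over a 365-day year (8,760 hours). A minimal Python sketch of that conversion follows; the function names are hypothetical and leap years are ignored.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years ignored


def downtime_hours(availability_pct: float) -> float:
    """Yearly downtime implied by an availability level given in percent."""
    return (100.0 - availability_pct) / 100.0 * HOURS_PER_YEAR


def hours_for_difference(diff_pct: float) -> float:
    """Extra yearly downtime caused by an availability gap in percentage points."""
    return diff_pct / 100.0 * HOURS_PER_YEAR


# 99.95% SLA -> about 4.38 hours of downtime per year
print(f"SLA 99.95%: {downtime_hours(99.95):.2f} h")
for diff in (0.01, 0.4, 0.76):
    hours = hours_for_difference(diff)
    print(f"{diff}% difference: {hours:.2f} h ({hours * 60:.2f} min)")
```

For example, 0.01 percentage points is 0.876 hours (52.56 minutes), 0.4 points is about 35.04 hours and 0.76 points is about 66.6 hours per year.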
Case 2: Recovery across Availability Zones
• Industry rule of thumb: “Target auto-scale 30-60% until you have 50% headroom for load spikes. Lose an AZ leads to 90% utilisation.”
• Impact on overall availability?
• 30-60% vs. traditional 70-90%?
• Over-provisioning vs. auto-scaling?
0.29% difference = 25 hours of downtime per year
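The “90% utilisation” in the quoted rule of thumb is consistent with simple arithmetic if one assumes that the load is spread evenly over three availability zones of equal capacity, and that the surviving zones absorb a lost zone's load before any new VMs start. The slide does not state these assumptions, so treat the Python sketch below as illustrative only; the function name is hypothetical.

```python
def utilisation_after_zone_loss(utilisation: float, zones: int, zones_lost: int = 1) -> float:
    """Utilisation of the surviving zones once lost zones' load is redistributed.

    Assumes identical capacity per zone, evenly spread load, and no new
    capacity yet (i.e. before auto-scaling has reacted).
    """
    surviving = zones - zones_lost
    if surviving <= 0:
        raise ValueError("no surviving zones")
    # Total load stays the same but is carried by fewer zones.
    return utilisation * zones / surviving


for target in (0.30, 0.60, 0.70, 0.90):
    print(f"{target:.0%} target -> {utilisation_after_zone_loss(target, zones=3):.0%}")
```

With three zones, a 60% target becomes 90% after one zone is lost and a 30% target becomes 45%, whereas the traditional 70% and 90% targets would reach 105% and 135% of the surviving capacity, i.e. overload until additional capacity arrives.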
Case 3: Disaster Recovery across Regions
• Trade-off between RPO and RTO
• RPO: Recovery Point Objective
• RTO: Recovery Time Objective
Yuruware — http://www.yuruware.com/
0.2% difference = 17 hours of downtime per year
Conclusion and Future Work
• SRN-based availability models
  – Application-level availability
  – Highly configurable for different deployment architectures
  – Model different uncertainties and scenarios for critical systems
  – Quantify and compare choices and enable what-if analysis
  – Evaluated using industry best practices
• Future work
  – Better evaluation!
  – Integrated models of the impact of upgrade, live migration, backup and subjective uncertainties (in IEEE Cloud 2013)
Q. Lu, X. Xu, L. Zhu, L. Bass, et al., "Incorporating Uncertainty into in-Cloud Application Deployment Decisions for Availability," in IEEE Cloud 2013
Slides available at http://www.slideshare.net/LimingZhu/