ljc 05/14 "cloud developer's dharma"
DESCRIPTION
Building applications for the IaaS Cloud is easy, right? "Sure, no problem - just lift and shift!" all the Cloud vendors shout in unison. However, the reality of building and deploying Cloud applications is often different. This talk introduces lessons learnt in the trenches during two years of designing and implementing cloud-based Java applications, which we have codified into our Cloud developer’s 'DHARMA' rules: Documented (just enough); Highly cohesive/loosely coupled (all the way down); Automated from code commit to cloud; Resource aware; Monitored thoroughly; and Antifragile. We will look at these lessons from both a theoretical and a practical perspective using a real-world case study from Instant Access Technologies (IAT) Ltd. IAT recently evolved their epoints.com (http://epoints.com/) customer loyalty platform from a monolithic Java application deployed into a data centre on a 'big bang' schedule to a platform of loosely coupled JVM-based components, all continuously deployed into the AWS IaaS Cloud.
TRANSCRIPT
Cloud Developer’s DHARMA… redefining ‘done’ for Cloud applications
Daniel Bryant, CTO, Instant Access Technologies
[email protected] @danielbryantuk
11/04/2023 @danielbryantuk
Who is this guy?
• My career so far…
• Open source work
• I enjoy coding…
epoints.com 2012/13 Upgrade…
• Increasing traffic
– Scalability being stretched
• Increasingly diverse requirements
• Our starting point
– Developers creating monolithic application
– Manual QA
– Operations deploying to data center
Core Changes…
• Service-Oriented Architecture
• Cloud-based deployments
• DevOps Culture
Core Changes…
• Service-Oriented Architecture
– Twitter’s Story (bit.ly/1j1WbmI)
• Cloud-based deployments
– Tonight!
• DevOps Culture
– Previous LJC Event (bit.ly/1elVPJz)
Moving to the Cloud
• IAT chose Amazon Web Services (AWS) IaaS
• IaaS was great, but…
– Made a few mistakes
– Learnt a lot of lessons
• Bonus! Russ Miles’ view on PaaS
– bit.ly/1neXzaf
Common Cloud Problems
TL;DR…
Common Cloud Problems
• Components
– Who does what now?
– Configuration issues
• Deployment topology
– Cloud networking
– “Is that DB local?”
Common Cloud Problems
• Unknown failure modes
– Bang!
– Now you see me…
– Can…you…hear…me?…
– A to B, but not B to A
Common Cloud Problems
• High Availability (HA)
– Self-inflicted wounds…
• Snowflake servers
• Monitoring / Diagnostics
Common Cloud Problems
• Not respecting the underlying hardware
• Not testing in the Cloud
– Here be dragons!
We’ve created a “Cloud Developer’s DHARMA” to act as a checklist when building Cloud apps
dharma /ˈdɑːmə, ˈdəːmə/
noun
1. Signifies behaviours that are considered to be in accord with the order that makes life and the universe possible (Hinduism)
2. “Cosmic law and order”; also applied to the teachings of the Buddha (Buddhism)
Documented (just enough)
Highly cohesive/loosely coupled (all the way down)
Automated from commit to Cloud
Resource aware
Monitored thoroughly
Antifragile
Documentation (just enough)
• Create a Wiki
• Simon Brown’s C4 Model
– bit.ly/StVpa4
• Architectural ‘views’
• Open Source style README.md
High Cohesion / Loose Coupling (all the way down…)
• Code
• Architecture
– Components
– Services
• Public API
– PayPal (bit.ly/1hnZNly)
Automated from Commit to Cloud
• Continuous Integration
• Continuous Deployment
• Continuous Delivery
Our Build Pipeline
Jenkins, with plugins…
• Build Pipeline
– wiki.jenkins-ci.org/display/JENKINS/Build+Pipeline+Plugin
• Parameterized build
– wiki.jenkins-ci.org/display/JENKINS/Parameterized+Build
• Promoted Builds Plugin
– wiki.jenkins-ci.org/display/JENKINS/Promoted+Builds+Plugin
Our Build Pipeline
• Component Build
– Compile
– Unit Tests (surefire)
– Integration Tests (failsafe)
• Deployment onto QA Cloud
– Python Scripts + Chef to provision
– Verify success using Python
Our Build Pipeline
• Acceptance Tests
– Cucumber and Selenium
– Work in progress…
• Performance Tests
– JMeter
– Jenkins JMeter performance plugin
• Live Deployment
– “Human-based conditional operation”
Automating QA
• Intra-component integration testing
– Spock is awesome (code.google.com/p/spock)
– Utilise embedded datastore/middleware
• Inter-component integration testing
– The hardest part of SOA…
• Consumer-based Testing
– Brandon Byars (bit.ly/1lmcoaD)
Infrastructure: Say No To Snowflakes!
• Automate all provisioning
– Chef, Puppet, SaltStack
– Bash, Python
– AWS API / CLI
• “Infrastructure as Code”
– Version control everything
Infrastructure: Say No To Snowflakes!
• Doing “Proper Development”
– Gareth Rushgrove at Craft Conf (bit.ly/1njuc49)
– Chef Conf (www.youtube.com/user/getchef)
• Local tooling/testing
– Vagrant (www.vagrantup.com)
– Docker (www.docker.io)
Deployment Platform: What you’ve got…
What you think you want…
What you get…
Fact: 9 out of 10 cheetahs prefer the taste of an Ops team over tinned food
Thou Shalt Know thy Cloud…
• AWS EBS: 100 IOPS (by default)
– My Mac SSD does 49K IOPS
• 1000 Mbps network: max transfer ~125 MB/s
– My Mac does 400+ MB/s sequential write to SSD
• “Noisy [virtual] Neighbours”
Reference for Mac statistics: bit.ly/1ftJZH8
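The figures above are easy to turn into back-of-the-envelope estimates. The sketch below only does the arithmetic: 1000 Mbps divided by 8 bits per byte gives the ~125 MB/s ceiling, and an IOPS rate gives a lower bound on how long a batch of random I/Os takes. The class name `CloudThroughput` and the example workloads (a 10 GB transfer, one million I/O operations) are illustrative assumptions, not numbers from the talk, and real throughput will be lower once protocol overhead and noisy neighbours are factored in.

```java
final class CloudThroughput {
    /** Converts a link speed in megabits per second to megabytes per second (8 bits per byte). */
    static double megabitsToMegabytesPerSecond(double megabitsPerSecond) {
        return megabitsPerSecond / 8.0;
    }

    /** Seconds needed to move a payload at a sustained rate, ignoring overhead and contention. */
    static double transferSeconds(double payloadMegabytes, double megabytesPerSecond) {
        return payloadMegabytes / megabytesPerSecond;
    }

    /** Seconds needed to perform a number of I/O operations at a given IOPS rate. */
    static double ioSeconds(long operations, double iops) {
        return operations / iops;
    }

    public static void main(String[] args) {
        double networkMBps = megabitsToMegabytesPerSecond(1000);
        System.out.printf("1000 Mbps = %.0f MB/s%n", networkMBps);
        // Hypothetical workloads, chosen only to make the gap visible.
        System.out.printf("10 GB over the network: %.0f s%n", transferSeconds(10_000, networkMBps));
        System.out.printf("1M I/Os at 100 IOPS: %.0f s%n", ioSeconds(1_000_000, 100));
        System.out.printf("1M I/Os at 49K IOPS: %.1f s%n", ioSeconds(1_000_000, 49_000));
    }
}
```

At the default 100 IOPS, a million random I/Os take close to three hours; at 49K IOPS the same work takes about twenty seconds, which is why an I/O-heavy component can behave very differently on a laptop and in the Cloud.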
Thinking/Acting Operationally
• Cultivate “Mechanical Sympathy”
• Virtualisation
– Tech Target (bit.ly/1kDVqyG)
• Networking
– ‘Unix and Linux System Administration Handbook’
– aws.amazon.com/documentation
Thinking/Acting Operationally
• Learn Linux fundamentals
• Diagnostic skills
– top, netstat, vmstat, tcpdump
– Java utils: jps, jstat, jmap, jhat
– “DevOps Troubleshooting” by K. Rankin
• Maybe grow a beard…
Monitor All The Things!
• Infrastructure monitoring
– Nagios
– Zabbix
– Splunk
– AppDynamics
• Distributed Tracing
– twitter.github.io/zipkin
Component Metrics
• Dropwizard’s Metrics
– metrics.codahale.com
• Netflix’s Servo
– github.com/Netflix/servo
• Etsy’s StatsD
– github.com/etsy/statsd/wiki
11/04/2023 @danielbryantuk
Gauges, Counters, Meters, Timers…
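To make the metric types concrete, here is a minimal plain-Java sketch of a gauge, a counter and a timer. It is not the API of Dropwizard Metrics, Servo or StatsD; the names `SimpleGauge`, `SimpleCounter` and `SimpleTimer` are hypothetical, and a real library adds registries, reporters, histograms and rate-based meters on top of this idea.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** A gauge reports the current value of something, sampled on demand. */
interface SimpleGauge<T> {
    T getValue();
}

/** A counter only ever counts events, e.g. requests served or errors seen. */
final class SimpleCounter {
    private final AtomicLong count = new AtomicLong();

    void inc() {
        count.incrementAndGet();
    }

    long getCount() {
        return count.get();
    }
}

/** A timer records how many times a piece of work ran and how long it took in total. */
final class SimpleTimer {
    private final AtomicLong calls = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();

    /** Runs the work, recording its duration even if it throws. */
    <T> T time(Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            totalNanos.addAndGet(System.nanoTime() - start);
            calls.incrementAndGet();
        }
    }

    long getCount() {
        return calls.get();
    }

    /** Mean duration in milliseconds, or 0 if nothing has been timed yet. */
    double meanMillis() {
        long n = calls.get();
        return n == 0 ? 0.0 : totalNanos.get() / 1_000_000.0 / n;
    }
}
```

A component would typically wrap each outbound call in `timer.time(...)`, bump a counter on every error, and expose a gauge for values such as queue depth, then ship the numbers to a graphing tool.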
Graph It!
Health Checks
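One way to picture a health check is a named probe that reports whether a dependency is usable, with any exception treated as "unhealthy". The sketch below is a plain-Java illustration under that assumption; `HealthChecks` and its methods are hypothetical names, not the API of any library mentioned in the talk.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class HealthChecks {
    /** A single probe, e.g. "can I reach the database?". */
    interface Check {
        boolean isHealthy() throws Exception;
    }

    private final Map<String, Check> checks = new LinkedHashMap<>();

    void register(String name, Check check) {
        checks.put(name, check);
    }

    /** Runs every registered check; a check that throws is reported as unhealthy. */
    Map<String, Boolean> runAll() {
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (Map.Entry<String, Check> entry : checks.entrySet()) {
            boolean healthy;
            try {
                healthy = entry.getValue().isHealthy();
            } catch (Exception e) {
                healthy = false;
            }
            results.put(entry.getKey(), healthy);
        }
        return results;
    }

    /** True only if every registered check passes. */
    boolean allHealthy() {
        return !runAll().containsValue(false);
    }
}
```

The result of `runAll()` can be exposed on an HTTP endpoint so that a load balancer or monitoring system can take an unhealthy instance out of rotation.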
Antifragile
• The opposite of fragile?
– Robust…
– Antifragile…
• Netflix are best-in-class
– bit.ly/1gs5n3q
• System must be robust first!
Design for Failure
• Design patterns
– Timeouts / retries
– Bulkheads / circuit-breakers
• Inspiration
– Chris Richardson (slidesha.re/1ft3vsg)
– Netflix (bit.ly/1h5GMid)
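As an illustration of the timeout pattern, the sketch below bounds how long a caller waits for a remote call and returns a fallback value instead of hanging. It uses only `java.util.concurrent`; the class `TimeoutCall` and its method are hypothetical, and creating an executor per call is done purely to keep the example self-contained (a real service would share a bounded pool, which is also where a bulkhead comes in).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimeoutCall {
    /** Runs the task, returning the fallback if it fails or exceeds the timeout. */
    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis, T fallback) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the slow call rather than leak the thread
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the task itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return fallback;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

For example, `TimeoutCall.callWithTimeout(() -> slowService.fetch(), 200, cachedValue)` (with `slowService` and `cachedValue` standing in for your own code) gives up after 200 ms and serves the cached value.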
Retries
https://github.com/rholder/guava-retrying
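The idea behind a retry can be sketched in plain Java without any library. The example below is not the guava-retrying API; `Retry.withBackoff` is a hypothetical helper that retries a failing call a bounded number of times, doubling the wait between attempts so a struggling service is not hammered.

```java
import java.util.concurrent.Callable;

final class Retry {
    /**
     * Calls the task up to maxAttempts times, sleeping between failures.
     * The delay starts at initialDelayMillis and doubles after each failed attempt.
     * Rethrows the last failure if every attempt fails.
     */
    static <T> T withBackoff(Callable<T> task, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay); // back off before the next attempt
                    delay *= 2;
                }
            }
        }
        throw last;
    }
}
```

Retries are only safe for idempotent operations, and they should always be bounded; unbounded retries across many clients can turn a brief outage into a self-inflicted denial of service.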
Circuit-breaker
https://github.com/Netflix/Hystrix
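A circuit breaker can be reduced to a small state machine: after a number of consecutive failures it "opens" and fails fast with a fallback, then after a cool-down it lets one trial call through. The sketch below is a simplified plain-Java illustration, not the Hystrix API; the class `CircuitBreaker`, its constructor parameters and the coarse `synchronized` locking are all assumptions made to keep it short.

```java
import java.util.concurrent.Callable;

final class CircuitBreaker {
    private final int failureThreshold;
    private final long openNanos;
    private int consecutiveFailures = 0;
    private boolean open = false;
    private long openedAt = 0;

    CircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openNanos = openMillis * 1_000_000L;
    }

    /** Runs the task unless the breaker is open, in which case the fallback is returned at once. */
    synchronized <T> T call(Callable<T> task, T fallback) {
        if (open && System.nanoTime() - openedAt < openNanos) {
            return fallback; // fail fast: do not touch the struggling dependency
        }
        // Either closed, or the cool-down has elapsed and this is the trial call.
        try {
            T result = task.call();
            consecutiveFailures = 0;
            open = false;
            return result;
        } catch (Exception e) {
            consecutiveFailures++;
            if (open || consecutiveFailures >= failureThreshold) {
                open = true; // trip, or re-trip after a failed trial call
                openedAt = System.nanoTime();
            }
            return fallback;
        }
    }

    synchronized boolean isOpen() {
        return open;
    }
}
```

Holding the lock for the duration of the call serialises all callers, which a production breaker would avoid; the point here is only the closed, open and trial-call transitions.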
Robust in the Cloud
• Distributed Computing Principles
– ‘For young bloods’ (bit.ly/1pKVepz)
• “Multi-AZ”
Robust in the Cloud
• Careful caching…
• Make apps “Cluster-aware”
– MongoDB
– SolrCloud
– RabbitMQ
Real Antifragility
• Autoscaling
Antifragile Patterns
• Stateless components
• Distributed data stores / caches
Antifragile Patterns
• Eventual consistency
• Asynchronous communication
http://cloudshankar.blogspot.co.uk/2013/05/eventual-consistency.html
Documented (just enough)
Highly cohesive/loosely coupled (all the way down)
Automated from commit to Cloud
Resource aware
Monitored thoroughly
Antifragile
So, Cloud Apps are ‘done’ when…
Thanks For Listening
• Massive thanks to all the IAT team!
• Questions / comments?
– [email protected]
– @danielbryantuk
• Join us at Devoxx UK!
– www.devoxx.co.uk