Puppet Camp Sydney Feb 2014 - A Build Engineering Team’s Journey of Infrastructure as Code
DESCRIPTION
A Build Engineering Team’s Journey of Infrastructure as Code - the challenges that we’ve faced and the practices that we implemented as we went along our journey.
TRANSCRIPT
Monday, 10 February 14
@peterleschev
Husband, Father of 3 & Atlassian
Build Engineering Team Lead
Peter Leschev
A Build Engineering Team’s Journey of
Infrastructure as Code
• Build platform & services used internally within the company
• 60k builds per month
• 35k automated tests for JIRA
Build Engineering today @ Atlassian
• 600 build agents (own hardware + EC2 instances)
  • Include SCM clients, JDKs, JVM build tools, databases, headless browser testing, Python builds, NodeJS, installers & more
• Maintain 20 AMIs of various build configurations
• 6 Bamboo Servers
• maven.atlassian.com / 6 Nexus instances
• Monitoring - opsview / graphite / statsd
Build Engineering today @ Atlassian
Infrastructure as Code
= Puppet + SCM ?
• Manually maintained snowflakes
• Started using puppet
3 years ago...
Production rollout
puppetmaster
build agents
Production rollout failure
puppetmaster
build agents
Chart: Lifecycle of an infra change. Confidence in Change (NONE to HIGH) across Dev, Rollout, Soak in Prod.
http://atlassian.com/git
https://bitbucket.org/
Style in Pull Requests
• Automated style checking
• Set up an automated build that runs checks & posts results
• Still need to implement a ratchet build
Puppet Lint https://github.com/rodjek/puppet-lint
Tim Sharpe
@rodjek
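A ratchet build is usually understood as a build that fails when the number of lint findings rises above a recorded baseline, and that lowers the baseline whenever the count drops. The following is a minimal Python sketch of that idea, not the team's implementation. The function names are hypothetical, and it assumes each puppet-lint finding is reported on its own line containing `WARNING:` or `ERROR:`.

```python
import re
from typing import Tuple

# Assumption: every finding is one line that contains "WARNING:" or "ERROR:".
FINDING = re.compile(r"\b(WARNING|ERROR):")


def count_findings(lint_output: str) -> int:
    """Count the lines of lint output that report a warning or an error."""
    return sum(1 for line in lint_output.splitlines() if FINDING.search(line))


def ratchet(baseline: int, current: int) -> Tuple[bool, int]:
    """Return (build_passes, new_baseline).

    The build fails when the count rises above the baseline. When the count
    drops, the baseline is lowered so the improvement cannot be undone later.
    """
    if current > baseline:
        return False, baseline
    return True, current
```

A CI job could store the returned baseline between runs (for example as a build artifact) and feed it back in on the next run.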
Chart: Lifecycle of an infra change. Confidence in Change (NONE to HIGH) across Dev, Code review, Rollout, Soak in Prod. Series: initial + Code review.
• Coding on Puppet Master
• Culture of manually modifying production - Configuration Drift
• Impact on Builds
Using Staging for Development
puppetmaster
build agents
build agents
staging puppet environment
• Easily spin up infrastructure locally on your laptop
• Disposable / reproducible environments
• Machine provisioning via VirtualBox / VMware / AWS
• Configuration applied via Shell Scripts / Puppet / Chef
• Develop and test infrastructure changes locally
Vagrant http://www.vagrantup.com/
Mitchell Hashimoto
@mitchellh
Vagrant
Vagrantfile
vagrant basebox
http://www.vagrantup.com/
Mitchell Hashimoto
@mitchellh
Vagrant
Spins up a local VM to a known state
Destroy the VM when done
Make some puppet changes and then run:
to apply your changes
SSH into your VM using:
to check your changes
http://www.vagrantup.com/
Mitchell Hashimoto
@mitchellh
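The commands on this slide did not survive in the transcript. As a hedged sketch, the edit-and-check loop it describes can be scripted in Python around the standard Vagrant subcommands `up`, `provision`, `ssh` and `destroy`. The `hostname` check and the helper names are illustrative assumptions, not taken from the talk.

```python
import subprocess
from typing import Callable, List


def vagrant(*args: str) -> List[str]:
    """Build the argument list for one vagrant subcommand."""
    return ["vagrant", *args]


# The loop from the slide, in order.
WORKFLOW = [
    vagrant("up"),                     # spin up a local VM to a known state
    vagrant("provision"),              # re-apply puppet after editing manifests
    vagrant("ssh", "-c", "hostname"),  # run a (hypothetical) check inside the VM
    vagrant("destroy", "-f"),          # throw the VM away when done
]


def run_workflow(runner: Callable[[List[str]], object] = subprocess.check_call) -> None:
    """Run each step in turn; the default runner stops on the first failing command."""
    for cmd in WORKFLOW:
        runner(cmd)
```

Passing a different `runner` (for example `print`) shows the commands without touching any VM.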
Chart: Lifecycle of an infra change. Confidence in Change (NONE to HIGH) across Dev, Code review, Rollout, Soak in Prod. Series: initial + Code review + Vagrant.
• Vagrant basebox differences with production machines
  • Originally using publicly available vagrant baseboxes
  • Installed packages were the biggest difference
• Generating a basebox manually was a painful process
Vagrant != Production
Veewee
Automated Vagrant basebox generation
https://github.com/jedi4ever/veewee
Patrick Debois
@patrickdebois
Ubuntu installation iso + Veewee (definitions.rb, preseed.cfg, postinstall.sh) -> vagrant basebox
Veewee
Automated Vagrant basebox generation
https://github.com/jedi4ever/veewee
Patrick Debois
@patrickdebois
• Latest basebox generated in CI & published to fileshare
• No need to generate baseboxes locally
Basebox generation via CI
• VirtualBox Guest additions
• Reduced to a minimum
There are still differences!
Common Preseed / Postinstall
preseed.cfg + postinstall.sh, shared across custom ISOs, vagrant basebox & PXEBoot
Packer http://packer.io
Mitchell Hashimoto
@mitchellh
Chart: Lifecycle of an infra change. Confidence in Change (NONE to HIGH) across Dev, Code review, Rollout, Soak in Prod. Series: initial + Code review + Vagrant + Veewee.
Developing locally
Rolling out to production
Broken build agents!
Rolling out to staging
• Behaviour Driven Development
Cucumber
Cucumber & Vagrant
Vagrant
Custom Provisioner
Virtual Box
VM
puppet apply
cucumber *.features
via ssh
• Requires cucumber dependencies to be installed on tested VM
• Tests run within the VM making testing firewall rules harder
Disadvantages
Chart: Lifecycle of an infra change. Confidence in Change (NONE to HIGH) across Dev, Code review, Rollout, Soak in Prod. Series: initial + Code review + Vagrant + Veewee + Cukes.
“But it works on my machine!” – Every Developer
• ‘From scratch’ provisioning
• Confidence that you can rebuild in disaster
Continuous Integration
“The Pets: you give nice names, you stroke them, and when they get ill, you nurse them back to health, taking a long time over it.
The Cattle: you give them numbers. When they get ill, you shoot them.”
– Tim Bell, CERN
Chart: Lifecycle of an infra change. Confidence in Change (NONE to HIGH) across Dev, Code review, CI & Rollout, Soak in Prod. Series: initial + Code review + Vagrant + Veewee + Cukes + CI.
Provisioning from scratch is slow
Spread out CI
Sequential: provision VM1, then VM2, then VM3, then VM4
Parallel: provision VM1, VM2, VM3 and VM4 at the same time
Moved from sequential to parallel provisioning
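The move from sequential to parallel provisioning can be illustrated with a small Python sketch. It is an illustration only: threads on one host stand in for whatever the CI setup actually uses to spread the work, and `provision` is a hypothetical callable that provisions one VM and reports success.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def provision_all(
    vms: Iterable[str],
    provision: Callable[[str], bool],
    max_workers: int = 4,
) -> Dict[str, bool]:
    """Provision every VM concurrently and return {vm_name: succeeded}."""
    names = list(vms)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in the same order as the input names.
        results = list(pool.map(provision, names))
    return dict(zip(names, results))
```

With four workers, the total wall-clock time approaches that of the slowest single VM rather than the sum of all four.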
There are only so many MacPros you can steal
The ones I have my eye on....
Profiling Puppet Runs
Add “--evaltrace” to puppet apply
Collect and show the longest occurrences of: “Evaluated in ([\d\.]+) seconds”
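Collecting the longest evaluations could look like the Python sketch below. It reuses the regular expression from the slide; the function name is hypothetical, and the log is assumed to be the captured text output of a `puppet apply --evaltrace` run.

```python
import re
from typing import List, Tuple

# Pattern taken from the slide.
EVALUATED = re.compile(r"Evaluated in ([\d\.]+) seconds")


def slowest_resources(log_text: str, top: int = 10) -> List[Tuple[float, str]]:
    """Return up to `top` (seconds, log_line) pairs, slowest first."""
    hits = []
    for line in log_text.splitlines():
        match = EVALUATED.search(line)
        if match:
            hits.append((float(match.group(1)), line.strip()))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:top]
```

Each returned line still carries the resource reference from the log, which points at the manifest entry worth optimising.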
Profiling Cucumber runs
http://itshouldbeuseful.wordpress.com/2010/11/10/find-your-slowest-running-cucumber-features/
• Provision locally & for CI
• Faster & different class of problems found
• Matches production state
Delta Provisioning
‘from scratch’ provision: provision VM1 -> on success, export VM1 to fileshare
delta provision: import VM1 box from fileshare -> provision VM1
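The two flows on this slide can be sketched in Python as follows. This is a sketch under assumptions: the callables `provision`, `export_box` and `import_box` are hypothetical stand-ins for the real provisioning and fileshare steps.

```python
from typing import Callable


def from_scratch(provision: Callable[[], bool], export_box: Callable[[], None]) -> bool:
    """Provision from a bare basebox and publish the resulting box only on success."""
    succeeded = provision()
    if succeeded:
        export_box()
    return succeeded


def delta(import_box: Callable[[], None], provision: Callable[[], bool]) -> bool:
    """Start from the last successfully provisioned box and apply only the changes."""
    import_box()
    return provision()
```

Because the exported box is only replaced after a successful run, a delta run always starts from a known-good state.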
Chart: Lifecycle of an infra change. Confidence in Change (NONE to HIGH) across Dev, Code review, CI & Rollout, Soak in Prod. Series: initial + Code review + Vagrant + Veewee + Cukes + CI + Delta CI.
Infrequent Releases
• Puppet runs impacted running builds
  • Disabling all the build agents
  • Performing the roll out
  • git clone / librarian-puppet / symlink update on puppetmaster
  • Manually kick off puppet on all the build agents
  • Enabling all the build agents
• Set of Puppet environments for every bamboo server
Painful Puppet Rollouts
Graceful Service restarts
Bamboo Agent JVM process watches for a touch file & shuts down when idle (written as a Bamboo plugin)
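The watch-and-shut-down behaviour can be illustrated with a short Python sketch. The talk describes a Bamboo plugin running inside the agent JVM, so this is only a stand-in for the idea; the function name, the `is_idle` callable and the polling interval are assumptions.

```python
import os
import time
from typing import Callable


def wait_for_graceful_shutdown(
    touch_file: str,
    is_idle: Callable[[], bool],
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Block until a shutdown has been requested AND the agent is idle.

    A shutdown is requested by creating `touch_file`. While a build is still
    running, the loop keeps polling so the build is never interrupted.
    """
    while not (os.path.exists(touch_file) and is_idle()):
        sleep(poll_seconds)
```

Once the function returns, the caller can stop the process and let the service manager restart it with the new configuration.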
• BEFORE - Multiple puppet envs for each Bamboo Server
  • jbac_staging
  • jbac_production
  • cbac_staging
  • cbac_production
  • etc
• AFTER - Changed to use ‘staging’ & ‘production’ only
Puppet Environments
• BEFORE: Manually on puppetmaster
  • git clone the puppet tree
  • Run librarian-puppet to pull external modules
  • Update staging / production symlink
• AFTER: Bamboo build which performs the above steps automatically
Updates on Puppetmaster
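The three manual steps (clone, librarian-puppet, symlink update) could be automated along the lines of the Python sketch below. It is a sketch under assumptions rather than the team's Bamboo build: the directory layout, the function names and the injectable `run` parameter are hypothetical, and the symlink swap relies on POSIX rename semantics.

```python
import os
import subprocess
from typing import Callable


def switch_symlink(link_path: str, target_dir: str) -> None:
    """Point link_path at target_dir by creating a temp link and renaming it into place."""
    tmp = link_path + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target_dir, tmp)
    os.replace(tmp, link_path)  # rename is atomic on POSIX file systems


def update_environment(
    repo_url: str,
    releases_dir: str,
    env_link: str,
    release_name: str,
    run: Callable[..., object] = subprocess.check_call,
) -> str:
    """Clone the puppet tree, pull external modules, then switch the environment link."""
    checkout = os.path.join(releases_dir, release_name)
    run(["git", "clone", repo_url, checkout])
    run(["librarian-puppet", "install"], cwd=checkout)
    switch_symlink(env_link, checkout)
    return checkout
```

Switching the link only after both commands succeed means a failed clone or module install leaves the current environment untouched.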
Less Human interaction +
More automation =
Higher Confidence
Less Human Effort =
Increased frequency of releases
Chart: Lifecycle of an infra change. Confidence in Change (NONE to HIGH) across Dev, Code review, CI & Rollout, Soak in Prod. Series: initial + Code review + Vagrant + Veewee + Cukes + CI + Delta CI + Frequent releases.
“Should I be scared?” – Peter Leschev, 3 months ago
“I’m scared!” – Peter Leschev, 3 years ago
Hipchat integration
Chart: Lifecycle of an infra change. Confidence in Change (NONE to HIGH) across Dev, Code review, CI & Rollout, Soak in Prod. Series: initial + Code review + Vagrant + Veewee + Cukes + CI + Delta CI + Frequent releases + Notification.
Chart: Lifecycle of an infra change. Confidence in Change (NONE to HIGH) across Dev, Code review, CI & Rollout, Soak in Prod. Series: before, after.
Confidence in Change
or
Finding & fixing problems sooner rather
than later
Commit Graph
Snowflakes
Pets
Cattle
Stateless Machines
We’re still on the Journey
Come join us!
atlassian.com/jobs
Questions?
Thank you!