Safely Deploying on the Cutting Edge
Eric Holscher
Urban Airship
DjangoCon 2011
Wednesday, September 7, 2011
Talk Contents
• Company culture & process
• Deployment environment
• Tools for deploying
• Verifying deployment
Process
• Deploy out of git
• Standard git-based production/master branch model
• Releases on the production branch are tagged with a timestamp
• deploy-2011-08-03_14-37-38
• Feature branches
• http://nvie.com/posts/a-successful-git-branching-model/
Features
• Easily allows you to hot-fix production
• Keep a stable master
• Run CI on the master branch or long-lived feature branches
Services
• Everything that we deploy is conceptualized as a service
• Services all live in /mnt/services/<slug> (Thanks ec2)
• A service is an instance of a repository on a machine
• A repository might have multiple services
• e.g. Airship is deployed into “celery” and “web” services
• This maps really well onto Chef cookbooks
QA Environment
• Run all of your master branches
• Allow you to get a copy of what will become production
• Catch errors before they are seen by customers
• Spawn new ones for long-lived feature branches
• Run `host web-0` and figure out which environment you are in based on the IP
Jump machine
• Have a standard place for all deployments to happen
• Log all commands run
No External Services
• Chishop
• No external server required to deploy code
• All branches are checked out on an admin server
Composable
• Small pieces that you can build into better things
• Useful when trying to do something you didn’t plan for
Environment
• Where code lands on the remote machine
• Mimics a chroot
• Uses virtualenv & supervisord
• Owned by the service-user
• Managed by Chef
File Structure
• /mnt/services/airship
• bin/
• current -> deploy-2011-08-03_14-37-38
• deploy-2011-08-03_14-37-38
• etc/
• var/
SCRIPT_DIR=$(dirname $0)
SERVICE_DIR=$(cd $SCRIPT_DIR && cd ../ && pwd)
cd $SERVICE_DIR
supervisorctl pid > /dev/null 2>&1
if [ "$?" != "0" ]; then
    echo "Supervisord not running, starting."
    supervisord
else
    echo "Supervisord running, starting all processes."
    supervisorctl start all
fi
cd - > /dev/null 2>&1
Bin scripts
• All of the process-level binscripts wrap supervisord
• bin/start -> supervisorctl start all
• bin/start foo -> supervisorctl start foo
• bin/stop -> supervisorctl stop all
• bin/stop shutdown -> supervisorctl shutdown
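The mapping above can be sketched in Python as a small function that turns a bin script call into a supervisorctl argument list. This is only an illustration: the function name `supervisorctl_args` is hypothetical, and only the supervisorctl subcommands themselves come from the slide.

```python
def supervisorctl_args(verb, target=None):
    """Translate a bin/ script call into a supervisorctl argument list."""
    if verb not in ("start", "stop"):
        raise ValueError("unsupported verb: %r" % verb)
    # "bin/stop shutdown" is the special case that stops supervisord itself.
    if verb == "stop" and target == "shutdown":
        return ["supervisorctl", "shutdown"]
    # With no target, act on every managed process.
    return ["supervisorctl", verb, target or "all"]
```

For example, `supervisorctl_args("start", "foo")` gives the arguments for `bin/start foo`.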
Init.d
• All services share a common init.d script
• This init.d script calls into the service’s bin/
• /etc/init.d/airship start -> /mnt/services/airship/bin/start
SERVICE_USER='<%= @service %>'
SERVICE_NAME='<%= @service %>'
SERVICE_PATH=/mnt/services/$SERVICE_NAME
set -e
RET_CODE=0
case "$1" in
    start)
        sudo su - $SERVICE_USER -c $SERVICE_PATH/bin/start
        RET_CODE=$?
        ;;
    stop)
        sudo su - $SERVICE_USER -c $SERVICE_PATH/bin/stop
        RET_CODE=$?
        ;;
    restart)
        sudo su - $SERVICE_USER -c $SERVICE_PATH/bin/restart
        RET_CODE=$?
        ;;
    status)
        sudo su - $SERVICE_USER -c $SERVICE_PATH/bin/status
        RET_CODE=$?
        ;;
    *)
        echo "$SERVICE_NAME service usage: $0 {start|stop|restart|status}"
        ;;
esac
exit $RET_CODE
Low-level verbs
• pull
• build
• tag
• sync
• install
• rollback
• start/stop/restart/reload
Pull
• Update the code from the source repository
• Defaults to the “production” branch
• def pull(repo=None, ref='origin/production')
• Can pass in a specific revision/branch/tag/hashish
• local('git reset --hard %s' % ref, capture=False)
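A minimal sketch of the pull verb using only the standard library instead of Fabric. The signature here differs from the slide's `pull(repo=None, ref=...)`: `repo_dir` and the injectable `run` parameter are hypothetical, and the `git fetch` step is an assumption (the slide only shows the `git reset --hard`).

```python
import subprocess


def pull(repo_dir, ref="origin/production", run=subprocess.check_call):
    """Bring the local checkout in repo_dir to the given ref."""
    commands = [
        # Assumption: fetch first so that the ref is up to date.
        ["git", "fetch", "--tags", "origin"],
        # From the slide: hard-reset the checkout to the requested ref.
        ["git", "reset", "--hard", ref],
    ]
    for command in commands:
        run(command, cwd=repo_dir)
    return commands
```

Passing a different `ref` (a branch, tag, or commit hash) deploys that revision instead of the production branch.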
Build
• Could be called “prepare”
• Do local-specific things to get repo into a ready state
• Mostly used for compiling in java-land
• Useful in Python for running pre-install tasks
Tag
• Set a tag for the deploy in the git repo
• If the current commit already has a tag, use that instead
• git tag --contains HEAD
• deploy-2011-08-03_14-37-38
• strftime('%Y-%m-%d_%H-%M-%S')
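The tag naming rule can be sketched as follows. The function name `deploy_tag` is hypothetical, and the input list stands in for the output of `git tag --contains HEAD`. Reusing only tags that start with `deploy-` is an assumption; the slide just says an existing tag is reused.

```python
from datetime import datetime


def deploy_tag(tags_on_head, now=None):
    """Return the tag to use for this deploy."""
    # Reuse an existing deploy tag if the current commit already has one.
    for tag in tags_on_head:
        if tag.startswith("deploy-"):
            return tag
    # Otherwise build a new timestamped tag with the slide's format string.
    now = now or datetime.now()
    return "deploy-" + now.strftime("%Y-%m-%d_%H-%M-%S")
```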
Sync
• Move the code from the local to the remote box
• Uses rsync to put it into the remote service directory
• Also places a copy of the synced code on the admin box
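As a rough illustration of the sync step, the helper below builds an rsync command line. Everything about it is an assumption rather than the talk's actual code: the name `rsync_command`, the `-az --delete` flags, and the choice to sync into a per-tag directory under `/mnt/services/<service>/`.

```python
def rsync_command(local_dir, host, service, tag, user=None):
    """Build an rsync argument list that copies local_dir to the remote service dir."""
    # Assumption: each deploy lands in its own tag-named directory.
    dest_dir = "/mnt/services/%s/%s/" % (service, tag)
    remote = "%s:%s" % (host, dest_dir)
    if user:
        remote = "%s@%s" % (user, remote)
    # The trailing slash on the source copies the directory contents,
    # not the directory itself.
    source = local_dir.rstrip("/") + "/"
    return ["rsync", "-az", "--delete", source, remote]
```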
Install
• Make the newly synced code the active code on the machine
• This generally means installing the code into a virtualenv
• Update the “current” symlink in the service directory
• Symlink Django settings file based on environment
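One way to update the “current” symlink without a window where it is missing is to create a temporary link and rename it over the old one. This is a sketch under that assumption, not necessarily how the talk's tooling does it; `point_current_at` and the `.tmp` suffix are hypothetical names.

```python
import os


def point_current_at(service_dir, release):
    """Repoint service_dir/current at the given release directory name."""
    current = os.path.join(service_dir, "current")
    temp = current + ".tmp"
    # Clean up a leftover temporary link from an interrupted run.
    if os.path.lexists(temp):
        os.remove(temp)
    # Relative link, like "current -> deploy-2011-08-03_14-37-38".
    os.symlink(release, temp)
    # Renaming over the old link swaps it in a single step on POSIX systems.
    os.replace(temp, current)
    return current
```

Because old release directories stay in place, pointing the link back at an earlier one is all a rollback needs on the remote side.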
Rollback
• When you break things, you need to undo quickly
• Reset the repository to the previous deployed tag
• git tag | grep deploy | sort -nr | head -2 | tail -1
• Deploy that
• Very few moving pieces
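The shell pipeline above picks the second-newest deploy tag. The same selection can be sketched in Python; `previous_deploy_tag` is a hypothetical name, and the input list stands in for the output of `git tag`. It relies on the zero-padded timestamp format, which makes lexicographic order match chronological order.

```python
def previous_deploy_tag(tags):
    """Return the deploy tag just before the most recent one."""
    # Newest first: zero-padded timestamps sort correctly as plain strings.
    deploy_tags = sorted(
        (tag for tag in tags if tag.startswith("deploy-")), reverse=True
    )
    if len(deploy_tags) < 2:
        raise ValueError("no earlier deploy tag to roll back to")
    return deploy_tags[1]
```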
Start/Stop/Reload
• Allow you to bounce services as part of deployment
• Allow reload for services that support it
CLI UI
• Have nice wrapper commands that do common tasks
• deploy host:web-0 full_deploy:airship
➡ pull, build, tag, sync, install
• deploy host:web-1 deploy:airship
➡ tag, sync, install
• deploy host:web-2 sync:airship
➡ sync
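The wrapper commands compose the low-level verbs. A minimal sketch of that composition, with the verb lists taken from the slide and the names `WRAPPERS` and `expand` being hypothetical:

```python
# Wrapper command -> ordered low-level verbs, as listed on the slide.
WRAPPERS = {
    "full_deploy": ["pull", "build", "tag", "sync", "install"],
    "deploy": ["tag", "sync", "install"],
    "sync": ["sync"],
}


def expand(task):
    """Split a task such as 'full_deploy:airship' into (verb, service) steps."""
    name, _, service = task.partition(":")
    return [(verb, service) for verb in WRAPPERS[name]]
```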
#!/bin/bash
cd ~/airdeploy
# %M and %S are minutes and seconds (%m would be the month, %s epoch seconds)
DATE=$(date +%Y_%-m_%-d-%H-%M-%S)
echo "deploy" $@ > logs/$DATE.log
fab $@
cd - > /dev/null 2>&1
Meta-commands
• Hard-code the correct deployment behavior
• “Make easy things easy, and wrong things hard”
• Knows what machine each service is deployed to
• deploy airship
➡ deploy pull:airship
➡ deploy type:web deploy:airship
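A meta-command can be thought of as a lookup plus an expansion into the lower-level commands. In this sketch, `SERVICE_TYPES` is a hypothetical inventory mapping each service to the machine type it runs on, and `meta_deploy` is a hypothetical name; only the two expanded commands come from the slide.

```python
# Hypothetical inventory: which machine type each service is deployed to.
SERVICE_TYPES = {"airship": "web"}


def meta_deploy(service):
    """Expand 'deploy <service>' into the lower-level deploy commands."""
    machine_type = SERVICE_TYPES[service]
    return [
        ["deploy", "pull:%s" % service],
        ["deploy", "type:%s" % machine_type, "deploy:%s" % service],
    ]
```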
Magicifying
• Now that we have a solid base, we can automate on top
• When you do a meta deploy, it should be a “smart deploy”
Workflow
• Deploy to one web server, preferably with one worker
• Restart it
• Run it against heuristics to determine if it’s broken
• If it’s broken, rollback, otherwise continue on
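The workflow above can be sketched as a function that takes the individual steps as callables. All names here (`smart_deploy` and its parameters) are hypothetical; the control flow is the part taken from the slide.

```python
def smart_deploy(hosts, deploy, restart, is_healthy, rollback):
    """Deploy to one canary host first; continue only if it looks healthy."""
    canary, rest = hosts[0], hosts[1:]
    deploy(canary)
    restart(canary)
    if not is_healthy(canary):
        # The canary is broken: undo it and leave the other hosts untouched.
        rollback(canary)
        return False
    for host in rest:
        deploy(host)
        restart(host)
    return True
```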
Heuristics
• Any 500s
• Number of 200s to non-200s
• Number of 500s to 200s
• Requests per second
• Response time
• $$$ (Business metrics)
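As one example, the "500s to 200s" heuristic could be checked like this. The function name `is_broken`, the 1% default threshold, and the rule that a host with no 200 responses counts as broken are all assumptions made for illustration.

```python
def is_broken(status_counts, max_5xx_per_200=0.01):
    """Judge a host from a mapping of HTTP status code -> response count."""
    ok = status_counts.get(200, 0)
    server_errors = sum(
        count for code, count in status_counts.items() if 500 <= code <= 599
    )
    if ok == 0:
        # Assumption: with no successful responses we cannot call it healthy.
        return True
    return server_errors / ok > max_5xx_per_200
```

Setting `max_5xx_per_200=0` turns this into the stricter "any 500s" check.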
How it works
• Tell load balancer to take machine out of pool
• /take_me_out_of_the_lb -> 200
• Start your code with 1 worker and a different port
• supervisorctl start canary
• Expose metrics from your services over JSON
• Make sure your load balancer weights it appropriately
• Poll your metrics for X time before considering it functional
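The final polling step might look like the sketch below. It assumes the service returns a JSON object with an `errors` counter, which is a hypothetical key, and `fetch` is a caller-supplied function that returns the raw JSON text (for example, by requesting the canary's metrics URL).

```python
import json
import time


def poll_canary(fetch, checks=5, interval=1.0, max_errors=0, sleep=time.sleep):
    """Poll JSON metrics several times; return True only if every sample passes."""
    for attempt in range(checks):
        metrics = json.loads(fetch())
        # "errors" is a hypothetical metric name used for illustration.
        if metrics.get("errors", 0) > max_errors:
            return False
        # Wait between samples, but not after the last one.
        if attempt < checks - 1:
            sleep(interval)
    return True
```

The `checks` and `interval` parameters together play the role of the "X time" on the slide.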
Questions?
• Eric Holscher
• Urban Airship (Hiring and whatnot)