distributed release management

Distributed Release Management Deploying etsy.com 40+ times per day Mike Brittain Engineering Director, Etsy @mikebrittain mikebrittain.com/talks

Upload: mike-brittain

Post on 08-May-2015


DESCRIPTION

Full Stack Engineering Meetup in NYC, May 27, 2014.

TRANSCRIPT

Page 1: Distributed Release Management

Distributed Release Management Deploying etsy.com 40+ times per day

Mike Brittain

Engineering Director, Etsy

@mikebrittain mikebrittain.com/talks

Page 2: Distributed Release Management

1st Day Assignment Put your face on etsy.com/about

Page 3: Distributed Release Management

What I’m showing you tonight is the result of four years of iteration.

Page 4: Distributed Release Management

App deploys

Small incremental changes to the application

“Dark” features: new classes, methods, controllers

Graphics, stylesheets, templates

Copy/content changes

Config deploys

Turning flags on, off, or % ramp up
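A minimal sketch of what "on, off, or % ramp up" can mean for a config flag. The flag names, the `FLAGS` table layout, and the `is_enabled` helper are hypothetical, chosen for illustration; the talk does not show Etsy's actual config format. The idea is that each user hashes to a stable bucket from 0 to 99, so raising the percentage only ever adds users.

```python
import hashlib

# Hypothetical flag table; names and layout are illustrative only.
FLAGS = {
    "new_checkout": {"enabled": "rampup", "percent": 10},
    "dark_feature": {"enabled": "off"},
    "search_v2": {"enabled": "on"},
}


def bucket(flag_name: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in the range 0..99."""
    key = f"{flag_name}:{user_id}".encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16) % 100


def is_enabled(flag_name: str, user_id: str, flags=FLAGS) -> bool:
    """Return True if the flag is on for this user."""
    flag = flags.get(flag_name)
    if flag is None or flag["enabled"] == "off":
        return False  # unknown or dark features stay hidden
    if flag["enabled"] == "on":
        return True
    # Ramp-up: enabled for users whose bucket falls below the percentage.
    return bucket(flag_name, user_id) < flag.get("percent", 0)
```

Because the bucket depends only on the flag name and user ID, a config deploy that changes `percent` from 10 to 50 keeps the feature on for everyone who already had it.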

Page 5: Distributed Release Management

“Operating” the site

Latent bugs and security holes

Traffic management, load shedding

Adding and removing infrastructure

Tweaking config flags or releasing patches.

Page 6: Distributed Release Management

IRC, #push

Page 7: Distributed Release Management

/topic mbrittain | jgoulah | rsnyder | ekastner

Page 8: Distributed Release Management

/topic mbrittain, jgoulah, rsnyder | ekastner

Page 9: Distributed Release Management

Keep real people in the loop

Queue, with max batch size of seven.

Automated deployment run by humans
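The `/topic` strings on Pages 7 and 8 suggest a simple data model: batches separated by `|`, with the people in one batch separated by commas. The sketch below is one reading of those two slides, not pushbot's actual implementation; the `PushQueue` class and its method names are hypothetical.

```python
class PushQueue:
    """Toy model of a push queue kept in an IRC channel topic."""

    def __init__(self, max_batch_size=7):  # "max batch size of seven"
        self.max_batch_size = max_batch_size
        self.batches = []  # batches[0] is the next group to deploy

    def join(self, nick):
        """Queue a person as a new single-member batch."""
        if any(nick in batch for batch in self.batches):
            return  # already queued
        self.batches.append([nick])

    def merge_front(self, count):
        """Combine the first `count` batches into one deploy group."""
        if count < 2 or not self.batches:
            return
        merged = [nick for batch in self.batches[:count] for nick in batch]
        if len(merged) > self.max_batch_size:
            raise ValueError("batch would exceed the maximum size")
        self.batches[:count] = [merged]

    def done(self):
        """Remove and return the batch that just finished deploying."""
        return self.batches.pop(0) if self.batches else []

    def topic(self):
        """Render the queue the way the slides show the channel topic."""
        return " | ".join(", ".join(batch) for batch in self.batches)
```

Under these assumptions, four `join` calls render as `mbrittain | jgoulah | rsnyder | ekastner`, and `merge_front(3)` changes the topic to `mbrittain, jgoulah, rsnyder | ekastner`.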

Page 10: Distributed Release Management

4 people in this deploy.

“I’ve pushed my changes to master.”

“Everyone has checked in.”

Page 11: Distributed Release Management

Build QA and Pre-prod

Build progress

Status in #push

Git SHA1 for each env.

Date, username, deploy log, changeset, link to dashboard from time of deploy

Page 13: Distributed Release Management

Reporting what’s going on in Deployinator, and who triggered it

Status from build cluster

Page 14: Distributed Release Management

Pre-prod (“princess”) has been deployed.

SHA1 of the change

Time it took to deploy

Link to changeset in GitHub

Log of the deploy script

Page 15: Distributed Release Management

Btw, there are three bots talking in channel at this point. O_o

Page 16: Distributed Release Management

Queuing for next deploy

Humans talk to other humans from time to time.

Page 17: Distributed Release Management

Talking to pushbot.

Pushbot knows some Spanish… because, ya know, why not?

Page 18: Distributed Release Management

Link to test results for CI environment, along with how long the tests took.

Alerting by name.

Page 19: Distributed Release Management

8 minutes have elapsed…

We’ve built and tested our release in the CI environment (“QA”).

QA build failed our 5 min. SLA for tests.

Page 20: Distributed Release Management

“Try” is our pre-commit testing cluster.

Page 21: Distributed Release Management

Bots help reinforce our values. This is especially helpful for new people on the team.

Page 23: Distributed Release Management

Still 8 minutes elapsed…

Pre-prod has been deployed and tested.

This ran in parallel with our QA build and tests.

Page 25: Distributed Release Management

Cross-traffic: In a separate channel (#config), our app config files were deployed to pre-prod.

Page 30: Distributed Release Management

Cross-traffic: Ops team deployed a configuration change.

And, yes… another non-human.

Page 32: Distributed Release Management

Code is live

Link to dashboard.

Page 34: Distributed Release Management

13 minutes elapsed… Code is now in production with public traffic.

Page 35: Distributed Release Management

Who committed code in the last deploy? And how many lines did each of them change?
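One way to answer both questions is to total the per-file line counts that `git log --numstat` reports for the commit range of a deploy. The sketch below only parses that text; it assumes the log was produced with something like `git log --numstat --pretty=tformat:author:%an OLD_SHA..NEW_SHA`, where the two SHAs are placeholders. The function name is hypothetical and this is not how Etsy's bot is known to work.

```python
from collections import defaultdict


def lines_changed_by_author(numstat_log: str) -> dict:
    """Sum added plus deleted lines per author from numstat output.

    Expects "author:<name>" header lines followed by tab-separated
    "<added>\t<deleted>\t<path>" lines, as described in the lead-in.
    """
    totals = defaultdict(int)
    author = None
    for line in numstat_log.splitlines():
        if line.startswith("author:"):
            author = line[len("author:"):].strip()
            totals[author] += 0  # list the author even with no text changes
            continue
        parts = line.split("\t")
        if author is None or len(parts) != 3:
            continue  # blank separator or unrelated line
        added, deleted, _path = parts
        if added == "-" or deleted == "-":
            continue  # git prints "-" for binary files
        totals[author] += int(added) + int(deleted)
    return dict(totals)
```

The names in any sample input are made up; feeding the function real output from the command above gives a committer-to-line-count mapping for the deploy.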

Page 38: Distributed Release Management

Handoff for the next deploy.

Page 39: Distributed Release Management

Entire app deploy took 15 minutes.

4 people running the deployment

8 committers

Config deploy and Chef change deployed in parallel.

Page 40: Distributed Release Management

Optimal queue size

Normalized communication

Improved visibility

Historical record is ideal for post-mortems

Organic evolution

Page 41: Distributed Release Management

When something goes wrong?

Hold up the queue (.hold)

Work the issue with the people available in #push

Additional help always available in #sysops

Buddy-system for off-hours deploys

Ops-on-call, dev-on-call
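The `.hold` step can be pictured as a gate in front of the queue: while a hold is in place, no new deploy starts. This is a sketch under that assumption only; the `DeployGate` class, its methods, and the status wording are hypothetical and are not taken from pushbot.

```python
class DeployGate:
    """Toy gate that blocks new deploys while someone holds the queue."""

    def __init__(self):
        self.held_by = None
        self.reason = ""

    def hold(self, nick, reason=""):
        """Record who is holding the queue and why."""
        self.held_by = nick
        self.reason = reason

    def unhold(self):
        """Release the hold so the next batch can start."""
        self.held_by = None
        self.reason = ""

    def can_start_deploy(self):
        return self.held_by is None

    def status(self):
        """Human-readable status line for the channel."""
        if self.held_by is None:
            return "queue open"
        suffix = f": {self.reason}" if self.reason else ""
        return f"queue held by {self.held_by}{suffix}"
```

Keeping the holder's name and reason in the status line fits the talk's point about visibility: anyone arriving in the channel can see who to talk to.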

Page 42: Distributed Release Management

25 Million items listed

60+ Million monthly unique visitors

200 countries with annual transactions

175+ Committers, everyone deploys

Items by anjaysdesigns, betwixxt, OneStarLeatherGoods, mediumcontrol, TheDesignPallet

Page 43: Distributed Release Management

@mikebrittain

[Chart: deployments per day, app code vs. config files]

Page 44: Distributed Release Management

Start small. (We did.)

Automated tests and production monitoring.

Have a story around maintaining quality.

“We can always go back to the old way.”

Demonstrate value to leadership.

Page 45: Distributed Release Management

Go write your own story.

Page 46: Distributed Release Management

Thank you.

Mike Brittain

Engineering Director, Etsy

@mikebrittain mikebrittain.com/talks