qcon london 2014 › london-2014 › dl › qcon-london-2014 › slide… · •23: fortune’s...


Post on 09-Jun-2020


Page 1: QCon London 2014 › london-2014 › dl › qcon-london-2014 › slide… · •23: Fortune’s “Top 50 best small and medium businesses to work for” • Rapid code iterations

Building resilience

How outages shaped Etsy’s systems

Act 1

Quick! Be resilient!

http://www.flickr.com/photos/niaid/11854196633/sizes/l/

Quick! Be resilient!

• Actually, it’s a slow process

• Iterative

• Introspective

• Horizontal and vertical development

Quick! Be resilient!

http://www.flickr.com/photos/ogcodes/6091644301/sizes/l/

Quick! Be resilient!

http://www.flickr.com/photos/studio360/1150744342/sizes/o/

Quick! Be resilient!

http://www.flickr.com/photos/studio360/1150744368/sizes/o/

Quick! Be resilient!

http://www.flickr.com/photos/ogcodes/6091644301/sizes/l/

Quick! Be resilient!

Current generation

Next generation

Quick! Be resilient!

http://www.flickr.com/photos/jurvetson/8671257096/

Quick! Be resilient!

http://cudebi.wordpress.com/2012/09/19/tah-pagh-tahbe-o-el-reconocimiento-de-william-shakespeare-en-el-universo-de-star-trek/

Resilience Engineering

http://www.flickr.com/photos/freefoto/728651045/sizes/o/

Resilience Engineering

• “To Engineer is Human” and “To Forgive Design” - Henry Petroski

• “The Field Guide to Understanding Human Error” and “Just Culture” - Sidney Dekker

Act 2

Building resilience at Etsy

• Continuous deployment

• Metrics, metrics, metrics

• Peer review

• Postmortems

Building resilience at Etsy

• Continuous deployment

• Metrics, metrics, metrics

• Peer review

• Postmortems } Culture

Postmortems

Or: How to win at failing

Constructive cultures

• No blame

• Open discussion

• Focus on improvements

Constructive cultures

• No blame

• Open discussion

• Focus on improvements } Culture

Destructive cultures

“The nail that sticks up, gets hammered down”

– Japanese proverb

The result?

• #23: Fortune’s “Top 50 best small and medium businesses to work for”

• Rapid code iterations and deploys

• Lasting relationships

• Generosity of spirit

• …and much more

Act 3

Doing postmortems? Get Morgue

http://github.com/etsy/morgue

Morgue

Forkistan

• Mean time to detect: 0 min

• Mean time to recover: 10 mins

Yo Dawg, I Heard You Like Errors..

• Mean time to detect: 2 mins

• Mean time to recover: 15 mins

Smashing INT for Fun and Profit

• Mean time to detect: 0 min

• Mean time to recover: 4 hrs 52 mins

Apache Amnesia

• Mean time to detect: 2 hours

• Mean time to recover: 5 mins

Continuously Upgrading Databases

• Mean time to detect: 2 mins

• Mean time to recover: 1 hour (but not really...)
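
The detect and recover times on the five incident slides above can be compared side by side. The following is a minimal sketch, not anything shown in the talk: it assumes each slide's figure can be treated as one data point in minutes, and the names `INCIDENTS` and `summarize` are hypothetical. The last incident's recovery time is taken at face value even though the slide qualifies it.

```python
from statistics import mean

# (incident title, minutes to detect, minutes to recover), copied from the slides.
INCIDENTS = [
    ("Forkistan", 0, 10),
    ("Yo Dawg, I Heard You Like Errors", 2, 15),
    ("Smashing INT for Fun and Profit", 0, 4 * 60 + 52),
    ("Apache Amnesia", 2 * 60, 5),
    # The slide qualifies this recovery time with "but not really".
    ("Continuously Upgrading Databases", 2, 60),
]


def summarize(incidents):
    """Average the detect/recover times and name the slowest incident for each."""
    return {
        "mean_detect_min": mean(d for _, d, _ in incidents),
        "mean_recover_min": mean(r for _, _, r in incidents),
        "slowest_detect": max(incidents, key=lambda i: i[1])[0],
        "slowest_recover": max(incidents, key=lambda i: i[2])[0],
    }


summary = summarize(INCIDENTS)
for key, value in summary.items():
    print(f"{key}: {value}")
```

One outlier dominates each average (Apache Amnesia for detection, Smashing INT for recovery), so the per-incident numbers are probably more informative than the means.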

Q & A

Avleen Vig, Staff Operations Engineer, Etsy, Inc. (@avleen)
