DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe
![Page 1: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/1.jpg)
Lessons Learned from a Parallel Universe
David N. Blank-Edelman
@otterbook
photo: Will Langenberg
![Page 2: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/2.jpg)
SRE for DevOps
David N. Blank-Edelman
@otterbook
![Page 3: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/3.jpg)
![Page 4: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/4.jpg)
![Page 5: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/5.jpg)
![Page 6: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/6.jpg)
![Page 7: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/7.jpg)
![Page 8: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/8.jpg)
![Page 9: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/9.jpg)
![Page 10: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/10.jpg)
![Page 11: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/11.jpg)
![Page 12: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/12.jpg)
![Page 13: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/13.jpg)
![Page 14: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/14.jpg)
![Page 15: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/15.jpg)
![Page 16: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/16.jpg)
What Makes SRE, SRE (dramatic recreation)
• Hire only coders
• Have an SLA for your service
• Measure and report performance against SLA
• Use Error Budgets and gate launches on them
• Common staffing pool for SRE and DEV
• Excess Ops work overflows to DEV team
• Cap SRE operational load at 50%
• Share 5% of ops work with DEV team
• Oncall teams at least 8 people, or 6x2
• Maximum of 2 events per oncall shift
• Post mortem for every event
• Post mortems are blameless and focus on process and technology, not people
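The "Use Error Budgets and gate launches on them" item above comes down to simple arithmetic. The following is a minimal sketch, not something shown in the talk: it assumes an availability SLO expressed as the fraction of requests that must succeed within a measurement window. The class name `ErrorBudget`, its fields, and the "launch only while budget remains" policy are all hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Error budget implied by an availability SLO over one window (illustrative)."""

    slo_target: float     # assumed form: 0.999 means 99.9% of requests must succeed
    total_requests: int   # requests served so far in the window
    failed_requests: int  # requests that counted against the SLO

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining(self) -> float:
        # Budget left after subtracting the failures already observed.
        return self.allowed_failures - self.failed_requests

    def launch_allowed(self) -> bool:
        # Hypothetical gating policy: launches proceed only while budget remains.
        return self.remaining > 0


if __name__ == "__main__":
    healthy = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
    exhausted = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=1_200)
    print(healthy.launch_allowed(), exhausted.launch_allowed())
```

With a 99.9% target and one million requests, roughly 1,000 failures fit in the budget, so a service with 400 failures may still launch and one with 1,200 may not. A real policy would also have to define the window and what counts as a failure, neither of which the slides specify.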
![Page 17: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/17.jpg)
![Page 18: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/18.jpg)
SLO monitor decide
![Page 19: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/19.jpg)
![Page 20: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/20.jpg)
![Page 21: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/21.jpg)
Observation #1:
Create virtuous and reinforcing feedback loops
![Page 22: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/22.jpg)
![Page 23: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/23.jpg)
![Page 24: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/24.jpg)
Observation #2:
You can’t fire your way to reliable.
![Page 25: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/25.jpg)
![Page 26: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/26.jpg)
• Airbnb
• Amazon
• Apple
• Baidu
• Dropbox
• Etsy
• Facebook
• GitHub
• LinkedIn
• Microsoft
• Netflix
• Pinterest
• Spotify
• Stack Exchange
• Twitter
• Uber
• Yahoo!
• Yelp
![Page 27: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/27.jpg)
![Page 28: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/28.jpg)
![Page 29: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/29.jpg)
Facebook has Production Engineers
![Page 30: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/30.jpg)
![Page 31: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/31.jpg)
Facebook has Production Engineers
“Production Engineers at Facebook are hybrid software/systems engineers who ensure that Facebook's services run smoothly and have the capacity for future growth. They are embedded in every one of Facebook's product and infrastructure teams, and are core participants in every significant engineering effort underway in the company.”
![Page 32: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/32.jpg)
![Page 33: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/33.jpg)
![Page 34: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/34.jpg)
Observation #3:
There is no one right way to build an SRE team. But there are wrong ways.
![Page 35: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/35.jpg)
Facebook has Production Engineers
• ~1-10 ratio
• 18, 24, 36 months with a service (not a S.W.A.T. team or ops monkeys)
• Also do Bootcamp
• Lead reports to Facebook head of engineering (same person responsible for Ops and Eng)
![Page 36: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/36.jpg)
Observation #4:
SRE requires management support at the highest levels.
![Page 37: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/37.jpg)
Production Engineering Maturity Model
• maturity of SW/PE team’s work
• maturity of level of service
• relationship status of PE/SWE teams
• stages:
  • bootstrap/build out
  • scale/initial deployment
  • awesomize
![Page 38: DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel Universe](https://reader036.vdocument.in/reader036/viewer/2022062823/587a4cff1a28ab00148b6a93/html5/thumbnails/38.jpg)