Software operability and run book collaboration - DevOps Summit, Bangalore

#unidevops Software Operability, Run Book Collaboration, and DevOps Matthew Skelton 18th December 2013 DevOps Summit, Bangalore, India www.devops-summit.org @matthewpskelton softwareoperability.com

Upload: matthew-skelton

Post on 11-May-2015

DESCRIPTION

Making software work well in production (through good software operability) is one of the goals of DevOps. Collaboration between Dev and Ops on the 'run book' or operation manual is one way to open up communication channels between Dev and Ops, leading to improved software operability. This is the slide deck I used at DevOps Summit, Bangalore, on 18th December 2013.

TRANSCRIPT

Page 1: Software operability and run book collaboration - DevOps Summit, Bangalore

Software Operability,

Run Book Collaboration,

and DevOps

Matthew Skelton

18th December 2013

DevOps Summit,

Bangalore, India

www.devops-summit.org

@matthewpskelton

softwareoperability.com

Page 2: Software operability and run book collaboration - DevOps Summit, Bangalore

Agenda

• Software Operability

• Run Book Collaboration

• Making Operability Work

• Questions

Page 3: Software operability and run book collaboration - DevOps Summit, Bangalore

Background

• Software systems since 1998

• Software build & deployment specialist & DevOps enthusiast

• London Continuous Delivery meetup group - londoncd.org.uk

• Experience DevOps workshops

Page 4: Software operability and run book collaboration - DevOps Summit, Bangalore

Software Operability

Page 5: Software operability and run book collaboration - DevOps Summit, Bangalore

Software Operability

• Definitions

• Examples

• Why focus on operability?

• How DevOps can help

Page 6: Software operability and run book collaboration - DevOps Summit, Bangalore

Operability?

Page 7: Software operability and run book collaboration - DevOps Summit, Bangalore

Etymology of Operability?

• Cognates:

– Opera

– Operate

– Operational

– Inter-operability

Page 8: Software operability and run book collaboration - DevOps Summit, Bangalore

Page 9: Software operability and run book collaboration - DevOps Summit, Bangalore

Software Operability

• Operability: the properties of a system which make it work well in Production

Page 10: Software operability and run book collaboration - DevOps Summit, Bangalore

Operable Systems

Since 1929, Mallorca, Spain

Page 11: Software operability and run book collaboration - DevOps Summit, Bangalore

Software Operability

• David Copeland (@davetron5000):

“How your software runs in production is all that matters. The most amazing abstractions, cleanest code, or beautiful algorithms are meaningless if your code doesn’t run well on production.”

• http://www.naildrivin5.com/blog/2013/06/16/production-is-all-that-matters.html

Page 12: Software operability and run book collaboration - DevOps Summit, Bangalore

Operational Criteria

• Deploy

• Monitor

• Diagnose

• Debug

• Query

• Control

• Inspect

• Clear

• ...

Page 13: Software operability and run book collaboration - DevOps Summit, Bangalore

“Non-Functional”

Page 14: Software operability and run book collaboration - DevOps Summit, Bangalore

Shaped by Operability

• Hooks (internal APIs) for:

– Logging

– Monitoring

– Diagnostics

– Health checks

– Data clear-down

– Service / daemon / container control
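
A health-check hook of the kind listed above could look roughly like the following sketch. The `HealthRegistry` class, the check names, and the shape of the returned dictionary are all assumptions made for illustration; the slides do not prescribe any particular API.

```python
from typing import Callable, Dict

# Hypothetical type: a health check returns True when the component is healthy.
HealthCheck = Callable[[], bool]


class HealthRegistry:
    """Collects named health checks and reports an aggregate status."""

    def __init__(self) -> None:
        self._checks: Dict[str, HealthCheck] = {}

    def register(self, name: str, check: HealthCheck) -> None:
        """Add a named check that Ops tooling can query later."""
        self._checks[name] = check

    def run(self) -> dict:
        """Run every check; a check that raises is reported, not propagated."""
        results = {}
        for name, check in self._checks.items():
            try:
                results[name] = "ok" if check() else "failed"
            except Exception as exc:
                results[name] = f"error: {exc}"
        healthy = all(status == "ok" for status in results.values())
        return {"healthy": healthy, "checks": results}


if __name__ == "__main__":
    registry = HealthRegistry()
    registry.register("queue_depth", lambda: True)   # placeholder check
    registry.register("database", lambda: False)     # placeholder check
    print(registry.run())
```

Exposing the result of `run()` through an internal endpoint or command gives Ops a single place to ask "is this system healthy, and if not, which part?".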

Page 15: Software operability and run book collaboration - DevOps Summit, Bangalore

Ops Folk are Users Too!

Page 16: Software operability and run book collaboration - DevOps Summit, Bangalore

Page 17: Software operability and run book collaboration - DevOps Summit, Bangalore

Why focus on Operability?

• Deploy more rapidly, frequently

• High cost of Production outage

• Systems now more complicated

Page 18: Software operability and run book collaboration - DevOps Summit, Bangalore

Outages are Embarrassing!

Page 19: Software operability and run book collaboration - DevOps Summit, Bangalore

Operational considerations

Page 20: Software operability and run book collaboration - DevOps Summit, Bangalore

Operational considerations

Page 21: Software operability and run book collaboration - DevOps Summit, Bangalore

Operational considerations

Page 22: Software operability and run book collaboration - DevOps Summit, Bangalore

How DevOps can help

• DevOps is one way to address poor operability

• Improved collaboration and communication between Dev teams and Ops teams

• Example: Run Book Collaboration

Page 23: Software operability and run book collaboration - DevOps Summit, Bangalore

Run Book Collaboration

Page 24: Software operability and run book collaboration - DevOps Summit, Bangalore

Run Book Collaboration

• Feedback loops and learning

• What is a run book?

• How can run book collaboration help operability?

Page 25: Software operability and run book collaboration - DevOps Summit, Bangalore

Feedback Loops

Gene Kim:

http://itrevolution.com/the-three-ways-principles-underpinning-devops/

Page 26: Software operability and run book collaboration - DevOps Summit, Bangalore

Run Book

Page 27: Software operability and run book collaboration - DevOps Summit, Bangalore

Templates

Page 28: Software operability and run book collaboration - DevOps Summit, Bangalore

Example

• 1 Table of Contents
• 2 System Overview
  – 2.1 Service Overview
  – 2.2 Contributing Applications, Daemons, and Windows Services
  – 2.3 Hours of Operation
  – 2.4 Execution Design
  – 2.5 Infrastructure and Network Design
  – 2.6 Resilience, Fault Tolerance and High-Availability
  – 2.7 Throttling and Partial Shutdown
  – 2.8 Required Resources
  – 2.9 Expected Traffic and Load
    • 2.9.1 Hot or Peak Periods
    • 2.9.2 Warm Periods
    • 2.9.3 Cool or Quiet Periods
  – 2.10 Environmental Differences
  – 2.11 Tools
• 3 Security and Access Control
• 4 System Configuration
  – 4.1 Configuration Management
• 5 System Backup and Restore
  – 5.1 Backup Requirements
    • 5.1.1 Special Files
  – 5.2 Backup Procedures
  – 5.3 Restore Procedures
• 6 Monitoring and Alerting
  – 6.1 Error Messages
  – 6.2 Events
  – 6.3 Health Checks
  – 6.4 Other Messages
• 7 Operational Tasks
  – 7.1 Deployment
  – 7.2 Batch Processing
  – 7.3 Power Procedures
  – 7.4 Routine Checks
    • 7.4.1 System Rebuilds
  – 7.5 Troubleshooting
• 8 Maintenance Tasks
  – 8.1 Maintenance Procedures
    • 8.1.1 Patching
      – 8.1.1.1 Normal Cycle
      – 8.1.1.2 Zero-Day Vulnerabilities
    • 8.1.2 GMT/BST time changes
    • 8.1.3 Cleardown Activities
      – 8.1.3.1 Log Rotation
  – 8.2 Testing
    • 8.2.1 Technical Testing
    • 8.2.2 Post-Deployment
• 9 Failure and Recovery Procedures
  – 9.1 Failover
  – 9.2 Recovery
  – 9.3 Troubleshooting Failover and Recovery
• 10 Contact Details

Page 29: Software operability and run book collaboration - DevOps Summit, Bangalore

Example

• 1 Table of Contents

• 2 System Overview

– 2.1 Service Overview

– 2.2 Contributing Applications, Daemons, and Windows Services

– 2.3 Hours of Operation

– 2.4 Execution Design

– 2.5 Infrastructure and Network Design

– 2.6 Resilience, Fault Tolerance and High-Availability

– 2.7 Throttling and Partial Shutdown

– 2.8 Required Resources

– 2.9 Expected Traffic and Load

• 3 Security and Access Control

• 4 System Configuration

• 5 System Backup and Restore

• 6 Monitoring and Alerting

• 7 Operational Tasks

• 8 Maintenance Tasks

• 9 Failure and Recovery Procedures

• 10 Contact Details
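
One way to use a template like this as a conversation starter is to track which sections Dev and Ops have not yet filled in together. The sketch below assumes a run book held as a simple mapping from section title to text; the function names and that data shape are illustrative, and only the section titles come from the example template.

```python
from typing import Dict, List

# Top-level sections 2 to 10 of the example template, keyed by section number.
SECTIONS: Dict[int, str] = {
    2: "System Overview",
    3: "Security and Access Control",
    4: "System Configuration",
    5: "System Backup and Restore",
    6: "Monitoring and Alerting",
    7: "Operational Tasks",
    8: "Maintenance Tasks",
    9: "Failure and Recovery Procedures",
    10: "Contact Details",
}


def new_run_book() -> Dict[str, str]:
    """Start a run book with every section present but empty."""
    return {title: "" for title in SECTIONS.values()}


def open_questions(run_book: Dict[str, str]) -> List[str]:
    """List numbered sections that still have no content."""
    return [
        f"{number} {title}"
        for number, title in SECTIONS.items()
        if not run_book.get(title, "").strip()
    ]


if __name__ == "__main__":
    book = new_run_book()
    book["System Overview"] = "Order-processing service (hypothetical example)."
    for heading in open_questions(book):
        print("Still to discuss:", heading)
```

The list of unanswered sections, not the finished document, is what drives the next Dev and Ops discussion.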

Page 30: Software operability and run book collaboration - DevOps Summit, Bangalore

Example

2.1 Service Overview

2.2 Contributing Applications, Daemons, and Windows Services

2.3 Hours of Operation

2.4 Execution Design

2.5 Infrastructure and Network Design

2.6 Resilience, Fault Tolerance and High-Availability

2.7 Throttling and Partial Shutdown

2.8 Required Resources

2.9 Expected Traffic and Load

Page 31: Software operability and run book collaboration - DevOps Summit, Bangalore

It’s Not Documentation

Page 32: Software operability and run book collaboration - DevOps Summit, Bangalore

Focus on Collaboration

Page 33: Software operability and run book collaboration - DevOps Summit, Bangalore

Outcomes

• Better understanding

• Better cross-team working

• Reduction in operational problems

• Fewer outages

• Reduced long-term cost-of-ownership

Page 34: Software operability and run book collaboration - DevOps Summit, Bangalore

Run Book as Collaboration

• Focus on the collaboration

• Run book is a means, not an end

• Throw it away when complete (?)

• Aim to automate more over time

• See http://runbookcollab.info/
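
"Aim to automate more over time" can be made concrete by recording each run book step alongside an optional automated action, then watching the automated share grow. This is only a sketch under that assumption; the `Step` type, the function names, and the example steps are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One run book step; `automated` stays None while the step is manual."""
    description: str
    automated: Optional[Callable[[], None]] = None


def run(steps: List[Step]) -> List[str]:
    """Execute automated steps and flag manual ones for a human to perform."""
    log = []
    for step in steps:
        if step.automated is None:
            log.append(f"MANUAL: {step.description}")
        else:
            step.automated()
            log.append(f"AUTO: {step.description}")
    return log


def automation_ratio(steps: List[Step]) -> float:
    """Fraction of steps that no longer need a human."""
    if not steps:
        return 0.0
    return sum(1 for s in steps if s.automated is not None) / len(steps)


if __name__ == "__main__":
    steps = [
        Step("Rotate application logs", automated=lambda: None),  # placeholder action
        Step("Confirm dashboard shows normal traffic"),
    ]
    print("\n".join(run(steps)))
    print(f"Automated: {automation_ratio(steps):.0%}")
```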

Page 35: Software operability and run book collaboration - DevOps Summit, Bangalore

Making Operability Work

Page 36: Software operability and run book collaboration - DevOps Summit, Bangalore

Making Operability Work

• NFRs vs Operational Features

• Budget changes

• Organisation changes

• Responsibility changes

• Avoid on-call anti-patterns

Page 37: Software operability and run book collaboration - DevOps Summit, Bangalore

“Non-Functional”

Page 38: Software operability and run book collaboration - DevOps Summit, Bangalore

Operational Features

Features

Page 39: Software operability and run book collaboration - DevOps Summit, Bangalore

Taking Operability Seriously

• Single product backlog

– End-user + Operational features

– New features + bugs

• Product Owner on call

– Accountable for operational failures

– Seriously!

Page 40: Software operability and run book collaboration - DevOps Summit, Bangalore

Page 41: Software operability and run book collaboration - DevOps Summit, Bangalore

Budget changes

• “What is your budget code?”

• Capex vs. Opex?

• Remove budget barriers to regular, effective communication

Page 42: Software operability and run book collaboration - DevOps Summit, Bangalore

Niek Bartholomeus (@niekbartho) - http://niek.bartholomeus.be/

https://speakerdeck.com/niekbartho/self-organization-vs-global-optimization-a-comparison-between-traditional-and-modern-organizations

Page 43: Software operability and run book collaboration - DevOps Summit, Bangalore

Organisation changes

• “I’ll need to ask my manager first”

• Lack of autonomy

• Remove reporting barriers to regular, effective communication

• More at http://bit.ly/DevOpsTopologies

Page 44: Software operability and run book collaboration - DevOps Summit, Bangalore

“I just want to write code”

Page 45: Software operability and run book collaboration - DevOps Summit, Bangalore

Mysterious Coding Tricks

Page 46: Software operability and run book collaboration - DevOps Summit, Bangalore

On-call for Responsibility

Page 47: Software operability and run book collaboration - DevOps Summit, Bangalore

On-call Anti-Patterns

• Too much overtime pay

• Too little overtime pay

• Rota team too small

• No training in incident response

• No team ownership of product

• No team autonomy for changes

Page 48: Software operability and run book collaboration - DevOps Summit, Bangalore

On call - Goal

• Team members want to help make things better

• Empowered to fix problems

• Reduce the times they are woken up

Page 49: Software operability and run book collaboration - DevOps Summit, Bangalore

The operability of operability

• Operational Features, not “NFRs”

• Sustainable collaboration

• Sensible, fair on-call rotas

• Over-compensate in time off

• Avoid burn-out
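
As a rough illustration of a "fair" rota, the sketch below assigns weekly on-call shifts round-robin and counts shifts per person, so that an uneven load or a team that is too small shows up quickly. The team names, the weekly shift length, and the function names are assumptions, not anything specified in the talk.

```python
from collections import Counter
from datetime import date, timedelta
from itertools import cycle
from typing import List, Tuple


def weekly_rota(team: List[str], start: date, weeks: int) -> List[Tuple[date, str]]:
    """Assign one person per week in strict rotation, starting at `start`."""
    if not team:
        raise ValueError("an on-call rota needs at least one person")
    people = cycle(team)
    return [(start + timedelta(weeks=i), next(people)) for i in range(weeks)]


def shifts_per_person(rota: List[Tuple[date, str]]) -> Counter:
    """Count how many shifts each person carries over the period."""
    return Counter(person for _, person in rota)


if __name__ == "__main__":
    team = ["asha", "ben", "chen"]  # hypothetical team members
    rota = weekly_rota(team, date(2014, 1, 6), weeks=6)
    for week_start, person in rota:
        print(week_start.isoformat(), person)
    print(dict(shifts_per_person(rota)))
```

With a strict rotation, each person is on call once every `len(team)` weeks, which makes the cost of a small rota team easy to see.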

Page 50: Software operability and run book collaboration - DevOps Summit, Bangalore

Recapitulation

Page 51: Software operability and run book collaboration - DevOps Summit, Bangalore

Software Operability

Making software systems work well in Production

Page 52: Software operability and run book collaboration - DevOps Summit, Bangalore

Run Book Collaboration

Shared focus on operability throughout the delivery cycle

Page 53: Software operability and run book collaboration - DevOps Summit, Bangalore

Making Operability Operable

Use DevOps team patterns for sustainable operability

Page 54: Software operability and run book collaboration - DevOps Summit, Bangalore

What’s Next?

Page 55: Software operability and run book collaboration - DevOps Summit, Bangalore

Further Reading

• Patterns for Performance and Operability

– Ford, Gileadi, Purba, Moerman

• http://whoownsmyoperability.com/

– Recommended reading lists

Page 56: Software operability and run book collaboration - DevOps Summit, Bangalore

Operability Book

• Software Operability

– How to make software work well in Production

– Due early 2014

• Sign up at OperabilityBook.com

• Discount code for DevOps Summit attendees

Page 57: Software operability and run book collaboration - DevOps Summit, Bangalore

Experience DevOps

• A hands-on workshop for DevOps culture

• Forthcoming dates:

– Bangalore: 19th December 2013

– London: February 2014 (tbc)

• http://experiencedevops.org/

Page 58: Software operability and run book collaboration - DevOps Summit, Bangalore

PIPELINE Conference

• Continuous Delivery

• Tuesday 8th April 2014

• London, UK

• http://pipelineconf.info/

• @PipelineConf

Page 59: Software operability and run book collaboration - DevOps Summit, Bangalore

Questions & Discussion

Matthew Skelton

@matthewpskelton

softwareoperability.com

operabilitybook.com

bit.ly/DevOpsTopologies

Page 60: Software operability and run book collaboration - DevOps Summit, Bangalore

Acknowledgements

http://pianofortekeys.files.wordpress.com/2013/04/ariadnne_wideweb__470x3300.jpg

http://www.blinkenlights.nl/images/blinkenlights-big.jpeg

http://www.danatronics.com/sdb_apps.html

http://riverbankoftruth.com/wp-content/uploads/2013/07/embarrassed-chimp22.jpg

http://www.thinkgeek.com/edm/20040709.html

http://indianaohindiana.com/wp-content/uploads/2013/10/Tome.jpg

http://www.guavaworks.com/company-blog/guava-doesnt-do-cookie-cutter.html

http://www.carpages.co.uk/ford/ford-sand-sculptures-05-09-11.asp

http://www.thisismoney.co.uk/money/experts/article-2324270/Take-smaller-pension-pots-tax-free-leave-final-salary-untouched.html

http://paranoidnews.org/wp-content/uploads/2010/10/Alien-Hunt-Alarm-Clock.jpg

http://particulations.blogspot.co.uk/2010/08/headingley-hole.html

http://marvel.wikia.com/Stephen_Strange_(Earth-616)

Page 61: Software operability and run book collaboration - DevOps Summit, Bangalore

Further Slides

Page 62: Software operability and run book collaboration - DevOps Summit, Bangalore

The Phoenix Project

Page 63: Software operability and run book collaboration - DevOps Summit, Bangalore

Continuous Delivery