brainstorming failure
Post on 15-Apr-2017
Brainstorming Failure
Bio - Jeff Smith
• Manager, Site Reliability Engineering at Grubhub
• Yes, we are also hiring.
• Yes, there is free food. Yes, it's totally awesome to work here.
Email: jsmith@grubhub.com
Twitter: @DarkAndNerdy
Blog: http://www.allthingsdork.com
Systems Metaphor
First Class Luxuries
Top of the Line Internals
The Cockpit
WAT?
The Cockpit
The cockpit you're expecting
FEEDBACK!
What Do You Measure?
FMEA
Failure Mode and Effects Analysis (FMEA) is a step-by-step approach for identifying the possible ways a process, product or service might fail. The process is commonly used by quality organizations across a wide range of industries.
FMEA in Software Engineering
We can use FMEA in a number of ways in software to help us brainstorm, rank and prioritize actionable items about the system. The process will help us:
• Identify key metrics that need tracking
• Identify monitoring and/or alerts that need to be created
• Identify necessary feedback loops
The Case for Cross-Functional Teams
The Process
1. Examine the process
2. Brainstorm potential failures
3. List potential effects of failure
4. Identify Your Scale
5. Assign Severity ranking
6. Assign Occurrence ranking
7. Assign Detection ranking
Examine the Process
Brainstorm Potential Failures
• Brainstorming should be fluid. Anything goes.
• Cross-functional teams should be involved (business, development, operations, design).
List Potential Effects of Failure
Think through the impact of failure. The impact might be process related, reputation related or technical, to name a few. Examples:
• Degraded customer experience
• Order not fulfilled
• Delay in payment to accounts receivable
Agree on Risk Level Scales
Technology Industry
• Low severity could be degraded performance
• High severity could be complete site outage
Airline Industry
• Low severity could be departure delay
• High severity could be customer death
Assign Severity Ranking
Rank the severity on a scale from 1 to 10.
• 1 means the effect is inconsequential
• 10 means a catastrophic failure
In some organizations, 9 and 10 are reserved for personal injury and death.
If a failure mode has more than one effect, use only the severity of its most severe effect.
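The "most severe effect" rule can be sketched in a few lines. This is an illustration only: the effect names are borrowed from the examples earlier in the deck, while the scores and the function name `failure_mode_severity` are made up for the example.

```python
# Hypothetical 1-10 severity scores for the effects of a single failure mode.
effects = {
    "Degraded customer experience": 4,
    "Order not fulfilled": 8,
    "Delay in payment to accounts receivable": 6,
}


def failure_mode_severity(effect_severities):
    """Return the severity of the worst effect, after validating the scale."""
    for name, score in effect_severities.items():
        if not 1 <= score <= 10:
            raise ValueError(f"severity for {name!r} must be between 1 and 10")
    # Only the most severe effect counts toward the failure mode's ranking.
    return max(effect_severities.values())


severity = failure_mode_severity(effects)
print(severity)
```

Taking the maximum rather than an average keeps one severe effect from being hidden by several minor ones.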
Assign Occurrence Ranking
Rank the likelihood that this condition will occur.
• 1 being extremely unlikely
• 10 being inevitable
Assign Detection Ranking
Rank the likelihood that this condition would be detected if it occurred. A scenario is only considered "detected" if it is found before it would impact a customer or user. Note that the scale is inverted: a lower number means better detection.
• 1 means the control is certain to detect the failure
• 10 means the control is certain not to detect the failure
Calculate the Risk Priority Number
The Risk Priority Number (RPN) is a value calculated to rank a particular failure mode. It is the product of the Severity (S), Occurrence (O) and Detection (D) rankings. The higher the RPN, the sooner the failure mode should be addressed.
RPN = S * O * D
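As a rough sketch of how the formula turns a brainstormed list into a priority order, the snippet below computes the RPN for a few failure modes and sorts them. The failure mode names and all scores are invented for illustration, and the `FailureMode` class is a hypothetical helper, not part of any FMEA tool.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # S: 1 (inconsequential) to 10 (catastrophic)
    occurrence: int  # O: 1 (extremely unlikely) to 10 (inevitable)
    detection: int   # D: 1 (certain to be detected) to 10 (certain to be missed)

    @property
    def rpn(self) -> int:
        # RPN = S * O * D
        return self.severity * self.occurrence * self.detection


# Hypothetical failure modes with made-up rankings.
failure_modes = [
    FailureMode("Payment gateway timeout", severity=8, occurrence=4, detection=3),
    FailureMode("Disk fills on order database", severity=9, occurrence=2, detection=7),
    FailureMode("Slow search results", severity=3, occurrence=6, detection=2),
]

# Highest RPN first: these are the items to address soonest.
ranked = sorted(failure_modes, key=lambda fm: fm.rpn, reverse=True)
for fm in ranked:
    print(f"{fm.rpn:>4}  {fm.name}")
```

With these sample scores the disk failure ranks first (126), even though the gateway timeout occurs more often, because it is much harder to detect.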
Develop an Action Plan
Evaluate the list and develop an action plan to eliminate or mitigate the items with the highest RPN value first.
• Prioritize solutions that are self-healing and exist within the system under consideration.
• Develop metrics that help to track the health surrounding a failure item.
• The goal is to reduce the RPN by lowering the Severity, Occurrence or Detection scores.
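To make the effect of a mitigation concrete, here is a small worked example with made-up scores. It assumes a hypothetical failure mode whose Detection score improves from 7 to 2 once an alert is added, while Severity and Occurrence stay the same.

```python
def rpn(severity, occurrence, detection):
    """Risk Priority Number: the product of the three 1-10 rankings."""
    return severity * occurrence * detection


# Hypothetical scores before any mitigation: the failure is hard to detect.
before = rpn(severity=9, occurrence=2, detection=7)

# Same failure after adding an alert: only the Detection score changes.
after = rpn(severity=9, occurrence=2, detection=2)

print(f"RPN before: {before}, after: {after}, reduction: {before - after}")
```

Because the three scores are multiplied, improving any one of them lowers the RPN proportionally, which is why better monitoring alone can move an item well down the list.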
Ensuring You Have a Feedback Loop
The feedback loop is a constant evaluation of these measurements and indicators. The feedback loop should give a strong indicator that the system is working as expected, while at the same time exposing trends in the environment.
Leading and Lagging Indicators
Leading Indicator - A measurable factor that changes before the system enters a particular state of failure. (Metrics)
Lagging Indicator - A measurable factor that changes after the system enters a particular state of failure. (Logs/Reporting)
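A minimal sketch of the distinction, assuming a hypothetical "disk fills up" failure mode: disk usage is treated as the leading indicator (it rises before writes start failing) and the count of write errors found in logs as the lagging indicator. The function name, parameters and the 80% threshold are all assumptions made for the example.

```python
def classify(disk_used_pct, write_errors, warn_at=80.0):
    """Return 'failing', 'at risk' or 'healthy' for a hypothetical disk-full failure mode."""
    if write_errors > 0:
        # Lagging indicator: errors only appear after the failure has begun.
        return "failing"
    if disk_used_pct >= warn_at:
        # Leading indicator: usage crosses the threshold before any write fails.
        return "at risk"
    return "healthy"


print(classify(disk_used_pct=40.0, write_errors=0))
print(classify(disk_used_pct=85.0, write_errors=0))
print(classify(disk_used_pct=99.0, write_errors=3))
```

Alerting on the "at risk" state is what allows a failure to count as detected before it reaches a customer.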
Recap
• Examine your process, and assemble a cross-functional team with different views of the system
• Brainstorm all your potential failure modes
• Calculate your RPN
• Develop action plans to reduce risk. Ensure the system is providing feedback loops to be able to identify the current state of the system
• Profit
Resources
• Quality One FMEA Writeup
• Purdue University FMEA Presentation
• iSixSigma
• Google Docs FMEA Template
• Brainstorming tools: MindNode, FreeMind, XMind
Thanks!