Learning from Incidents at Auto Trader

Post on 14-Apr-2017


Learning from Incidents at Auto Trader

@4ndyHumphrey

Learning from Failure at Auto Trader

@4ndyHumphrey

Learning from Incidents

• What is a Learning Organisation?
• What is the Reality?
• What are my Choices?
• Incident Reviews - things to Avoid
• Incident Reviews - things to Encourage
• What about holding people to Account?
• A bit on Our process

Our Customers

• Private Car Sellers: 30,000
• Trade Car Dealers: 15,000

Our People

• Auto Trader Staff: 850
• Product & Tech Teams: 275

Our Technology Platform

1.2 billion page views per month

70 million peak page views per day

15 million unique visitors per month

Supported by 100 live applications

Further Reading up front

Links:
• John Allspaw - The Infinite Hows
• Steven Shorrock - If It Weren't for the People
• EuroControl - Systems Thinking for Safety
• Lindsay Holmwood - Blame-Language-Sharing
• Sidney Dekker - Just Culture

Books:
• Black Box Thinking - Matthew Syed
• Field Guide to Understanding Human Error - Sidney Dekker
• Beyond Blame - Dave Zwieback
• Engineering a Safer World - Nancy Leveson

People:
• Steven Shorrock
• Erik Hollnagel
• Sidney Dekker
• Matthew Syed
• John Allspaw
• Lindsay Holmwood
• Dave Zwieback
• Nancy Leveson

What is a Learning Organisation?

The Loom

A Learning Organisation

Why should I want to learn?

• Moral Responsibility
• Job Satisfaction
• Economic Imperative

What’s the reality?

Blame management

Blame - Fundamental Attribution Error

Blame - Justice

Blame - Hindsight

Blame – Bad Apple Theory

Blame – Ignoring context

 Jonathan Caramanus/Green Renaissance/wwf.org.uk

Blame - It’s Easy

What are my choices?

Things will always go wrong

https://www.youtube.com/watch?v=EvegBo4TUdQ

You can blame people…

Or say it’s a one off…

Or you can look at the context…

…Learn and make changes

But it is a choice:

“Blame is the enemy of safety…” - Nancy Leveson

“Whenever there is fear, you will get wrong figures.” - W. Edwards Deming

Incident Reviews: Things to avoid

Culture of fear

Top down

Asking Why?

Questioning styles: Dilts Model

• Environment: Contexts - WHERE?
• Capabilities: Methods, Approaches - HOW?
• Behavior: Skills and Actions - WHAT?
• Values and Beliefs: Motivation and permission - WHY?
• Identity: Sense of Self, Role - WHO?

Don’t go too Deep!

Dilts Model

• Environment: Contexts - WHERE?
• Capabilities: Methods, Approaches - HOW?
• Behavior: Skills and Actions - WHAT?
• Values and Beliefs: What is important/true - WHY?
• Identity: Sense of Self - WHO?

Single Root Cause

Points scoring

Incident Reviews: How to encourage learning

Priming

Keep an open mind

Explore how events unfolded

Incident Review Prompts (from The Field Guide to Understanding Human Error, by Sidney Dekker)

At each juncture in the sequence of events (if that is how you want to structure this part of the accident story), you want to get to know:

• Which cues were observed (what did he or she notice or see, or fail to notice that he or she had expected to notice?)
• What knowledge was used to deal with the situation? Did participants have any experience with similar situations that was useful in dealing with this one?
• What expectations did participants have about how things were going to develop, and what options did they think they had to influence the course of events?
• How did other influences (operational or organizational) help determine how they interpreted the situation and how they would act?

Here are some questions Gary Klein and his researchers typically ask to find out how the situation looked to people on the inside at each of the critical junctures:

Cues
• What were you seeing?
• What were you focusing on?
• What were you expecting to happen?

Interpretation
• If you had to describe the situation to your colleague at that point, what would you have told them?

Errors
• What mistakes (for example in interpretation) were likely at this point?

Previous experience/knowledge
• Were you reminded of any previous experience?
• Did this situation fit a standard scenario?
• Were you trained to deal with this situation?
• Were there any rules that applied clearly here?
• Did any other sources of knowledge suggest what to do?

Goals
• What were you trying to achieve?
• Were there multiple goals at the same time?
• Was there time pressure or other limitations on what you could do?

Taking action
• How did you judge you could influence the course of events?
• Did you discuss or mentally imagine a number of options or did you know straight away what to do?

Outcome
• Did the outcome fit your expectation?
• Did you have to update your assessment of the situation?

Communications
• What communication medium(s) did you prefer to use (phone, chat, email, video conf, etc.)?
• Did you make use of more than one communication channel at once?

Help
• Did you ask anyone for help?
• What signal brought you to ask for support or assistance?
• Were you able to contact the people you needed to contact?

Debriefings need not follow such a scripted set of questions, of course, as the relevance of questions depends on the event. Also, the questions can come across to participants as too conceptual to make any sense. You may need to reformulate them in the language of the domain.

Timelines

14:00 Alert received from Site Confidence
15:15 Incident communication sent
16:00 Incident closure comms sent

1. Factual timeline entries can be filled in prior to the Review Meeting

Timelines

13:10 Slow server performance observed by Bill
14:00 Alert received from Site Confidence
14:20 Bill spoke to John about SC issues and decided to recover DB
15:15 Incident communication sent
15:50 John finished DB recovery
16:00 Incident closure comms sent

2. As a group, overlay the basic timeline with key decisions and junctures
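
The two timeline steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Auto Trader's tooling: the `Entry` type, the `overlay` and `render` helpers, and the "fact"/"juncture" labels are hypothetical names; the entries themselves are the ones shown on the slides.

```python
from datetime import time
from typing import NamedTuple


class Entry(NamedTuple):
    """One timeline row (hypothetical structure, for illustration only)."""
    at: time
    text: str
    kind: str  # "fact" = filled in before the review, "juncture" = added by the group


# Step 1: factual entries, filled in prior to the review meeting.
facts = [
    Entry(time(14, 0), "Alert received from Site Confidence", "fact"),
    Entry(time(15, 15), "Incident communication sent", "fact"),
    Entry(time(16, 0), "Incident closure comms sent", "fact"),
]

# Step 2: key decisions and junctures, added as a group during the review.
junctures = [
    Entry(time(13, 10), "Slow server performance observed by Bill", "juncture"),
    Entry(time(14, 20), "Bill spoke to John about SC issues and decided to recover DB", "juncture"),
    Entry(time(15, 50), "John finished DB recovery", "juncture"),
]


def overlay(facts, junctures):
    """Merge both lists into a single timeline ordered by time of day."""
    return sorted([*facts, *junctures], key=lambda entry: entry.at)


def render(entries):
    """Format each entry as 'HH:MM [kind] text'."""
    return [f"{e.at:%H:%M} [{e.kind}] {e.text}" for e in entries]


if __name__ == "__main__":
    for line in render(overlay(facts, junctures)):
        print(line)
```

Keeping the `kind` label on each row preserves the distinction between what was known before the meeting and what the group added during it.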

One conversation

Actions

Impartial facilitator

Investigate what went well

Practice – make it habit

What about holding people to account?

Accountability

Our process:

• Major Incidents
• High Severity Incidents
• Failed Releases (all)
• Failed Changes (Large)

Our Process

Priming - Timeline - Actions

Why we are here:

We understand and truly believe that everyone did the best job they could, given what they knew at the time, their skills and abilities, the resources available, and the situation at hand.

We are here to learn and find solutions to improve our ways of working.

Things that help us learn:

• Open Minded
• Go back in time
• No single ‘Root Cause’
• How not Why

Things that stop us learning:

• Blaming people
• Human Error
• Arse Covering
• Points scoring
• ‘Trying Harder’
• Talking over people

After the review:

• Incident details recorded
• Actions (owners, dates) recorded
• Owned by Service Management Team

Further Reading Again

Links:
• John Allspaw - The Infinite Hows
• Steven Shorrock - If It Weren't for the People
• EuroControl - Systems Thinking for Safety
• Lindsay Holmwood - Blame-Language-Sharing
• Sidney Dekker - Just Culture

Books:
• Black Box Thinking - Matthew Syed
• Field Guide to Understanding Human Error - Sidney Dekker
• Beyond Blame - Dave Zwieback
• Engineering a Safer World - Nancy Leveson

People:
• Steven Shorrock
• Erik Hollnagel
• Sidney Dekker
• Matthew Syed
• John Allspaw
• Lindsay Holmwood
• Dave Zwieback
• Nancy Leveson

Questions?
