amplifying sources of resilience...in the world (news, service provider outages, etc.) time-series...
TRANSCRIPT
![Page 1: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/1.jpg)
Amplifying Sources of ResilienceWhat the Research Says
John Allspaw (@allspaw) Adaptive Capacity Labs (@adaptiveclabs)
![Page 2: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/2.jpg)
me
2009 Velocity Conf
Consortium for Resilient Internet-Facing Business IT
Adaptive Capacity Labs
![Page 3: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/3.jpg)
Resilience Engineering Is a Field of Study
• Emerged from Cognitive Systems Engineering • Early 2000s, largely in response to NASA events in 1999 and 2000 • 7 symposia over 12 years
![Page 4: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/4.jpg)
Resilience Engineering is a Community
is largely made up of practitioners and researchers from….
Cybernetics
Engineering*
Ecology
Safety Science
Biology
Control Systems
Human Factors & Ergonomics
Cognitive Systems Engineering
Complexity Science
Cognitive Psychology
Sociology
Operations Research
![Page 5: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/5.jpg)
working in domains such as…
Rail Maritime
Surgery
Intelligence AgenciesLaw Enforcement
Aviation/ATM
Space
Mining
Construction
Explosives
FirefightingAnesthesia
Pediatrics
Power Grid & Distribution
Military Agencies
Software Engineering
Resilience Engineering is a Community
![Page 6: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/6.jpg)
Some of the cast of characters
David Woods CSEL/OSU
Shawna Perry Univ of Florida
Emergency Medicine
Dr. Richard Cook Anesthesiologist
Researcher
Ivonne Andrade Herrera SINTEF
Erik Hollnagel Univ of S. Denmark
Anne-Sophie Nyssen University de Liege Johan Bergström
Lund UniversitySidney Dekker
Griffith UniversityAsher Balkin CSEL/OSU
Laura Maguire CSEL/OSU
![Page 7: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/7.jpg)
Some of the cast of characters
David Woods CSEL/OSU
Shawna Perry Univ of Florida
Emergency Medicine
Dr. Richard Cook Anesthesiologist
Researcher
Ivonne Andrade Herrera SINTEF
Erik Hollnagel Univ of S. Denmark
Anne-Sophie Nyssen University de Liege Johan Bergström
Lund University
Sidney Dekker Griffith University
Asher Balkin CSEL/OSU
Laura Maguire CSEL/OSU
J. Paul ReedJessica DeVitaCasey RosenthalNora Jones (me)
![Page 8: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/8.jpg)
![Page 9: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/9.jpg)
externally sourcedcode (e.g. DB)results
the using world
deliverytechnologystack
internally sourced coderesults
![Page 10: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/10.jpg)
externally sourcedcode (e.g. DB)results
the using world
deliverytechnologystack
internally sourced coderesults
code repositories
macrodescriptions
testing/validation suites
code
code stuff
meta rules
scripts, rules, etc.
test cases
code generating
tools
testingtools
deploytools
organization/ encapsulation
tools
“monitoring”tools
externally sourcedcode (e.g. DB)results
the using world
deliverytechnologystack
internally sourced coderesults
![Page 11: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/11.jpg)
code repositories
macrodescriptions
testing/validation suites
code
code stuff
meta rules
scripts, rules, etc.
test cases
code generating
tools
testingtools
deploytools
organization/ encapsulation
tools
“monitoring”tools
externally sourcedcode (e.g. DB)results
the using world
deliverytechnologystack
internally sourced coderesults
![Page 12: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/12.jpg)
code repositories
macrodescriptions
testing/validation suites
code
code stuff
meta rules
scripts, rules, etc.
test cases
code generating
tools
testingtools
deploytools
organization/ encapsulation
tools
“monitoring”tools
externally sourcedcode (e.g. DB)results
the using world
deliverytechnologystack
internally sourced coderesults
code generating
tools
testingtools
deploytools
organization/ encapsulation
tools
“monitoring”tools
Adding stuff to the running
system
Getting stuff ready to be part of the running
system
architectural & structural
framing
keeping track of what “the system” is
doing
code repositories
macrodescriptions
testing/validation suites
code
code stuff
meta rules
scripts, rules, etc.
test cases
code generating
tools
testingtools
deploytools
organization/ encapsulation
tools
“monitoring”tools
externally sourcedcode (e.g. DB)results
the using world
deliverytechnologystack
internally sourced coderesults
![Page 13: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/13.jpg)
code generating
tools
testingtools
deploytools
organization/ encapsulation
tools
“monitoring”tools
Adding stuff to the running
system
Getting stuff ready to be part of the running
system
architectural & structural
framing
keeping track of what “the system” is
doing
code repositories
macrodescriptions
testing/validation suites
code
code stuff
meta rules
scripts, rules, etc.
test cases
code generating
tools
testingtools
deploytools
organization/ encapsulation
tools
“monitoring”tools
externally sourcedcode (e.g. DB)results
the using world
deliverytechnologystack
internally sourced coderesults
![Page 14: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/14.jpg)
code generating
tools
testingtools
deploytools
organization/ encapsulation
tools
“monitoring”tools
Adding stuff to the running
system
Getting stuff ready to be part of the running
system
architectural & structural
framing
keeping track of what “the system” is
doing
code repositories
macrodescriptions
testing/validation suites
code
code stuff
meta rules
scripts, rules, etc.
test cases
code generating
tools
testingtools
deploytools
organization/ encapsulation
tools
“monitoring”tools
externally sourcedcode (e.g. DB)results
the using world
deliverytechnologystack
internally sourced coderesults
The Work Is Done Here
Your Product Or Service
The Stuff You Build and Maintain With
![Page 15: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/15.jpg)
code generating
tools
testingtools
deploytools
organization/ encapsulation
tools
“monitoring”tools
Adding stuff to the running
system
Getting stuff ready to be part of the running
system
architectural & structural
framing
keeping track of what “the system” is
doing
code repositories
macrodescriptions
testing/validation suites
code
code stuff
meta rules
scripts, rules, etc.
test cases
code generating
tools
testingtools
deploytools
organization/ encapsulation
tools
“monitoring”tools
externally sourcedcode (e.g. DB)results
the using world
deliverytechnologystack
internally sourced coderesults
![Page 16: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/16.jpg)
code generating
tools
testingtools
deploytools
organization/ encapsulation
tools
“monitoring”tools
Adding stuff to the running
system
Getting stuff ready to be part of the running
system
architectural & structural
framing
keeping track of what “the system” is
doing
code repositories
macrodescriptions
testing/validation suites
code
code stuff
meta rules
scripts, rules, etc.
test cases
code generating
tools
testingtools
deploytools
organization/ encapsulation
tools
“monitoring”tools
externally sourcedcode (e.g. DB)results
the using world
deliverytechnologystack
internally sourced coderesults
Copyright © 2016 by R.I. Cook for ACL, all rights reservedack: Michael Angeles http://konigi.com/tools/
What matters. Why what matters matters.
code repositories
macrodescriptions
testing/validation suites
code
code stuff
meta rules
scripts, rules, etc.
test cases
code generating
tools
testingtools
deploytools
organization/ encapsulation
tools
“monitoring”tools
above the line
below the line
Why is it doing that?What needs to change?
What does it mean?
How should this work?What’s it doing?
What does it mean?
What is happening?What should be happening
What does it mean?
Adding stuff to the running
system
Getting stuff ready to be part of the running
system
architectural & structural
framing
keeping track of what “the system” is
doing
externally sourcedcode (e.g. DB)results
the using world
deliverytechnologystack
internally sourced coderesults
goalspurposes
risks
cognition
actionsinteractions
speechgestures
clickssignals
representations
artifacts
the line of representation
individuals have unique models of the “system”
Copyright © 2016 by R.I. Cook for ACL, all rights reservedack: Michael Angeles http://konigi.com/tools/
What matters. Why what matters matters.
code repositories
macrodescriptions
testing/validation suites
code
code stuff
meta rules
scripts, rules, etc.
test cases
code generating
tools
testingtools
deploytools
organization/ encapsulation
tools
“monitoring”tools
above the line
below the line
Why is it doing that?What needs to change?
What does it mean?
How should this work?What’s it doing?
What does it mean?
What is happening?What should be happening
What does it mean?
Adding stuff to the running
system
Getting stuff ready to be part of the running
system
architectural & structural
framing
keeping track of what “the system” is
doing
externally sourcedcode (e.g. DB)results
the using world
deliverytechnologystack
internally sourced coderesults
goalspurposes
risks
cognition
actionsinteractions
speechgestures
clickssignals
representations
artifacts
the line of representation
individuals have unique models of the “system”
observinginferring
anticipatingplanning
troubleshootingdiagnosingcorrectingmodifyingreacting
![Page 17: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/17.jpg)
Copyright © 2016 by R.I. Cook for ACL, all rights reservedack: Michael Angeles http://konigi.com/tools/
What matters. Why what matters matters.
code repositories
macrodescriptions
testing/validation suites
code
code stuff
meta rules
scripts, rules, etc.
test cases
code generating
tools
testingtools
deploytools
organization/ encapsulation
tools
“monitoring”tools
above the line
below the line
Why is it doing that?What needs to change?
What does it mean?
How should this work?What’s it doing?
What does it mean?
What is happening?What should be happening
What does it mean?
Adding stuff to the running
system
Getting stuff ready to be part of the running
system
architectural & structural
framing
keeping track of what “the system” is
doing
externally sourcedcode (e.g. DB)results
the using world
deliverytechnologystack
internally sourced coderesults
goalspurposes
risks
cognition
actionsinteractions
speechgestures
clickssignals
representations
artifacts
the line of representation
individuals have unique models of the “system”
Copyright © 2016 by R.I. Cook for ACL, all rights reservedack: Michael Angeles http://konigi.com/tools/
What matters. Why what matters matters.
code repositories
macrodescriptions
testing/validation suites
code
code stuff
meta rules
scripts, rules, etc.
test cases
code generating
tools
testingtools
deploytools
organization/ encapsulation
tools
“monitoring”tools
above the line
below the line
Why is it doing that?What needs to change?
What does it mean?
How should this work?What’s it doing?
What does it mean?
What is happening?What should be happening
What does it mean?
Adding stuff to the running
system
Getting stuff ready to be part of the running
system
architectural & structural
framing
keeping track of what “the system” is
doing
externally sourcedcode (e.g. DB)results
the using world
deliverytechnologystack
internally sourced coderesults
goalspurposes
risks
cognition
actionsinteractions
speechgestures
clickssignals
representations
artifacts
the line of representation
individuals have unique models of the “system”
observinginferring
anticipatingplanning
troubleshootingdiagnosingcorrectingmodifyingreacting
![Page 18: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/18.jpg)
Copyright © 2016 by R.I. Cook for ACL, all rights reservedack: Michael Angeles http://konigi.com/tools/
What matters. Why what matters matters.
code repositories
macrodescriptions
testing/validation suites
code
code stuff
meta rules
scripts, rules, etc.
test cases
code generating
tools
testingtools
deploytools
organization/ encapsulation
tools
“monitoring”tools
above the line
below the line
Why is it doing that?What needs to change?
What does it mean?
How should this work?What’s it doing?
What does it mean?
What is happening?What should be happening
What does it mean?
Adding stuff to the running
system
Getting stuff ready to be part of the running
system
architectural & structural
framing
keeping track of what “the system” is
doing
externally sourcedcode (e.g. DB)results
the using world
deliverytechnologystack
internally sourced coderesults
goalspurposes
risks
cognition
actionsinteractions
speechgestures
clickssignals
representations
artifacts
the line of representation
individuals have unique models of the “system”
Copyright © 2016 by R.I. Cook for ACL, all rights reservedack: Michael Angeles http://konigi.com/tools/
What matters. Why what matters matters.
code repositories
macrodescriptions
testing/validation suites
code
code stuff
meta rules
scripts, rules, etc.
test cases
code generating
tools
testingtools
deploytools
organization/ encapsulation
tools
“monitoring”tools
above the line
below the line
Why is it doing that?What needs to change?
What does it mean?
How should this work?What’s it doing?
What does it mean?
What is happening?What should be happening
What does it mean?
Adding stuff to the running
system
Getting stuff ready to be part of the running
system
architectural & structural
framing
keeping track of what “the system” is
doing
externally sourcedcode (e.g. DB)results
the using world
deliverytechnologystack
internally sourced coderesults
goalspurposes
risks
cognition
actionsinteractions
speechgestures
clickssignals
representations
artifacts
the line of representation
individuals have unique models of the “system”
observinginferring
anticipatingplanning
troubleshootingdiagnosingcorrectingmodifyingreacting
![Page 19: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/19.jpg)
code generating
tools
testingtools
deploytools
organization/ encapsulation
tools
“monitoring”tools
Adding stuff to the running
system
Getting stuff ready to be part of the running
system
architectural & structural
framing
keeping track of what “the system” is
doing
code repositories
macrodescriptions
testing/validation suites
code
code stuff
meta rules
scripts, rules, etc.
test cases
code generating
tools
testingtools
deploytools
organization/ encapsulation
tools
“monitoring”tools
externally sourcedcode (e.g. DB)results
the using world
deliverytechnologystack
internally sourced coderesults
Time
…and things are changing here
things are changing
here…
![Page 20: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/20.jpg)
Amplifying Sources of ResilienceWhat the Research Says
John Allspaw Adaptive Capacity Labs
clarifications about what this actually is
can’t amplify something if you can’t find it first
![Page 21: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/21.jpg)
what we find when we look closely at incidents in software
![Page 22: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/22.jpg)
logs
time of year day of year time of day
observations and hypotheses others
share
what has been investigated thus far
what’s been happening in the world
(news, service provider outages, etc.)
time-series data
alerts
tracing/observability tools
recent changes in
existing tech
new dependencies
who is on vacation, at a conference,
traveling, etc.
status of other ongoing work
![Page 23: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/23.jpg)
- tenure - domain expertise - past experience with details
![Page 24: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/24.jpg)
multiple perspectives on
• what “it” is that is happening
• what can — and what cannot — be done to “stem the bleeding” or “reduce the blast radius”
• who has authority to take certain actions
• what shouldn’t be tried to mitigate or repair
![Page 25: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/25.jpg)
- problem detection - generating hypotheses - diagnostic actions - therapeutic actions - sacrifice decisions - coordinating - (re) planning - preparing for potential escalation/cascades
multiple threads of activity
some productive some unproductive
![Page 26: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/26.jpg)
time pressure high consequences
![Page 27: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/27.jpg)
this is not
“debugging” “troubleshooting”
![Page 28: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/28.jpg)
therefore…“resilience”?
![Page 29: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/29.jpg)
software development may incur future liability in order to achieve short-term goals
1992
Ward Cunningham THANK YOU, WARD!
![Page 30: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/30.jpg)
resilience is not:• preventative design
• fault-tolerance
• redundancy
• Chaos Engineering
• stuff about software or hardware
• a property that a system has
![Page 31: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/31.jpg)
unforeseen
unanticipated
unexpected
fundamentally surprising
![Page 32: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/32.jpg)
–Scott Sagan “The Limits of Safety”
“things that have never happened before happen all the time”
![Page 33: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/33.jpg)
resilience is:
• proactive activities aimed at preparing to be unprepared — without an ability to justify it economically!
• sustaining the potential for future adaptive action when conditions change
• something that a system does, not what it has
![Page 34: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/34.jpg)
sustained adaptive capacity
![Page 35: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/35.jpg)
sustained adaptive capacity
![Page 36: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/36.jpg)
Poised To Adapt1. Knowing what the platform is supposed to do
2. Knowing how the platform works
3. What the platform’s behavior means
4. Being able to devise a change that addresses 1, 2, & 3
5. Being able to predict the effects of that change
6. Being able to force the platform to change in that way
7. Being prepared to deal with the consequences
Dr. Richard Cook, Velocity Conf 2016 Santa Clara, CA
![Page 37: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/37.jpg)
Finding sources of resilience means finding and understanding cognitive work.
observinginferring
anticipatingplanning
troubleshootingdiagnosingcorrectingmodifyingreacting
![Page 38: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/38.jpg)
all incidents can be worse
what are things (people, maneuvers, knowledge, etc.) that went into preventing it from being worse?
![Page 39: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/39.jpg)
How can I find this “adaptive capacity”?
Find incidents that have:
• high degree of surprise
• whose consequences were not severe
• and look closely at the details about what went into making it not nearly as bad as it could have been
• protect and acknowledge explicitly the sources you find
![Page 40: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/40.jpg)
indications of surprise and novelty
<murphy> wtf happened here
<steve> I have no idea what is going on
<laurie> well that's terrifying
![Page 41: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/41.jpg)
indications about contrasting mental models
<laurie> so you want to rebuild {server01} first?<laurie> neither box has been touched yet<laurie> and im a tad nervous to do both at once
<lisa> wait wait, i thought the X table was small
<jeremy> I'm still a bit confused why B and A are different if A got to 0 and B is still at 3099
<tim>: oh I see.. the retry interval is pretty aggressive
![Page 42: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/42.jpg)
why not look at incidents with severe consequences?
• scrutiny from stakeholders with face-saving agenda tend to block deep inquiry
• with “medium-severe” incidents the cost of getting details/descriptions of people’s perspectives is low relative to the potential gain
• “Goldilocks” incidents are the ideal
![Page 43: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/43.jpg)
some (contextual) sources
• esoteric knowledge and expertise in the organization
• flexible and dynamic staffing for novel situations
• authority that is expected to migrate across roles
• a “constant sense of unease” that drives explorations of “normal” work
• capture and dissemination of near-misses
![Page 44: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/44.jpg)
Summary!
• Resilience is something a system does, not what a system has.
• Creating and sustaining adaptive capacity in an organization while being unable to justify doing it specifically = resilient action.
• How people (the flexible elements of The System™) cope with surprise is the path to finding sources of resilience
![Page 45: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/45.jpg)
Resilience is the story of the outage that didn’t happen.
![Page 46: Amplifying Sources of Resilience...in the world (news, service provider outages, etc.) time-series data alerts tracing/ observability tools recent changes in existing tech new dependencies](https://reader036.vdocument.in/reader036/viewer/2022081614/5fc9f9412d28427fba5c43de/html5/thumbnails/46.jpg)
Thank You!
@allspaw Resilience Is A Verb (Woods, 2018) http://bit.ly/ResilienceIsAVerb
Stella Report http://stella.report
https://www.adaptivecapacitylabs.com/blog
@AdaptiveCLabs
SRE Cognitive Work (chapter in Seeking SRE, O’Reilly Media) http://bit.ly/SRECognitiveWork
How Complex Systems Fail (Cook, 1998) http://bit.ly/ComplexSystemsFailure