Safety and security in Mission Critical IoT Systems- Supporting human decision makers in dynamic environments
Einar Landre
Statoil
Motivation
Failed Safety Critical Decisions
- Situational awareness
- Trustworthiness
- Culture
- Decision quality
Human brain - the planet's most sophisticated and vulnerable decision maker
the weakest point
• Emotions trump facts (irrationality)
• Limited processing capacity
• Need to rest, easily bored
• Inconsistency across exemplars
• Creative, easily distracted
• Values (ethics and morals)
• Mental illness
How to compensate?
Things
Troll A, 472 meters, the largest man made “thing” ever moved
Software was an alien concept
things anno 1995
things anno 2015
Asgard subsea compression runs on software
Size = a football field
things anno 2025
Internet of critical things
critical things
Things or networks of things where failure could lead to an accident
- Pressure vessels
- Oil & Gas wells
- Boilers
- Industrial Instrumentation & Control
- Emergency shutdown
- Fire and gas leak detection
- Life support devices
- Pacemakers
- Infusion pumps
form critical systems
system criticality

Non-Critical
Useful system
- Low dependability
- System does not need to be trusted
Business-Critical
High Availability
- Focus on costs of failure caused by system downtime, cost of spares, repair equipment and personnel, and warranty claims

Mission-Critical
High Reliability
- Increase the probability of failure-free system operation over a specified time, in a given environment, for a given purpose

Safety-Critical
High Safety & Integrity Level
- High reliability
- High availability
- High security
- Focus is not on cost, but on preserving life and nature
Case Study
Drill string
Drilling Control System
Weight on Bit
Rotation Mud Circulation
Manual Control
- Interpret data
- Perform tasks
A manually controlled process
drilling
• I have to make frequent decisions and many of them depend upon readings from sensors that can be correct, noisy, random, unavailable, or in some other state.
• The decisions I have to make often have safety consequences, they certainly have economic consequences, and some are irreversible.
• At any point in time there may be three or four actions I could take based on my sense of what’s happening on the rig.
• I would like better support to determine how trustworthy my readings are, what the possible situations are and the consequences of each action.
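One simple way to support the driller's wish to "determine how trustworthy my readings are" is to classify each incoming reading against its recent history. A minimal sketch, assuming a rolling-median plausibility check; the function name, threshold and states are invented for illustration:

```python
from statistics import median

def classify_reading(value, history, max_jump=5.0, window=10):
    """Classify a sensor reading against its recent history.

    Hypothetical helper: flags a reading as 'suspect' when it
    deviates too far from the rolling median, else 'plausible'.
    """
    if value is None:
        return "unavailable"
    recent = history[-window:]
    if not recent:
        return "plausible"  # nothing to compare against yet
    if abs(value - median(recent)) > max_jump:
        return "suspect"    # noisy, random, or a sudden real state change
    return "plausible"

history = [100.2, 100.5, 99.8, 100.1]
print(classify_reading(100.3, history))  # plausible
print(classify_reading(140.0, history))  # suspect
print(classify_reading(None, history))   # unavailable
```

A real system would combine several such checks (range, rate-of-change, cross-sensor consistency) rather than a single threshold.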
What is the best action to take?
enhance human decision making
systems of action
Computer systems that:
• Can sense or observe a phenomenon, process or machine
• Process observations and search for anomalies, undesired state changes and other deviations that must be dealt with
• Plan and execute (or recommend execution of) actions to bring the observed phenomenon, process or machine back to its desired operational state
• Monitor effects of actions and re-plan if an action did not have the intended effect on process state
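The sense / detect / plan / execute / monitor cycle of a system of action can be sketched as a closed loop. All callables and names below are illustrative, not from the original system:

```python
import time

def control_loop(sense, detect_anomaly, plan, execute, setpoint,
                 cycles=3, period=0.0):
    """One possible shape of a 'system of action':
    sense -> detect deviation -> plan -> execute -> monitor and re-plan.
    The callables are caller-supplied; this is only the loop skeleton."""
    for _ in range(cycles):
        observation = sense()
        if detect_anomaly(observation, setpoint):
            action = plan(observation, setpoint)
            execute(action)      # or: recommend the action to a human
        time.sleep(period)       # wait, then re-observe (monitor / re-plan)

# Toy usage: drive a value back toward a setpoint of 0.
state = {"x": 5.0}
control_loop(
    sense=lambda: state["x"],
    detect_anomaly=lambda obs, sp: abs(obs - sp) > 0.1,
    plan=lambda obs, sp: (sp - obs) * 0.5,          # move halfway back
    execute=lambda a: state.update(x=state["x"] + a),
    setpoint=0.0,
    cycles=5,
)
print(round(state["x"], 3))  # 0.156
```

Swapping `execute` for a function that only displays the recommendation turns the same loop into a decision-support tool instead of a controller.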
making better decisions under stress and uncertainty
“Drillers Buddy”
Real-time data
Manual Control
Recommend actions in context of process state
add active computer support
Drill string
Drilling Control System
Weight on Bit
Rotation Mud Circulation
Drillers Buddy
State & Events
Drilling Simulator
• Hydraulic model
• Mechanical model
• Temperature model
Drilling Advisor
• Uncertainty model
• Causality model
• Reasoning
• Planning model
Drilling Control System
Real-Time Data
Actions
technical building blocks
Actions are to be executed by a human, but the concept opens up for more computer control in the future, i.e. the Drilling Advisor can be turned into a “synthetic driller”.
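The human-in-the-loop design can be made explicit with an autonomy flag: the advisor always produces a recommendation, and only the execution path changes. Everything here (names, the pressure signal, the threshold) is invented for the sketch:

```python
from dataclasses import dataclass

@dataclass
class Recommendation:
    action: str
    rationale: str

def execute(action):
    """Stand-in for the direct control path to the Drilling Control System."""
    print("executing:", action)

def advise(process_state, autonomous=False):
    """Illustrative Drilling Advisor stub. With autonomous=False a human
    executes the recommendation; flipping the flag would make it a
    'synthetic driller'."""
    if process_state["standpipe_pressure"] > 300:
        rec = Recommendation("reduce_pump_rate", "pressure above limit")
    else:
        rec = Recommendation("continue", "state nominal")
    if autonomous:
        execute(rec.action)   # hypothetical direct control path
    return rec

print(advise({"standpipe_pressure": 320}).action)  # reduce_pump_rate
```

Keeping the recommendation and the execution separate means the safety case for "advisor" and "synthetic driller" can be argued independently.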
Historical Data
What is the best action to take for the business?
What is the best action to take for control or safety?
What is the process state and where is it heading?
What do we know for certain and what are we estimating?
What are we measuring directly, with what accuracy?
What can we infer about performance and changes in the physical system?
Local Action Optimization
Situational Awareness
Uncertainty and Validation
Physical System Behavior
Physical System Sensing
Global Action Optimization
Increasingly Actionable Information
expressed in capabilities
Local Action Optimization
Situational Awareness
Uncertainty and Validation
Physical System Behavior
Physical System Sensing
Global Action Optimization
Machine learning (Bayesian) + Physics (Cyb)
Decision / game theory
Automated planning and scheduling
Rational agent
• has goals
• models uncertainty
• chooses action with optimal expected outcome for itself
• Examples:
− human (on a good day)
− intelligent software agent
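The core of a rational agent is expected-utility maximization: weigh each possible outcome of an action by its probability and pick the action with the best expectation. A minimal sketch with an invented drilling-flavoured example:

```python
def best_action(actions, outcomes, utility):
    """Pick the action with maximal expected utility.
    `outcomes[a]` is a list of (probability, result) pairs;
    `utility` maps a result to a number. Purely illustrative."""
    def expected_utility(a):
        return sum(p * utility(r) for p, r in outcomes[a])
    return max(actions, key=expected_utility)

# Toy example: keep drilling vs. pull out of hole under uncertainty.
outcomes = {
    "keep_drilling": [(0.9, "progress"), (0.1, "stuck_pipe")],
    "pull_out":      [(1.0, "safe_stop")],
}
utility = {"progress": 10, "stuck_pipe": -100, "safe_stop": 0}.get
print(best_action(list(outcomes), outcomes, utility))  # pull_out
```

Note how a small probability of a very costly outcome (0.1 × −100) is enough to outweigh the likely gain; this is exactly the kind of trade-off that is hard for a stressed human to compute consistently.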
more sophisticated technology
Sensors
solution creates new challenges
What parts are safety critical?
What parts are only business critical?
How to assess and protect against cyber threats?
How does failure in non-safety part influence safety and security?
What dependencies do we have?
Industry becomes software dependent
How to design software that tackles mechanical failures?
2014-04-24
how to build trustworthy software?
Software
before software
Tangible control logic
• Design level
• Implementation level
• Verification & test level
No cyber threats
• Intrusion
• Viruses
• Theft
• Identity
two unique properties

Inspection & Test
• Software can’t be inspected and tested like analogous physical components

CPU - the single point of failure
• All signals are threaded through one single element
• Execution sequence is unknown
• Same defect is systematized across multiple instances
Impacts how we must manage software for critical systems
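A classic way critical systems mask random channel failures is redundant voting, e.g. 2-out-of-3 (2oo3). The sketch below is illustrative; note the caveat in the comment, which ties directly to the "same defect is systematized" point above:

```python
def vote_2oo3(a, b, c, tolerance=0.5):
    """2-out-of-3 voter: accept a value confirmed by at least two channels.
    Masks a single random channel failure, but NOT a common-mode software
    defect replicated identically in all three channels."""
    for x, y in [(a, b), (a, c), (b, c)]:
        if abs(x - y) <= tolerance:
            return (x + y) / 2            # two channels agree
    raise RuntimeError("no two channels agree: trip to safe state")

print(vote_2oo3(100.0, 100.5, 250.0))  # 100.25 (third channel out-voted)
```

This is why redundancy alone is not enough for software: identical replicas fail identically, which motivates diverse implementations and the common-mode-failure discussion that follows.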
some specific challenges
Common mode failure
Malware, Viruses and Hacking
Human Factors
Blurred boundaries
common mode failure
“results from an event which because of dependencies causes a coincidence of failure states of components in two or more separate channels of a redundancy system, leading to the defined systems failing to perform its intended function”.
Ariane 5 test launch, 1996
malware, viruses and hacking
Motivated by financial, political, criminal or idealistic interests
Software created to cause harm
• Change of system behaviour
• Steal / destroy data or machines
Exploits weaknesses in
• Human character
• Technical designs
Horror stories:
• Stuxnet and the Iranian centrifuges (Siemens control system)
• Saudi Aramco hack of 35,000 computers (Windows back office)
human factors
How to minimize the effects of human error?
Mistakes occur everywhere
• Specification
• Design
• Implementation
• Deployment
• Operations
Humans make mistakes
• By commission
• By omission
• By carelessness
blurred boundaries
Conflicting interests, divergent situational understanding across disciplines and roles.
Architects think and design in terms of hierarchy and layering
Programmers think and design in terms of threads of execution
Users need systems that work and solve real-world problems
Operations needs to get the job done
Tools
systems engineering

Architecture centric
• Design
• Implementation
• Deployment
• Usage
Risk based
• Requirements
• Design
• Implementation
• Commissioning
• Usage
Forging “design thinking” with “high-integrity systems” practices
architecture
Separation and protection of critical functions
Local Action Optimization
Situational Awareness
Uncertainty and Validation
Physical System Behavior
Physical System Sensing
Global Action Optimization
standards
IEC 61508 Functional safety of electrical/electronic/programmable electronic safety-related systems
IEC 61511 Functional safety - Safety instrumented systems for the process industry sector
DO-178C Software considerations in airborne systems and equipment certification
“The good thing about standards is that there are so many to choose from”
- Andrew S. Tanenbaum
Not sufficient on their own
Represent insights
Must be tailored to be useful
evidence based safety & security
Thanks to professor Tim Kelly @ University of York
Summary
summary
Things run on software
Critical things form critical / high-integrity systems
Cognitive functions make software inherently complicated
Holistic, architecture centric Systems Engineering
Software is used to offload and support human operators
2nd- and 3rd-order failure effects must be addressed up front
Forging design thinking with high-integrity systems practices
Safety and security in mission critical IoT systems
Einar Landre
Lead Analyst
E-mail: [email protected]
+4741470537
www.statoil.com
Thank you