CHERNOBYL
Human Reliability
Definition:
“The probability that a person will correctly perform some system-required activity during a given time period (assuming time is a limiting factor) without performing any extraneous activity that can degrade the system”
(Hollnagel, 2002)
Human Reliability Assessment (HRA)
Assessment of the impact of human errors on systems safety and, if warranted, the specification of ways to reduce human error impact and/or frequency.
HRA is far from being a precise science, but it is a useful means of identifying and prioritizing safety vulnerabilities to human error (HE), thereby reducing the frequency of accidents.
Hybrid area:
- Engineering & reliability
- Psychology
- Ergonomics
Probabilistic Safety Analysis (PSA)
Engineering approach
Quantitative approach: the expected frequencies of accidents are estimated and then compared against predefined risk criteria.
HRA must be incorporated into PSA if risk is to be properly estimated; hence the need for the theoretical framework (psychology and ergonomics).
HRA History
Started early 1960s…expanded greatly since 1979. Why?
Followed exactly the same procedure as conventional reliability analysis, with human tasks substituted for equipment failures
Greater variability and interdependence for human performance (‘human factor’)
How can we get this ‘variability’ and ‘interdependence’ information?
Understanding HRA
The accident sequence being analyzed is represented as an event tree (slide on next page).
Nodes represent specific tasks/functions with 2 outcomes (success/failure)
Engineering approaches (PRA/PSA) can calculate failure probabilities in terms of material & process information, but HRA must also account for the “human factor” to determine if the human AS A COMPONENT will fail.
Event tree structure
(Hollnagel, 2002)
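The event-tree idea can be sketched in a few lines of Python. This is only an illustration, assuming every node is an independent task with a single failure probability (an assumption that human tasks often violate). The task names and numbers are hypothetical and do not come from the slides.

```python
from itertools import product

# Hypothetical failure probabilities for three sequential tasks (illustrative only).
failure_prob = {
    "detect_alarm": 0.01,
    "diagnose_fault": 0.05,
    "start_backup_pump": 0.02,
}

def sequence_probabilities(failure_prob):
    """Enumerate every success/failure path through a binary event tree.

    Assumes the nodes are independent of each other.
    Returns a dict mapping an outcome tuple (True = success) to its probability.
    """
    tasks = list(failure_prob)
    paths = {}
    for outcome in product((True, False), repeat=len(tasks)):
        p = 1.0
        for task, ok in zip(tasks, outcome):
            p *= (1.0 - failure_prob[task]) if ok else failure_prob[task]
        paths[outcome] = p
    return paths

paths = sequence_probabilities(failure_prob)
p_all_succeed = paths[(True, True, True)]
p_any_failure = 1.0 - p_all_succeed
print(f"P(all tasks succeed)    = {p_all_succeed:.4f}")
print(f"P(at least one failure) = {p_any_failure:.4f}")
```

With n binary nodes the tree has 2^n paths, which is one reason screening is applied before detailed quantification.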
Understanding HRA…cont (2)
Traditional approach:
Determine HEP (human error probability) using tables (from collected data), HR models, or expert judgement
Account for the influence of Performance Shaping Factors (PSFs)
Calculate probability of an erroneous action using:
Understanding HRA…cont (3)
Using this formula we must make the following assumptions:
1.) Probability of failure can be determined for specific types of actions independently of context.
2.) Effects of context are additive (various performance conditions don’t influence each other)
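The formula itself is not reproduced in this transcript. As a hedged illustration only, the sketch below assumes a weighted-sum form sometimes quoted in the HRA literature, P_EA = HEP × Σ(PSF_k × W_k) + C. The function name, the weights, and the clamping step are assumptions made for this example, not the slide's own content. Both listed assumptions are visible in it: the nominal HEP is fixed independently of context, and the PSF terms are simply added together.

```python
def adjusted_error_probability(hep, psfs, constant=0.0):
    """Weighted-sum PSF adjustment (assumed form): P_EA = HEP * sum(PSF_k * W_k) + C.

    hep      -- nominal human error probability for the action type
    psfs     -- list of (psf_value, weight) pairs, one per performance shaping factor
    constant -- the additive term C (assumed to be 0 unless given)
    """
    if not 0.0 <= hep <= 1.0:
        raise ValueError("hep must be a probability between 0 and 1")
    p = hep * sum(value * weight for value, weight in psfs) + constant
    # Clamp the result, since a large multiplier could otherwise exceed 1.
    return min(max(p, 0.0), 1.0)

# Hypothetical numbers: nominal HEP of 0.003 and two equally weighted PSFs.
p_ea = adjusted_error_probability(0.003, [(2.0, 0.5), (4.0, 0.5)])
print(p_ea)
```

Because each PSF contributes its own term to the sum, the sketch cannot express one condition amplifying another, which is exactly the limitation assumption 2 describes.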
Understanding HRA…cont (4)
As a result, several models have been used to improve HRA:
1.) Behavioral (Human Factors) models
- Focus on simple manifestations (error modes)
- Described in terms of omissions, commissions, extraneous actions
- Derive probability that specific errors will occur
Problems:
- Causal models very simple
- Weak in accounting for context
Understanding HRA…cont (5)
2.) Information processing models
Focus on internal mechanisms i.e. decision making, reasoning, etc.
Explain flow of causes and effects through models
Problems:
- Models often complex
- Limited predictive power (hypothetical basis)
- Little concern for quantification
- Context not considered explicitly
- Better suited for retrospective analysis than predictions
Understanding HRA…cont (6)
3.) Cognitive models
Focus on relation between error modes and causes
Models are (relatively) simple and context specific
Premise:
- Cognition is the reason why performance is efficient (or limited)
- Operator seen as acting in anticipation of future events
Well suited for predictions and retrospective analysis
HRA Framework
10 Steps with 3 MAIN GOALS
GOALS:
1.) Human error identification (What can go wrong?)
2.) Human error quantification (How often will a human error occur?)
3.) Human error reduction (How can human error be prevented from reoccurring or its impact on the system be reduced?)
HRA Generic Methodology
Systematic way of approaching HRA logically
Will help ensure the problem is dealt with reliably while minimizing error (biases)
Encompasses 10 steps from identifying the problem to final documentation
Steps in the HRA process:
HRA Steps
1.) Defining the problem
Define precisely the problem and its setting in terms of the system goals and the overall forms of human-caused deviations from these goals.
2.) Task analysis
Define explicitly the data, equipment, behaviour, plans and interfaces used by the operators to achieve system objectives, and to identify factors affecting human performance within tasks.
3.) Human error analysis
Identify all significant human errors affecting performance of the system and find ways in which human errors can be recovered.
HRA Steps cont…
4.) Representation
Model human errors and recovery paths in a logical manner for quantitative measurement (integrate human errors with hardware failures)
5.) Screening
Define the level of detail and effort with which the quantification will be conducted by defining all significant human errors and interactions and ruling out insignificant errors
6.) Quantification
Quantify human error probabilities and human error recovery probabilities (to determine likelihood of success in achieving system goals)
HRA Steps cont…
7.) Impact assessment
Determine the significance of human reliability to achieving system goals, decide whether improvements in human reliability are required, and (if so) identify the primary errors/factors negatively affecting the system.
8.) Error reduction
Identify error reduction mechanisms and the likelihood of error recovery, in order to improve human performance in achieving system goals.
9.) Quality assurance
Ensure the enhanced system satisfactorily meets system performance criteria NOW and in the future.
HRA Steps cont…
10.) Documentation
Detail all information necessary to allow the assessment to be understandable, auditable, and reproducible.
1. Problem Definition
2 parts of defining a problem:
1) Identify the HR problem
2) Identify the HR context
- Once the HR problem is identified and defined within the system's context, discussions with designers, engineers & operational managers should occur
- Define system goals at the various levels where operator action is required. This identifies the higher goals the operator was aiming for, and can reveal the operator's intentions at the time of the event and the root of the problem
Problem Definition cont. (2)
Must investigate the “safety culture” of a plant: this can dramatically influence HR and is important to defining the problem
If HRA is being carried out as part of an overall risk assessment, the HR analyst will probably be given a set of scenarios to assign risk and HE to.
By the end of the process the problem should be explicitly defined in its respective system context:
- Scenarios to be addressed
- Overall tasks required to achieve safety goals within each scenario
2. Task Analysis
Purpose:
Provide a complete, comprehensive description of the tasks that have to be performed by the operator to achieve system goals
- Many forms of TA, some notable ones include:
1. Sequential: chronological order of events
2. Hierarchical: considers tasks in a hierarchy (importance)
3. Tabular: dynamic situations (e.g. operators' actions during a power plant emergency)
Task Analysis cont. (2)
Methods of deriving info from task analysis:
Interaction with all levels (operators, maintenance, supervisors, managers, system designers, etc.)
Observation (structured & unstructured interviews)
Procedure analysis
Incident analysis
Walkthrough/explanation of procedures from operator(s)
Examination of system documentation
Task Analysis cont. (3)
Important not to completely rely on procedures/operating instructions – practical, real life procedures often differ
Operator/employee knowledge (tacit knowledge) gained through experience vital in the TA process. Why?
3. Human Error Analysis (HEA)
Stage to identify all errors associated with the task!!
Most critical part of HRA. WHY?...
- If significant errors are omitted, they will not appear in the analysis, which may then seriously UNDERESTIMATE the EFFECTS of human error on the system
3. HEA cont…(2)
Method example #1
Simplest approach to HEA…consider ‘external error modes’
1.) Error of omission:
- Act omitted (not carried out)
2.) Error of commission:
- Act carried out inadequately
- Act carried out in wrong sequence
- Act carried out too early/late
- Error of quality (too little/too much)
3.) Extraneous error:
- Wrong (unrequired) act performed
3. HEA cont…
Once the factors which influence HRA are identified the next step is representing them in such a way to indicate their effects on the system goals…
4. Representation
Visual representation of events/actions in a scenario
Can be used to represent simple or complex failure paths
Skill & proper knowledge of “tree” construction is needed: trees can become extremely complex and lose focus if not carefully put together
A smaller scenario with a low number of errors may not need or benefit from this type of representation
4. Representation cont…(2)
Fault Tree
Typical representation of HE & its effect on a system is to use a fault tree
Logical structure that defines what events must occur in order for an undesirable outcome to occur
Undesirable event located at the top of the “tree” (most important)
2 different types of gate that allow events to proceed to the next level
1) “OR” gate: the event above the gate occurs if any of the events joined below it by this gate occur
2) “AND” gate: the event above the gate occurs only if all events joined below this gate occur
Fault Tree
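The two gate types can be expressed as small probability functions. This is a minimal sketch that assumes the events under each gate are statistically independent. The tree layout, event names, and numbers are hypothetical and chosen only to show how HE and hardware failures combine.

```python
from math import prod

def or_gate(probabilities):
    """P(at least one input event occurs), assuming independent inputs."""
    return 1.0 - prod(1.0 - p for p in probabilities)

def and_gate(probabilities):
    """P(all input events occur), assuming independent inputs."""
    return prod(probabilities)

# Hypothetical tree: the top event needs a hardware failure AND a failed human
# response; the human response fails if the operator misses the alarm OR
# selects the wrong procedure.
p_hardware_failure = 0.001
p_miss_alarm = 0.01
p_wrong_procedure = 0.05

p_human_failure = or_gate([p_miss_alarm, p_wrong_procedure])
p_top_event = and_gate([p_hardware_failure, p_human_failure])
print(f"P(human response fails) = {p_human_failure:.4f}")
print(f"P(top event)            = {p_top_event:.6f}")
```

If human errors in the same tree share a cause (fatigue, a misleading display), the independence assumption breaks down and these simple products will understate the risk.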
5. Screening
Identifies where major effort in the quantification analysis should be applied.
Filters out tasks/scenarios which may have little contribution to system failure
Screening methods can inadvertently eliminate important errors and interactions from the analysis
As a general rule, when applying any screening technique – if in doubt, leave the human error in the fault/event tree
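The "if in doubt, leave it in" rule can be illustrated with a simple filter. This sketch assumes screening is done against a single conservative probability threshold. Real screening methods also weigh each error's contribution to system failure, and the field names and values here are hypothetical.

```python
def screen_errors(candidates, threshold):
    """Split candidate human errors into those kept for detailed
    quantification and those screened out.

    candidates -- list of dicts with 'name', 'screening_hep' (a conservative
                  estimate, or None if no estimate could be made) and
                  'in_doubt' (True if the analyst is unsure)
    threshold  -- errors whose conservative estimate is below this are dropped
    """
    kept, dropped = [], []
    for err in candidates:
        estimate = err["screening_hep"]
        # Doubtful or unestimated errors always stay in the tree.
        if err["in_doubt"] or estimate is None or estimate >= threshold:
            kept.append(err["name"])
        else:
            dropped.append(err["name"])
    return kept, dropped

candidates = [
    {"name": "omit valve check", "screening_hep": 0.1, "in_doubt": False},
    {"name": "misread gauge", "screening_hep": 1e-5, "in_doubt": False},
    {"name": "wrong breaker reset", "screening_hep": 1e-5, "in_doubt": True},
]
kept, dropped = screen_errors(candidates, threshold=1e-3)
print("keep:", kept)
print("screen out:", dropped)
```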
6. Quantification
Human reliability needs to be quantified as something that can be compared across the HRA spectrum
The metric for HRA quantification is Human Error Probability (HEP)
HEP = (# of errors occurred)/(# opportunities for error to occur)
Expressed as a number between 0 and 1.
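The HEP ratio translates directly into code. The sketch below is a minimal illustration; the function name and the example counts are hypothetical.

```python
def human_error_probability(errors_observed, opportunities):
    """HEP = number of errors observed / number of opportunities for error."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    if not 0 <= errors_observed <= opportunities:
        raise ValueError("errors_observed must lie between 0 and opportunities")
    return errors_observed / opportunities

# Hypothetical record: 3 errors observed in 1,500 valve line-ups.
hep = human_error_probability(3, 1500)
print(hep)
```

The validation checks reflect the point that the denominator is the hard part: if the opportunities for error are miscounted, the resulting HEP is meaningless however carefully the errors were recorded.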
Little recorded industrial HEP data available because:
Difficulty in estimating opportunities for error in realistic complex tasks (denominator error)
Confidentiality and unwillingness to publish poor performance data
Lack of awareness regarding the usefulness of human error data (hence no fiscal incentives)
There are lots of ways to determine HEP
For this course we keep it simple!
6. Quantification…cont.(2)
Problems with simulator data in determining HEPs:
- Personnel using simulators are usually highly motivated and know what’s on the training curriculum
- Reliability of emergency training/responses on a simulator compared to the real situation (‘cognitive fidelity issue’)
- Experiments are also usually controlled, investigating only one or two variables, so generalization is risky!!!
Lack of ‘generalizability’ has led to:
- Non-data-dependent approaches, i.e. expert opinion/judgment
7. Impact Assessment
System risk or reliability is calculated
Compared to acceptable levels/standards established
Each event is analyzed and classified into a fault tree
Both HE & hardware/software analysis are taken into account for the best combination to improve the system
If HE dominates, error reduction methods must be investigated
If HE cannot be reduced to acceptable levels, redesign of the system is necessary
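The comparison step can be sketched as follows. This is an illustration only. It assumes the contributions to the top event can simply be summed (a rare-event approximation) and treats "HE dominates" as a human-error share above 50%, a cut-off chosen for this example. All names and numbers are hypothetical.

```python
def assess_impact(contributions, acceptable_risk):
    """Compare total system risk with a criterion and report the human-error share.

    contributions   -- dict mapping contributor name to
                       (frequency_per_year, is_human_error)
    acceptable_risk -- predefined risk criterion in the same units
    """
    total = sum(freq for freq, _ in contributions.values())
    human = sum(freq for freq, is_he in contributions.values() if is_he)
    return {
        "total_risk": total,
        "acceptable": total <= acceptable_risk,
        "human_error_share": human / total if total > 0 else 0.0,
    }

# Hypothetical contributions to the top event (per year).
contributions = {
    "pump failure": (2e-5, False),
    "operator omits isolation": (6e-5, True),
    "miscalibrated sensor (maintenance error)": (2e-5, True),
}
result = assess_impact(contributions, acceptable_risk=5e-5)
print(result)
if not result["acceptable"] and result["human_error_share"] > 0.5:
    print("Human error dominates: investigate error reduction.")
```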
8. Error Reduction
Not required if:
- Human reliability is adequate (what’s adequate?)
- It’s not the most effective means of achieving system performance (other modifications more suitable)
- Not within scope of assessment
If required, then:
- Focus on reducing impact/frequency of human errors
- Implement a more general error reduction strategy (how?)
8. Error Reduction…cont (2)
Ways of reducing the impact of critical errors on the system:
- Prevention through hardware/software changes
- Increase system tolerance
- Enhance error recovery
- Reduce error at source
8. Error Reduction…cont (3)
Additional considerations:
Positive error reducing strategies should be factored back into quantitative analysis
Check that HEP(s) and overall system calculated risk become acceptable
As part of the quality assurance phase:
- Provide ‘operational definition’ for each error reduction strategy
- Ensure strategy is properly implemented and maintained over time
9. Quality Assurance
Effectiveness of error reduction mechanism implementation should be ensured by:
- Monitoring
- Performance verification (at later stage)
- Reliability/Validity Analysis (can be hard…why?)
Continuous performance monitoring systems are powerful quality assurance tools. Why?
- Gradual performance standard degradation
- Increased maintenance loading
- Loss of personnel
- Impromptu changes (since startup)
- Increasing ‘retrofit’ changes
9. Quality Assurance cont…(2)
Long-term performance monitoring allows for:
Identifying WHEN in time the results of HRA may be outdated.
Signifying the need for further (or new) evaluation
Justifying acceptability of risk associated with system
Avoiding gradual deterioration of safety barriers, i.e. BHOPAL SYNDROME (India)
10. Documentation
Formally document all results of study
Ensure auditability and justifiability of results
Can provide database for future investigation and monitoring.
Ensure assumptions and judgments included!!
Aid new/unconnected personnel to understand
Enable independent examination, updating, and reproduction
Allows for learning from mistakes
Future Directions
1) Low technology risk
2) Cognitive errors and misdiagnosis
3) Management, organizational and sociotechnical contributions to risk
Low Technology Risk
HRA traditionally used for high risk, high technology industries
HRA tends to focus on massive accidents that happen less frequently but have large consequences
Not applied as much to high risk, low technology sectors (e.g. mining), which have a larger number of “small” accidents
Can have very valuable applications to low technology industries
Cognitive Errors & Misdiagnosis
Operators may misdiagnose a situation, not realize the mistake, and continue to interpret the feedback incorrectly
Can make matters worse if the operator overrides a safety system than if nothing were done at all
Management, Organizational & Sociotechnical Contributions to Risk
HRA should not only be applied to operators/workers on the job site but also management and the organizational design of the plant
Bhopal, Challenger Shuttle & Chernobyl all had significant human error BUT current HRA techniques would not have detected the risk prior to the accidents because the error was neither procedural nor diagnostic
Management, Organizational & Sociotechnical Contributions to Risk
Economics, time constraints, social pressures, communication breakdown, personality conflicts, etc. all add pressures on a system and its safety