chernobyl human reliability definition: “the probability that a person will correctly perform some...

Upload: ralph-cunningham

Post on 25-Dec-2015

TRANSCRIPT

Page 1: CHERNOBYL Human Reliability Definition: “The probability that a person will correctly perform some system-required activity during a given time period

CHERNOBYL

Human Reliability

Definition:

“The probability that a person will correctly perform some system-required activity during a given time period (assuming time is a limiting factor) without performing any extraneous activity that can degrade the system”

(Hollnagel, 2002)

Human Reliability Assessment (HRA)

Assessment of the impact of human errors on systems safety and, if warranted, the specification of ways to reduce human error impact and/or frequency.

HRA is far from being a precise science, but is a useful means for identifying and prioritizing safety vulnerabilities for HE, thereby reducing the frequency of accidents.

Hybrid area:
- Engineering & reliability
- Psychology
- Ergonomics

Probabilistic Safety Analysis (PSA)

Engineering approach

Quantitative assessment: the expected frequencies of accidents are estimated and then compared against predefined risk criteria.

HRA must be incorporated into PSA if risk is to be properly estimated…hence the need for the theoretical framework (psychology and ergonomics)

HRA History

Started early 1960s…expanded greatly since 1979. Why?

Followed exactly the same procedure as conventional reliability analysis, with human tasks substituted for equipment failures

Greater variability and interdependence for human performance (‘human factor’)

How can we get this ‘variability’ and ‘interdependence’ information?

Understanding HRA

The accident sequence being analyzed is represented as an event tree (slide on next page).

Nodes represent specific tasks/functions with 2 outcomes (success/failure)

Engineering approaches (PRA/PSA) can calculate failure probabilities in terms of material & process information, but HRA must also account for the “human factor” to determine if the human AS A COMPONENT will fail.
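
As a minimal sketch of how such an event tree can be evaluated, the Python below treats each node as a task with a success/failure outcome and multiplies the branch probabilities along each path. The task names and failure probabilities are hypothetical, and the nodes are assumed to be independent, which is a simplification for human performance.

```python
from itertools import product

# Hypothetical tasks and failure probabilities (illustrative values only).
tasks = [
    ("detect alarm", 0.01),
    ("diagnose fault", 0.05),
    ("start backup pump", 0.02),
]

def path_probability(outcomes):
    """Probability of one path; outcomes[i] is True if task i succeeds."""
    p = 1.0
    for (_, p_fail), ok in zip(tasks, outcomes):
        p *= (1.0 - p_fail) if ok else p_fail
    return p

# Enumerate every success/failure path through the tree.
paths = {o: path_probability(o)
         for o in product([True, False], repeat=len(tasks))}

# Assumption for this sketch: the sequence succeeds only if every task succeeds.
p_success = paths[(True, True, True)]
p_failure = 1.0 - p_success
print(f"P(sequence fails) = {p_failure:.4f}")
```

The path probabilities sum to 1, which is a quick check that the tree has been enumerated completely.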

Event tree structure

(Hollnagel, 2002)

Understanding HRA…cont (2)

Traditional approach:

Determine HEP (human error probability) using tables (from collected data), HR models, or expert judgement

Account for the influence of Performance Shaping Factors (PSFs)

Calculate probability of an erroneous action using:

Understanding HRA…cont (3)

Using this formula we must make the following assumptions:

1.) Probability of failure can be determined for specific types of actions independently of context.

2.) Effects of context are additive (various performance conditions don’t influence each other)
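
The formula itself is not reproduced in this transcript. Purely to illustrate the two assumptions, the sketch below uses one additive form found in the HRA literature, in which a context-independent nominal HEP is scaled by a weighted sum of PSF ratings. The function name, the ratings, the weights and the constant `c` are all hypothetical and are not taken from the slides.

```python
def adjusted_hep(nominal_hep, psfs, c=0.0):
    """Scale a nominal HEP by a weighted sum of PSF ratings.

    psfs is a list of (rating, weight) pairs. Each term is added
    independently of the others, which mirrors assumption 2.
    The nominal HEP does not depend on context, which mirrors assumption 1.
    """
    multiplier = sum(rating * weight for rating, weight in psfs)
    p = nominal_hep * multiplier + c
    return min(max(p, 0.0), 1.0)  # clamp so the result stays a valid probability

# Hypothetical example: nominal HEP of 0.003 with two PSFs.
psfs = [
    (2.0, 0.6),  # e.g. time pressure: rating 2.0, weight 0.6
    (1.5, 0.4),  # e.g. poor interface: rating 1.5, weight 0.4
]
p_ea = adjusted_hep(0.003, psfs)
print(p_ea)
```

Because the PSF terms are simply summed, the sketch cannot represent one performance condition amplifying another, which is exactly the limitation the second assumption describes.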

Understanding HRA…cont (4)

As a result, several models have been used to improve HRA:

1.) Behavioral (Human Factors) models
- Focus on simple manifestations (error modes)
- Described in terms of omissions, commissions, extraneous actions
- Derive probability that specific errors will occur

Problems:
- Causal models very simple
- Weak in accounting for context

Understanding HRA…cont (5)

2.) Information processing models

Focus on internal mechanisms i.e. decision making, reasoning, etc.

Explain flow of causes and effects through models

Problems:
- Models often complex
- Limited predictive power (hypothetical basis)
- Little concern for quantification
- Context not considered explicitly
- Better suited for retrospective analysis than predictions

Understanding HRA…cont (6)

3.) Cognitive models

Focus on relation between error modes and causes

Models are (relatively) simple and context specific

Premise:
- Cognition is the reason why performance is efficient (or limited)
- Operator seen as acting in anticipation of future events

Well suited for predictions and retrospective analysis

HRA Framework

10 steps with 3 MAIN GOALS

GOALS:
1.) Human error identification (What can go wrong?)
2.) Human error quantification (How often will a human error occur?)
3.) Human error reduction (How can human error be prevented from reoccurring or its impact on the system be reduced?)

HRA Generic Methodology

Systematic way of approaching HRA logically

Will help ensure the problem is dealt with reliably while minimizing error (biases)

Encompasses 10 steps from identifying the problem to final documentation

Steps in the HRA process:

HRA Steps

1.) Defining the problem

Define precisely the problem and its setting in terms of the system goals and the overall forms of human-caused deviations from these goals.

2.) Task analysis

Define explicitly the data, equipment, behaviour, plans and interfaces used by the operators to achieve system objectives, and to identify factors affecting human performance within tasks.

3.) Human error analysis

Identify all significant human errors affecting performance of the system and find ways in which human errors can be recovered.

HRA Steps cont…

4.) Representation

Model human errors and recovery paths in a logical manner for quantitative measurement (integrate human errors with hardware failures)

5.) Screening

Define the level of detail and effort with which the quantification will be conducted by defining all significant human errors and interactions and ruling out insignificant errors

6.) Quantification

Quantify human error probabilities and human error recovery probabilities (to determine likelihood of success in achieving system goals)
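
As a rough illustration of how an error probability and a recovery probability might be combined, the sketch below assumes that an error only degrades the system if it occurs and is not recovered, and that the two events are independent. The function name and the numbers are hypothetical.

```python
def unrecovered_error_probability(hep, p_recovery):
    """Probability that an error occurs AND is not recovered.

    Assumes error occurrence and recovery are independent events.
    """
    for name, p in (("hep", hep), ("p_recovery", p_recovery)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return hep * (1.0 - p_recovery)

# Hypothetical values: 1 error per 100 opportunities, 90% chance of recovery.
p_unrecovered = unrecovered_error_probability(0.01, 0.9)
print(p_unrecovered)
```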

HRA Steps cont…

7.) Impact assessment

Determine the significance of human reliability in achieving system goals, to decide if improvements in human reliability are required, and (if so) what are the primary errors/factors negatively affecting the system.

8.) Error reduction

Identify error reduction mechanisms, the likelihood of error recovery, and ways of improving human performance in achieving system goals.

9.) Quality assurance

Ensure the enhanced system satisfactorily meets system performance criteria NOW and in the future.

HRA Steps cont…

10.) Documentation

Detail all information necessary to allow the assessment to be understandable, auditable, and reproducible.

1. Problem Definition

2 parts of defining a problem:

1) Identify the HR problem
2) Identify the HR context

- Once the HR problem is identified and defined within the system context, discussions with designers, engineers & operational managers should occur

- Define system goals at the various levels where operator action is required – this defines the higher goals which the operator was aiming for, and can get at the operator intentions at the time of the event and the root of the problem

Problem Definition cont. (2)

Must investigate the “safety culture” of a plant – this can dramatically influence HR and is important to defining the problem

If HRA is being carried out as part of an overall risk assessment, the HR analyst will probably be given a set of scenarios to assign risk and HE to.

By the end of the process the problem should be explicitly defined in its respective system context:
- Scenarios to be addressed
- Overall tasks required to achieve safety goals within each scenario

2. Task Analysis

Purpose:

Provide a complete, comprehensive description of the tasks that have to be performed by the operator to achieve system goals

- Many forms of TA, some notable ones include:
1. Sequential – chronological order of events
2. Hierarchical – considers tasks in a hierarchy (importance)
3. Tabular – dynamic situations (operator actions during a power plant emergency)

Task Analysis cont. (2)

Methods of deriving info from task analysis:

Interaction with all levels (operators, maintenance, supervisors, managers, system designers, etc.)

Observation (structured & unstructured interviews)

Procedure analysis

Incident analysis

Walkthrough/explanation of procedures from operator(s)

Examination of system documentation

Task Analysis cont. (3)

Important not to completely rely on procedures/operating instructions – practical, real life procedures often differ

Operator/employee knowledge (tacit knowledge) gained through experience vital in the TA process. Why?

3. Human Error Analysis (HEA)

Stage to identify all errors associated with the task!!

Most critical part of HRA. WHY?...

- If significant errors are omitted, they will not appear in the analysis, which may then seriously UNDERESTIMATE the EFFECTS of human error on the system

3. HEA cont…(2)

Method example #1

Simplest approach to HEA…consider ‘external error modes’

1.) Error of omission:
- Act omitted (not carried out)

2.) Error of commission:
- Act carried out inadequately
- Act carried out in wrong sequence
- Act carried out too early/late
- Error of quality (too little/too much)

3.) Extraneous error:
- Wrong (unrequired) act performed
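
One way to make the three external error modes concrete is a small classification helper. The sketch below is only an illustration: the `ErrorMode` enum, the `classify` function and its three boolean inputs are hypothetical names, and the commission sub-types (sequence, timing, quality) are collapsed into a single "not performed correctly" flag.

```python
from enum import Enum

class ErrorMode(Enum):
    OMISSION = "act omitted (not carried out)"
    COMMISSION = "act carried out incorrectly"
    EXTRANEOUS = "wrong (unrequired) act performed"

def classify(required, performed, performed_correctly=True):
    """Classify one observed act against the task description.

    required: the task analysis calls for this act
    performed: the operator carried the act out
    performed_correctly: right sequence, timing and quality
    Returns an ErrorMode, or None when no error occurred.
    """
    if required and not performed:
        return ErrorMode.OMISSION
    if required and performed and not performed_correctly:
        return ErrorMode.COMMISSION
    if not required and performed:
        return ErrorMode.EXTRANEOUS
    return None

print(classify(required=True, performed=False))
```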

3. HEA cont…

Once the factors which influence HRA are identified, the next step is representing them in such a way as to indicate their effects on the system goals…

4. Representation

Visual representation of events/actions in a scenario

Can be used to represent simple or complex failure paths

Skill & proper knowledge of “tree” construction are needed – trees can become extremely complex and lose focus if not carefully put together

A smaller scenario with a low number of errors may not need or benefit from this type of representation

4. Representation cont…(2)

Fault Tree

Typical representation of HE & its effect on a system is to use a fault tree

Logical structure that defines what events must occur in order for an undesirable outcome to happen

Undesirable event located at the top of the “tree” (most important)

2 different types of gate that allow events to proceed to the next level

1) “OR” gate – events proceed if any of the events joined below it by this gate occur

2) “AND” gate – events proceed only if all events joined below this gate occur
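
The gate logic can be sketched numerically as follows. This is a minimal illustration that assumes all basic events are statistically independent; the event names and probabilities are hypothetical.

```python
from math import prod

def and_gate(probs):
    """Output occurs only if ALL input events occur (independent inputs)."""
    return prod(probs)

def or_gate(probs):
    """Output occurs if ANY input event occurs (independent inputs)."""
    return 1.0 - prod(1.0 - p for p in probs)

# Hypothetical tree: the top event needs an operator error AND a failed interlock.
p_operator_error = or_gate([0.01, 0.005])  # step omitted OR step done wrongly
p_interlock_fails = 0.001                  # hardware failure
p_top = and_gate([p_operator_error, p_interlock_fails])
print(f"P(top event) = {p_top:.3e}")
```

An OR gate raises the probability as inputs are added, while an AND gate lowers it, which is why independent safety barriers are usually joined under AND gates.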

Fault Tree

5. Screening

Identifies where major effort in the quantification analysis should be applied.

Filters out tasks/scenarios which may have little contribution to system failure

Screening methods can potentially eliminate important errors and interactions from the analysis

As a general rule, when applying any screening technique – if in doubt, leave the human error in the fault/event tree

6. Quantification

Human reliability needs to be quantified as something that can be compared across the HRA spectrum

The metric for HRA quantification is Human Error Probability (HEP)

HEP = (# of errors occurred)/(# opportunities for error to occur)

Expressed as a number between 0 and 1.

Little recorded industrial HEP data available because:

- Difficulty in estimating opportunities for error in realistic complex tasks (denominator error)
- Confidentiality and unwillingness to publish poor performance data
- Lack of awareness regarding the usefulness of human error data (hence no fiscal incentives)
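
The HEP ratio above can be written as a short helper. The function name and the counts in the example are hypothetical.

```python
def human_error_probability(errors, opportunities):
    """HEP = number of errors / number of opportunities for error."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    if not 0 <= errors <= opportunities:
        raise ValueError("errors must be between 0 and opportunities")
    return errors / opportunities

# Hypothetical data: 3 misread gauges in 1000 readings.
hep = human_error_probability(3, 1000)
print(hep)  # 0.003
```

The validation on `opportunities` reflects the "denominator error" noted above: the estimate is only as good as the count of opportunities.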

There are lots of ways to determine HEP

For this course we keep it simple!

6. Quantification…cont.(2)

Problems with simulator data in determining HEPs:

- Personnel using simulators are usually highly motivated and know what’s on the training curriculum
- Reliability of emergency training/responses on simulator compared to the real situation (‘cognitive fidelity issue’).
- Experiments are also usually controlled, investigating only one or two variables, so generalization is risky!!!

Lack of ‘generalizability’ has led to:
- Non-data-dependent approaches, i.e. expert opinion/judgment

7. Impact Assessment

- System risk or reliability is calculated
- Compared to established acceptable levels/standards
- Each event is analyzed and classified into a fault tree
- Both HE & hardware/software analysis are taken into account for the best combination to improve the system

If HE dominates, error reduction methods must be investigated

If HE cannot be reduced to acceptable levels – redesign of the system is necessary
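
A rough sketch of this comparison is shown below. The contribution values and the acceptance criterion are hypothetical, and the total is formed with the rare-event approximation (simple sum of small, independent contributions) rather than a full fault-tree evaluation.

```python
# Hypothetical acceptance criterion (probability of the top event per demand).
ACCEPTABLE_RISK = 1e-4

# Hypothetical contributions to the top event from each source.
contributions = {
    "human error": 3e-4,
    "hardware": 5e-5,
    "software": 1e-5,
}

# Rare-event approximation: small independent contributions are summed.
system_risk = sum(contributions.values())

# Which source contributes most, and is the total acceptable?
dominant = max(contributions, key=contributions.get)
needs_reduction = system_risk > ACCEPTABLE_RISK

print(f"system risk = {system_risk:.2e}, dominant = {dominant}")
if needs_reduction and dominant == "human error":
    print("HE dominates: investigate error reduction methods")
```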

8. Error Reduction

Not required if:
- Human reliability is adequate (what’s adequate?)
- It’s not the most effective means of achieving system performance (other modifications more suitable)
- Not within scope of assessment

If required, then:
- Focus on reducing impact/frequency of human errors
- Implement a more general error reduction strategy (how?)

Steps in the HRA process:

8. Error Reduction…cont (2)

Ways of reducing the impact of critical errors on the system:

- Prevention by hardware/software changes
- Increase system tolerance
- Enhance error recovery
- Reduce error at source

8. Error Reduction…cont (3)

Additional considerations:

Positive error reducing strategies should be factored back into quantitative analysis

Check that HEP(s) and overall system calculated risk become acceptable

As part of the quality assurance phase:

Provide ‘operational definition’ for each error reduction strategy

Ensure strategy is properly implemented and maintained over time

9. Quality Assurance

Effectiveness of error reduction mechanism implementation should be ensured by:
- Monitoring
- Performance verification (at later stage)
- Reliability/Validity Analysis (can be hard…why?)

Continuous performance monitoring systems present powerful quality assurance systems. Why?

- Gradual performance standard degradation
- Increased maintenance loading
- Loss of personnel
- Impromptu changes (since startup)
- Increasing ‘retrofit’ changes

9. Quality Assurance cont…(2)

Long-term performance monitoring allows for:

Identifying WHEN in time results of HRA may be outdated.

Signifying need for further (or new) evaluation

Justifying acceptability of risk associated with system

Avoiding gradual deterioration of safety barriers, i.e. BHOPAL SYNDROME (India)

10. Documentation

- Formally document all results of study
- Ensure auditability and justifiability of results
- Can provide database for future investigation and monitoring.
- Ensure assumptions and judgments included!!
- Aid new/unconnected personnel to understand
- Enable independent examination, updating, and reproduction
- Allows for learning from mistakes

Future Directions

1) Low technology risk

2) Cognitive errors and misdiagnosis

3) Management, organizational and sociotechnical contributions to risk

Low Technology Risk

HRA traditionally used for high risk, high technology industries

HRA tends to focus on massive accidents that happen less frequently – large consequences

Not applied as much to high risk, low technology sectors (ex. mining) – which have a larger number of “small” accidents

Can have very valuable applications to low technology industries

Cognitive Errors & Misdiagnosis

Operators may misdiagnose a situation, not realize the mistake and continue to interpret the feedback incorrectly

Can make matters worse if the operator overrides a safety system than if nothing had been done at all

Management, Organizational & Sociotechnical Contributions to Risk

HRA should not only be applied to operators/workers on the job site but also management and the organizational design of the plant

Bhopal, Challenger Shuttle & Chernobyl all had significant human error BUT current HRA techniques would not have detected risk prior to accidents because error was neither procedural nor diagnostic

Management, Organizational & Sociotechnical Contributions to Risk

Economics, time constraints, social pressures, communication breakdown, personality conflicts, etc. all add pressures on a system and its safety