1
FMEA - Failure Modes and Effects Analysis +
Criticality (FMECA)
A Core Component of RCM
Presented By:
Tim Bair
Research Engineer
The Applied Research Laboratory at
The Pennsylvania State University
2
What is the Role of
Maintenance?
• Maintenance helps to ensure that assets continue to fulfill their intended functions:
– Failure: System completely loses functionality or the operational capability falls below the minimum standard desired by the user.
• Minimum Standard: initial designed capability or desired performance below design capability
• In order to determine the need for maintenance, the function and performance standards for the asset must be defined.
[Figure: Performance axis showing Initial Capability, Margin for Deterioration, Acceptable Performance, and Failure]
3
What is a FMEA?
Failure Modes and Effects Analysis (FMEA) is a tool that is an integral part of the RCM process.
• The FMEA process is a systematic method to identify:
– Primary and secondary functions of the system and the failure modes that prevent the system from completing its designed purpose.
• Process Objective:
– Identify and prioritize the failure modes and their subsequent effects on the system to help eliminate or minimize catastrophic and critical failure modes through the most appropriate type of maintenance methodology.
• Predictive Maintenance
• Preventative Maintenance
• No Scheduled Maintenance
5
The Major Elements of the Basic
RCM Process
• RCM Establishment and Planning
• Analysis:
– Define the function and functional failures of a specific platform, system or component.
– Then conduct a Failure Modes and Effects Analysis
– Identify the failure consequences
– Determine maintenance tasks and intervals.
• Analysis Audit
• Implementation
• Sustaining the RCM Program:
– RCM is a ‘Living Program’
– Implement an RCM management, training, benchmarking, and review process to provide feedback and measure progress toward asset management goals
6
Determine Scope of Analysis
• The ‘Scope of Analysis’ identifies the systems and sub-systems to be
analyzed and describes the level of detail at which each will be
analyzed.
• The system will be partitioned, and the level and extent of analysis
necessary to meet program objectives is identified.
– Define reasonable boundaries so that the system includes the
necessary inputs and outputs but is not so large that it is difficult to
analyze.
– Include all failure modes that are ‘reasonably likely’ to cause functional
failure.
7
Defining System and Boundaries: Aircraft
Hydraulic System
• A system is a user defined group of:
– Components
– Systems
– Equipment
that support an operational requirement.
• Boundaries are selected to divide a complex system into manageable sub-systems.
– A boundary or interface should define the inputs and outputs of the system.
8
Level and Extent of the Analysis: How
Deep Do You Drill Down?
[Figure: drill-down hierarchy. System levels shown: Aircraft, Hydraulic System, Control Surfaces, Hydraulic Pumps. Components: Shaft, Impellor, Bearings, Lubrication, Seals. Level markers: Maintain Level, CBM Level, Root Cause of Failure Analysis.]
[Failure modes shown in the figure: Cracked Blade, Broken Blade, Corrosion, Shaft Broken, Coupling Broken, No Coupling Lubrication, Splines Broken, Bolts Broken, Inner Race Failure, Outer Race Failure, Element Failure, Cage Failure, Seal Failure, Lubrication Failure, Viscosity Breakdown, Contamination, Loss of Lubrication, Material Failure, Excessive Wear]
• The goal is to isolate the failure mode to the lowest level that allows for the most effective application of the maintenance policy.
9
Reliability Block Diagram: Fault Tree Analysis
Hydraulic Pump/Turbine Engine: Total Loss of Fluid Flow
[Figure: fault tree with gates 01–06. Top event: Total Loss of Fluid Flow (Failure Mode – λ1). Intermediate events: Left Pump Failure – λ2, Left Engine Failure – λ3, Right Pump Failure – λ9, Right Engine Failure – λ10. Basic events on each side: Shaft Failure, Bearing Seizure, Fluid Leak (λ8 left, λ13 right), Air Failure, Fuel Failure. Remaining rate labels: λ4, λ5, λ6, λ11 (left) and λ7, λ12, λ14, λ15 (right).]
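A fault tree of this kind can be evaluated numerically once the gate types and failure rates are known. The sketch below is illustrative only: it assumes independent events with constant failure rates, assumes each side fails if any one of its basic events occurs (OR gate), and assumes total loss of flow requires both sides to fail (AND gate). The gate types, the rate values, and the 100-hour mission time are hypothetical and are not taken from the slide.

```python
import math


def p_fail(rate_per_hour: float, hours: float) -> float:
    """Probability of at least one failure in `hours` at a constant failure rate."""
    return 1.0 - math.exp(-rate_per_hour * hours)


def or_gate(probs):
    """Output event occurs if any of the independent input events occurs."""
    survive = 1.0
    for p in probs:
        survive *= 1.0 - p
    return 1.0 - survive


def and_gate(probs):
    """Output event occurs only if all independent input events occur."""
    result = 1.0
    for p in probs:
        result *= p
    return result


HOURS = 100.0  # hypothetical mission time

# Hypothetical failure rates (failures per hour) for one side of the tree.
side_rates = {
    "shaft failure": 1e-5,
    "bearing seizure": 2e-5,
    "fluid leak": 5e-5,
    "air failure": 1e-5,
    "fuel failure": 1e-5,
}

# Assumed structure: OR within a side, AND across the left and right sides.
side = or_gate([p_fail(rate, HOURS) for rate in side_rates.values()])
top = and_gate([side, side])

print(f"one side fails: {side:.4e}")
print(f"total loss of fluid flow: {top:.4e}")
```

If the real tree uses different gates (for example, an OR gate at the top), only the last two assignments need to change.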
10
Steps for the Analysis Process:
Information and Decision
1. Identify System Functions: What does the
user need the system to do in its current
operating context?
2. Identify Functional Failures: In what way
can the system fail (or fail to fulfill its
function)?
3. Identify the Failure Modes: What causes
the failures?
4. Identify the Failure Effects: What happens
when failures occur and what are the
symptoms of failure?
5. Identify Failure Consequences: How and
why does the failure matter.
• Frequency of occurrence
• Severity of the failure mode
Reference: TACOM ILSC CBM+, Reliability Centered Maintenance Process Overview, C1061-03-0004
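The five steps map naturally onto a nested record: a function has functional failures, each functional failure has failure modes, and each failure mode carries its effects and consequence data. The sketch below is one possible layout, not a prescribed FMEA format; the class and field names are hypothetical, and the entries reuse the pump example from this presentation.

```python
from dataclasses import dataclass, field


@dataclass
class FailureMode:
    """Steps 3-5: a cause of a functional failure, its effects and consequence data."""
    cause: str
    effects: list[str] = field(default_factory=list)
    frequency: str = "unknown"  # step 5: frequency of occurrence (placeholder)
    severity: str = "unknown"   # step 5: severity of the failure mode (placeholder)


@dataclass
class FunctionalFailure:
    """Step 2: one way the system fails to fulfill a function."""
    description: str
    failure_modes: list[FailureMode] = field(default_factory=list)


@dataclass
class SystemFunction:
    """Step 1: what the user needs the system to do in its operating context."""
    statement: str
    functional_failures: list[FunctionalFailure] = field(default_factory=list)


def worksheet_rows(function: SystemFunction) -> list[tuple[str, str, str, str]]:
    """Flatten to (function, functional failure, failure mode, effects) rows."""
    rows = []
    for ff in function.functional_failures:
        for fm in ff.failure_modes:
            rows.append((function.statement, ff.description, fm.cause, "; ".join(fm.effects)))
    return rows


pump = SystemFunction(
    statement="Move hydraulic fluid on demand",
    functional_failures=[
        FunctionalFailure(
            description="Total loss of flow",
            failure_modes=[
                FailureMode(
                    cause="Pump bearing seizure",
                    effects=["Pump rotor stops rotating", "Fluid no longer flows"],
                ),
                FailureMode(cause="Pump shaft failure"),
            ],
        )
    ],
)

for row in worksheet_rows(pump):
    print(" | ".join(row))
```

Keeping the hierarchy explicit makes it easy to see which functions still lack failure modes or effects before the analysis moves on to consequences.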
11
Defining Function
2.1.1
• What are the functions and associated desired standards of performance of the asset in its present operating context?
– Identify all primary and secondary functions of the asset or system in terms of performance.
• Primary Functions: Speed, Output, Storage Capacity, Product Quality
• Secondary Functions: Safety, Control, Containment, Comfort, Economy, Environmental Compliance
– Functional statements should contain a verb, an object and a performance standard.
• The turbine engine is designed to provide 65000 lbs of thrust @ 21000 rpm.
• The platform is designed to transport 100 soldiers and 10,000 lbs. of cargo at a maximum speed of 650 mph.
– Performance standards should be at a level desired for the operational context.
12
Defining Function – Operating Context
• The operating context may influence the primary and secondary functions.
– Operating Environment:
• The operating environment may be unique to one type of equipment and differ from that of other, similar equipment.
– Safety and Environmental Standards:
• Equipment may have different safety and environmental standards based on how it is operated with respect to humans.
– Regular or Intermittent Use:
• Does the equipment or platform deploy regularly, or is it used only in unique circumstances (e.g., cold weather kits, bridge layer platforms)?
– System Redundancies:
• With backup systems, one piece of equipment may operate continuously while the redundant system remains on stand-by.
13
Defining Function – Performance Standard
• Many systems will have multiple performance standards.
– The engine must operate at 3000 RPM and 1000 HP continuously.
• Quantitative vs. Qualitative: be as precise as possible when defining a performance standard.
– Absolute standard: exact specification
• To contain 40 gallons of fluid or to contain fluid.
– Variable standard: mean with upper and lower limits
• To contain an average of 40 gallons of fluid +/- 1 gallon.
• Example: Hydraulic PTO Pump – What is its function in its operating context?
– To be able to move hydraulic fluid on demand at a flow rate ranging from 550 to 650 cfm, at a pressure range from 600 to 2000 psig and at a temperature ranging from -50 F to +220 F.
14
Defining Function –
Hidden Function
• A function whose failure does not become apparent to the operating crew under normal circumstances.
– Equipment Protective Devices (Safety)
• Provide a warning indicator of an abnormal condition
• Shut down equipment in case of failure (avoidance of catastrophic failure)
• Provide redundant control in case of primary control failure
• Guards to prevent physical harm
• Example: Hydraulics – What are the hidden functions?
– Relief valve must activate at 2000 psi (OEM setting for maximum payload) as a safety device to protect the system from overloading.
– Relief Valve Failure Mode: Due to normal valve wear (spring sag), the relief valve will begin to activate at 1600 psi, which will limit the maximum lift capability.
• This will result in a functional failure of the hydraulic system.
15
Defining Functional
Failure 2.1.2
• In what ways can a system fail to fulfill its function to the standard of performance required by the user?
– Performance standards for the asset must be well defined and agreed upon by operations and maintenance.
– Functional Failure: Once the function of the asset has been established, the inability of the asset to perform to the defined standard constitutes a failure.
– List all failed states associated with each function.
• Total and Partial Failures
• Limit Exceedance (Operational Performance)
• Operational Metric Displays (Gauges and Indicators)
• Example: What are the Pump Functional Failures?
1. Total loss of flow
2. No indication of operational parameters
3. Flow below required rate
4. Unable to contain fluid
5. Fluid pressure out of range
6. Inability to control pump operation
7. Fluid temperature above required range
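Several of the pump functional failures follow directly from comparing a reading with the performance standard. The sketch below is a minimal illustration, assuming the PTO pump limits stated earlier in this presentation (flow 550 to 650 cfm, pressure 600 to 2000 psig, temperature up to +220 F). The function name and the sample readings are hypothetical, and only the measurable failed states (1, 3, 5 and 7) are covered.

```python
def pump_functional_failures(flow_cfm: float, pressure_psig: float,
                             temperature_f: float) -> list[str]:
    """Return the failed states indicated by one set of pump readings."""
    failures = []
    if flow_cfm <= 0:
        failures.append("Total loss of flow")          # total failure
    elif flow_cfm < 550:
        failures.append("Flow below required rate")    # partial failure
    if not 600 <= pressure_psig <= 2000:
        failures.append("Fluid pressure out of range")
    if temperature_f > 220:
        failures.append("Fluid temperature above required range")
    return failures


# Hypothetical readings, for illustration only.
print(pump_functional_failures(600, 1500, 120))
print(pump_functional_failures(500, 1500, 230))
print(pump_functional_failures(0, 0, 70))
```

Flow above 650 cfm is not flagged because it does not appear in the list of functional failures. Whether it should count as a failed state depends on how the users define the standard.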
16
Defining Failure
Modes 2.1.3
• What causes each functional failure?
– List all failure modes reasonably likely to cause each functional failure.
• Decreasing Capability:
– Deterioration: fatigue, corrosion, wear and tear
– Lubrication Failures: root cause of many failures
– Contamination: causes excessive wear
– Human Errors: manual devices not properly operated
• Desired Performance exceeds Capability:
– Intentional or Unintentional
– Sustained or Intermittent
17
Defining Failure Modes
• Failure modes should be defined in enough detail to select the most appropriate failure management policies.
– Failure modes should be identified in terms of ‘cause and effect’ if possible.
• Bearing Failure: caused by manufacturing defect or lubrication loss – need to address both separately.
– List failure modes with the highest probability of occurrence.
• Start with failure modes that have occurred on similar equipment.
• Use experience to estimate which failure modes are most likely to occur.
– List failure modes that result in the most severe consequences.
• Engine coolant pump failure is more severe than the AC pump failure.
– Do not get bogged down with too many failure modes.
• The SAE standard stipulates that only failure modes likely to occur in the operating context be considered.
• Example: What are the Pump Failure Modes for Total Loss of Flow?
1. Pump Shaft Failure
2. Failure of Lubrication (foreign material in the fluid)
3. Pump Impellor Failure
4. Failure of the Seal
5. Pump Bearing Seizure (manufacturing defect)
18
Defining Failure
Effects 2.1.4
• What are the functional consequences when each failure mode occurs?
– Effects: events that are a direct result of the failure mode.
• What evidence shows that the failure has occurred?
– Equipment stops rotating or an alarm light comes on.
• What safety or environmental threat exists?
– Fire may occur or hazardous material may no longer be contained.
• How has the mission, production or operation been affected?
– Another vehicle may need to remain with the failed vehicle.
• What damage is caused by the failure mode?
– Bearing fault will cause impellor damage.
• What must be done to repair the equipment?
– Engine needs to be removed to replace transmission: 8 hours
19
Defining Failure Effects
– Secondary consequences to the system as a result of the failure mode:
• Backup systems must be operated: is the operation, production or mission affected while running on backup?
• Are spares available? What is the delay time to receive replacement parts?
• Example: What is the Failure Effect for a PTO Pump Bearing Seizure?
– Pump rotor is unable to continue to rotate, which causes the fluid to no longer flow.
– Vehicle incurs 5 hours downtime to replace pump.
– Another vehicle must be activated to complete the mission.
20
Example FMEA Format
• There are many FMEA formats. Choose one that fits your analysis needs best.
21
Criticality Analysis (Assessing Risk) and
Pareto Analysis
(Identifying ‘Dominant’ Failure Modes)
Useful for Prioritizing, Making
Decisions and Focusing the Analysis
Efforts
22
Criticality 2.1.5
• How likely is the failure mode to occur?
– The SAE standard stipulates that only failure modes that
are reasonably likely to occur in the context of operations
should be considered for the FMEA.
– Ideally this probability should be quantified in the RCM
analysis.
• What are the Consequences and is the Risk Tolerable?
– Risk is measured by multiplying the probability of failure
mode by the severity of the failure mode.
– Deciding what risk is tolerable is going to be individual
and organization specific.
Reference: SAE Standard JA1012
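Risk as the product of probability and severity can be sketched in a few lines. The probabilities, the 1 to 10 severity scores, and the tolerable-risk threshold below are all hypothetical values chosen for illustration. Each organization has to set its own scales and threshold.

```python
def risk(probability: float, severity: float) -> float:
    """Risk = probability of the failure mode x severity of the failure mode."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * severity


# Hypothetical (probability, severity score) pairs for three pump failure modes.
failure_modes = {
    "Pump bearing seizure": (0.02, 8),
    "Failure of the seal": (0.10, 3),
    "Pump shaft failure": (0.005, 9),
}

TOLERABLE_RISK = 0.2  # hypothetical, organization-specific threshold

ranked = sorted(failure_modes.items(), key=lambda item: risk(*item[1]), reverse=True)

for name, (probability, severity) in ranked:
    value = risk(probability, severity)
    verdict = "tolerable" if value <= TOLERABLE_RISK else "NOT tolerable"
    print(f"{name:<22} risk = {value:.3f}  ({verdict})")
```

With these made-up numbers the seal failure ranks first even though it has the lowest severity score, because it is by far the most likely to occur.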
23
Critical and Dominant Failure
Modes
• Critical Failure Modes have significant effects with high level safety, environmental, or mission consequences.
• A Dominant Failure Mode is a single failure mode that accounts for a significant portion of the failures of a complex system.
[Figure: motor component hierarchy with labels Motor; Rotor, Stator, Shaft, Bearings, Lubrication, Coupling; Inner Race, Ball or Roller Elements, Outer Race, Seals]
24
Severity/Consequences
• Need to define the mission function, safety and environmental consequence in terms of severity levels to categorize each failure mode.
• Severity levels will be used to rank and prioritize each failure mode.
Reference: MIL-STD-882C, System Safety Program Requirements
25
Probability of Occurrence
• Need to qualify the probability that each failure mode will occur to rank and prioritize each failure mode.
• Fleet or Inventory: Normal total run hours to number of items.
Reference: MIL-STD-882C, System Safety Program Requirements
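When only qualitative levels are available, failure modes can still be ranked by severity category and probability level. The sketch below uses the MIL-STD-882C severity categories (I to IV) and probability levels (A to E). The simple ordering rule (severity first, then probability) and the category assigned to each failure mode are assumptions made for illustration, not the standard's risk assessment matrix.

```python
# MIL-STD-882C severity categories and probability levels, most serious first.
SEVERITY = {"I": "Catastrophic", "II": "Critical", "III": "Marginal", "IV": "Negligible"}
PROBABILITY = {"A": "Frequent", "B": "Probable", "C": "Occasional",
               "D": "Remote", "E": "Improbable"}

SEVERITY_ORDER = list(SEVERITY)        # ["I", "II", "III", "IV"]
PROBABILITY_ORDER = list(PROBABILITY)  # ["A", "B", "C", "D", "E"]


def rank_key(mode):
    """Sort key (assumed rule): worst severity first, then most likely first."""
    _name, severity, probability = mode
    return (SEVERITY_ORDER.index(severity), PROBABILITY_ORDER.index(probability))


# Hypothetical assignments for three pump failure modes.
modes = [
    ("Failure of the seal", "III", "B"),
    ("Pump bearing seizure", "II", "C"),
    ("Pump shaft failure", "II", "D"),
]

ranked = sorted(modes, key=rank_key)

for name, severity, probability in ranked:
    print(f"{name:<22} {severity:>3} ({SEVERITY[severity]}), "
          f"{probability} ({PROBABILITY[probability]})")
```

A full risk matrix would combine the two scales into acceptance classes; the lexicographic sort here is only the simplest way to produce a priority list.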
26
Pareto Analysis
• An effective method for
classifying and prioritizing
information.
• Failure data analysis helps to
identify the largest portion of the
reliability issues which then can
be addressed efficiently with the
most economical use of
resources.
• Failure data can consist of:
– Cost (operation or repair)
– Probability
– Frequency of Occurrence
– Consequence of failure
(severity)
In some cases a large number of
failure modes are the result of only a
few causes.
Reference: Operating Equipment Asset Management: Your 21st Century Competitive Necessity, by John S. Mitchell
[Figure: Pareto chart of Pump Failure Modes]
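A Pareto analysis sorts the failure data from largest to smallest contributor and tracks the cumulative share. The sketch below shows the calculation on failure counts. The counts are hypothetical, chosen only to illustrate the method; the same function works for cost, probability, or severity data.

```python
def pareto(counts: dict[str, float]) -> list[tuple[str, float, float]]:
    """Return (name, value, cumulative percent) rows, largest contributor first."""
    total = sum(counts.values())
    if total == 0:
        return []
    rows = []
    cumulative = 0.0
    for name, value in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        cumulative += value
        rows.append((name, value, 100.0 * cumulative / total))
    return rows


# Hypothetical failure counts for the pump failure modes.
failure_counts = {
    "Pump shaft failure": 3,
    "Failure of lubrication": 15,
    "Pump impellor failure": 7,
    "Failure of the seal": 45,
    "Pump bearing seizure": 30,
}

for name, value, cumulative_pct in pareto(failure_counts):
    print(f"{name:<24}{value:>4}{cumulative_pct:>8.1f}%")
```

In this made-up data set the first two failure modes account for 75% of all failures, which is the kind of concentration a Pareto chart is meant to expose.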
27
Example FMECA Format
• Criticality Information: Frequency of Occurrence and the Severity.
• Including this information provides another parameter to aid decision making.
28
Criticality Analysis: For Ranking
and Prioritizing Failure Modes
• Criticality Analysis: provides a relative measure of significance of the effect that a failure mode has on the successful operation and safety of the system.
• Calculation is based on definition of failure, severity categories and part failure rate information.
• Two approaches:
– Quantitative: when historic
operational failure rate or test
failure rate data exists
– Qualitative: when little to no
failure rate data exists
Reference: Failure Mode, Effects and Criticality Analysis (FMECA), Reliability Analysis Center
29
Criticality Analysis: Quantitative
Approach
• Failure Mode Criticality Number: provides a quantitative criticality rating for the component or function.
$$C_m = \beta \, \alpha \, \lambda_p \, t$$
where
$C_m$ = criticality number for the $m$th failure mode
$\beta$ = conditional probability of loss of function
$\alpha$ = failure mode ratio (for a specific item)
$\lambda_p$ = part failure rate (failures/million hours)
$t$ = operating time or number of operating cycles
Reference: Practical Reliability Engineering, by Patrick D.T. O’Connor
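The failure mode criticality number is the product of the four quantities defined on this slide: the conditional probability of loss of function (β), the failure mode ratio (α), the part failure rate (λp), and the operating time (t). The sketch below is a minimal illustration of that product. It assumes the failure rate is given in failures per million hours and the operating time in hours, so the rate is converted to failures per hour first. The function name and the input values are hypothetical.

```python
def criticality_number(beta: float, alpha: float,
                       failures_per_million_hours: float, hours: float) -> float:
    """C_m = beta * alpha * lambda_p * t, with lambda_p converted to failures per hour."""
    for label, value in (("beta", beta), ("alpha", alpha)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{label} must be between 0 and 1")
    lambda_p = failures_per_million_hours * 1e-6  # failures per hour
    return beta * alpha * lambda_p * hours


# Hypothetical inputs for a pump bearing seizure:
#   beta  = 0.5  -> half of the occurrences lead to loss of function
#   alpha = 0.3  -> 30% of the part's failures are of this mode
#   rate  = 50 failures per million hours, over 1000 operating hours
c_m = criticality_number(beta=0.5, alpha=0.3,
                         failures_per_million_hours=50.0, hours=1000.0)
print(f"C_m = {c_m:.4f}")
```

Because α is a ratio and β a conditional probability, both are bounded by 1, which is why the function rejects values outside that range.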
35
Summary
• FMEA: An excellent systematic method for identifying and organizing the issues that lead to low operational reliability.
– Function of an Asset:
• How the asset can fail to perform to the required standard.
– Failure Modes and Effects:
• How failures prevent an asset from functioning correctly and what happens when specific failures occur.
• Criticality: A portion of the FMECA that provides additional information for decision making.
– Consequences and Probability of Failure:
• How severe the consequences of failure are for the operation and mission, and how likely the failure is to occur.
– Pareto Analysis:
• Methods for evaluating and prioritizing failure modes to determine those that would benefit most from the implementation of the RCM process.