improving the risk matrixpsas.scripts.mit.edu/home/wp-content/uploads/2019/... · scenario 1: the...
TRANSCRIPT
ImprovingtheRiskMatrixNancyLeveson
MIT
AStandardVersionoftheRiskMatrix• Usedthroughoutthelifecycle• AssumesRisk=f(severity,likelihood)
Severity• Definedasasetofcategories,suchasCatastrophic:mulFpledeaths
CriFcal:onedeathormulFplesevereinjuriesMarginal:onesevereinjuryormulFpleminorinjuriesNegligible:oneminorinjury
• RelaFvelystraighIorwardbut– Worstcase?Mostlikely?Credible?Predefinedcommonevents?– Howdefinecredible?(blurswithlikelihood)– Designbasis?(nuclearenergy)
• ARP4761example:– LossofdeceleraFoncapability
• Notannunciatedduringtaxi:Major(Crewunabletostopa/cresulFnginslowspeedcontactwithterminal,aircraU,orvehicles)
• Annunciatedduringtaxi:Nosafetyeffect(Crewsteersa/cclearofanyobstaclesandcallsforatugorportablestairs)
"Improved"Disembarka=onMethod
Likelihood
• Example:Frequent:likelytooccurfrequentlyProbable:WilloccurseveralFmesinthesystem’slifeOccasional:LikelytooccursomeFmeinthesystem’slifeRemote:Unlikelytooccurinsystem’slife,butpossibleImprobable:ExtremelyunlikelytooccurImpossible:Equaltoaprobabilityofzero
• MoreproblemaFcthanseverity– Historiceventsmaynotapply
• System,environment,orwayusedmaychange• SoUware“failure”isalways1
– SomeFmesassociatewithprobabilitylevels(canthisbedetermined?)
HowAccurateistheRiskMatrix?
• AlmostnoscienFficevaluaFon– TwostudiesIknowabout,bothhadpoorresults(ordersofmagnitudedifferentevaluaFonsbyexperts)
• Empirical(frompracFcaluse)
• GeneraltechnicallimitaFons– MathemaFcalandtheoreFcal– HeurisFcBiases
EmpiricalEvalua=onsandPrac=calLimita=ons
Caveats• NothingavailablesoonlyourownevaluaFonsonrealsystems
• NotcriFcizingindividualengineersorcompanies– TheywerefollowingstandardpracFces– Ourgoalwastofigureouthowtoimprovewhatisdonetoday– SameflawsinhundredsoftheseIhaveseeninmycareer
EmpiricalEvalua=ons(2)
• Commonproblem:Assessriskoffailuresnothazards– LossofexternalcommunicaFonorbreakingpistonnutsvs.aircraUinstabilityorviolaFonofminseparaFonfromterrain
– Reliability,notsafety
– Whataboutnon-failures?
– IndividualfailuresbutnotcombinaFonsoflow-rankedfailures(andusuallyassumpFonsthatpilotwillbehaveappropriately)
• InfeasibletoconsiderallcombinaFons• AssumpFonofindependence• Affectsaccuracyofresults
EmpiricalEvalua=ons(3)
• AssumpFonsaboutcorrectpilotreacFontofailures(thenblamethemfortheaccidents)– PilotmentalmodeliscriFcal.Whereisthisintheriskassessment?
• UnrealisFcassumpFonsabouthardwareandsoUware– RedundancyasamiFgaFon:
• Doesn’tworkforsoUwareorfordesignerrorsinhardware• SoUwareONLYhasdesignerrors
– VirtuallyallsoUware-relatedaccidentsstemfromrequirementserrors,notimplementaFonerrors• RedundancyandrigorofsoUwaredevelopmentwillnothelphere
EmpiricalEvalua=ons(4)• Wefounditemscategorizedas
Severity=CatastrophicLikelihood=LowthathadbeeninvolvedinmulFpleaccidentsforthosesystems
• OnlyimprobableifignoresoUwarerequirementsflaws,humanbehavioraspects,etc.
• STPAfoundnon-failurescenariosleadingtocatastrophiceventsthatwereomigedfromofficialriskassessment
• STPAidenFfiedrealisFcandrelaFvelylikelyscenariosleadingtoallofspecificfailuresdismissedasimprobableinofficialriskassessment.
• LikelihoodcandiffersignificantlydependingonexternalenvironmentandoperaFonsinwhichafailureoccurs.
TechnicalLimita=ons
• TheuseoftheriskmatrixitselfhasbeenshowntohavemathemaFcalandotherlimitaFons(seepaper)
• MostimportantstemfromHeurisFcBiases(Kahnemann,Tversky,Slovic)– PsychologistswhostudiedhowpeopleactuallydoriskevaluaFons
– Humans,itturnsout,areterribleatesFmaFngrisk
Heuris=cBiases(Tversky,Slovic,andKahneman)
• ConfirmaFonbias(lookfordatathatsupportsourbeliefs)
• Constructsimplecausalscenarios– Ifnonecomestomind,assumeimpossible
• TendtoidenFfysimple,dramaFceventsratherthaneventsthatarechronicorcumulaFve
• Incompletesearchforcauses– OnceonecauseidenFfiedandnotcompelling,thenstopsearch
• Defensiveavoidance– Downgradeaccuracyordon’ttakeseriously– Avoidtopicthatisstressfulorconflictswithothergoals
Heuris=cBiases
Canavoidby:ProvidingthoseresponsiblewithbegerinformaFon,obtainedthroughastructuredprocesstogeneratescenarios.
Thatgoalbeaccomplishedusingmorepowerfulhazardanalysistechniques,suchasSTPA
Poten=alAlterna=vestotheRiskMatrix
1. Usehazards(notfailures)andbegerinformaFonaboutpotenFalcausalscenarios
2. ChangebasicdefiniFonofriskandhowitisassessed(notcoveredinthistalk)
UseHazardRatherthanFailures
• RelaFonshipbetweenindividualfailuresandlossesisnotobvious.
– AssessinghazardsisamoredirectpathtoulFmategoal
– Componentreliabilityisnotequivalenttosystemsafety
– UsinghazardsistradiFonalinsystemsafety
Example:WhyShouldUseHazards
• HelicopterDeiceFuncFon• FinalSARincludedafailureofAPUresulFngfromchaffing.
– ImportantbecauseAPUusedwhenlossofonegeneratoroccursduringbladedeicing
– ButalsoanotherscenarioidenFfiedbyusingSTPAthatcouldoccurwhenAPUhasnotfailed
UCA:TheflightcrewdoesnotswitchtheAPU(AuxiliaryPowerUnit)generatorpowerONwheneitherGEN1orGEN2arenotsupplyingpowertothehelicopterandthebladede-icesystemisrequiredtopreventicing.
– Severalcausalscenariosandfactors,buttheyarenotinofficialSAR
– Needtobefactoredintoanyriskassessment
ChangeBeingRecommended
• StartfromaprioriFzedlistofstakeholderidenFfiedaccidentsorsystemlosses.
• IdenFfyhigh-levelsystemhazardsleadingtotheselosses
• Assessseverityandlikelihoodofhazards
• Onlyconsiderfailuresthatcanleadtohazards(idenFfiedbySTPA)alongwiththenon-failurescenarios(again,STPAcanidenFfythem)
• ConsistentwithMIL-STD-882andmostothersafetystandards
LikelihoodasStrengthofPoten=alControls
• Severitynoweasybecausecanbetraceddirectlytolistofaccidentsormishaps
• HeurisFcbiasesleadtopooresFmatesoflikelihood
• FollowingarigorousSTPAwillresultin– Reducingshortcutsandbiases– MorefullconsideraFonofpotenFalcausalscenarios
• CanbedoneearlyindevelopmenttoidenFfywheretoplacedevelopmenteffort
• MaybefocusoncomponentbehaviorbecausehavehistoricalfailureinformaFon
Example1:Pilot’suseofflightcontrols• UCA:TheFlightCrewdoesnotdeflectpedalssufficientlytocountertorque
fromthemainrotor,resulLngintheFlightCrewlosingcontroloftheaircraMandcomingintocontactwithanobstacleintheenvironmentortheterrain.
Oneofcausalscenarios:• Scenario1:TheFlightCrewisunawarethatthepedalshavenotbeendeflected
sufficientlytocounterthetorquefromthemainrotor.• TheFlightCrewcouldhavethisflawedprocessmodelbecause:
– a)TheflightinstrumentsaremalfuncLoningandprovidingincorrectorinsufficientfeedbacktothecrewabouttheaircraMstateduringdegradedvisualcondiLons.
– b)TheflightinstrumentsareoperaLngasintended,butprovidinginsufficientfeedbacktothecrewtoapplytheproperpedalinputstocontrolheadingoftheaircraMtoavoidobstaclesduringdegradedvisualcondiLons.
– c)TheFlightCrewhasanincorrectmentalmodelofhowtheFCSwillexecutetheircontrolinputstocontroltheaircraMandhowtheenginewillrespondtotheenvironmentalcondiLons.
– d)TheFlightCrewisconfusedaboutthecurrentmodeoftheaircraMautomaLonandisthusunawareoftheactualcontrollawsthataregoverningtheaircraMatthisLme.
– e)Thereisincorrectorinsufficientcontrolfeedback.
Example1:Pilot’suseofflightcontrols(Con’t)
• Eachcausalfactorusedtogeneraterequirementsanddesignfeaturestoreducetheirlikelihoodofoccurring
• LikelihoodcanbebasedonstrengthofpotenLalcontrols– Interfacedesign(evaluatedbyhumanfactorsexpert)– Redundancyandfaulttolerantdesign– Training– Systemdesign(hardware,soUware,interacFons)– Designoffeedback
• SFllneedawaytolinkthesetolikelihood(willcomebacktothat)
Example2:SoZware
• Whatdonow---rigorofdevelopment---makesnosensetechnically
UCA:OneormoreoftheFCCs(flightcontrolcomputers)commandcollecLveinputtothehydraulicservostoolong,resulLnginanundesirablerotorRPMcondiLonandpotenLallyleadingtothehazardofviolaLngminimumseparaLonfromterrainorthehazardoflosingcontroloftheaircraM.
• Atleast5causalscenarioswhytheFCCsmightdothis
Example(2):SoZwareScenario1:TheFCCsareunawarethatthedesiredstatehasbeenachievedandconFnuetosupplycollecFveinput.a)TheFCCsarenotreceivingaccurateposiFonfeedbackfromthemainrotorservos.b)TheFCCsarenotreceivinginputfromtheICUstostopsupplyingswashplateinput.Scenario2:TheFCCsdonotsendtheappropriateresponsetotheaircraUforparFcularcontrolinputs.Thiscouldhappenif:
a)ThecontrollogicdoesnotfollowintuiFveguidelinesthathavebeenimplementedinearlieraircraU,perhapsbecauserequirementstodosowerenotincludedinthesoUwarerequirementsspecificaFon.b)ThehardwareonwhichtheFCCsareimplementedhasfailedorisoperaFnginadegradedstate.
Scenario3:TheFCCsdonotprovidefeedbacktothepilotstostopcommandingcollecFveincreasewhenneededbecausetheFADEC(enginecontroller)issupplyingincorrectcuestotheFCCsregardingenginecondiFons.Scenario4:TheFCCsdonotprovidefeedbacktothepilotstostopcommandingcollecFveincreasewhenneededbecausetheFCCsarereceivinginaccurateNR(rotorrpm)sensorinformaFonfromthemainrotor.Scenario5:TheFCCsprovideincorrecttacFlecueingtotheICUs(inceptorcontrolunits)toproperlyplacethecollecFvetopreventlowrotorRPMcondiFons.
Example2:SoZware(con’t)
• ScenariosusedtoidenFfyappropriateFCCrequirementsanddesignconstraints.
• Forexample,forScenario1:– 1.TheFCCsmustperformmediantesLngtodetermineiffeedbackreceivedfromthemainrotorservosisinaccurate.
– 2.ThePRSVOFAULTcauLonmustbepresentedtotheFlightCrewiftheFCCslosecommunicaLonwithamainrotorservo.
– 3.TheEICASmustalerttheFlightCrewiftheFCCsdonotgetinputfromtheICUeveryxseconds.
• Translatetheseinto“likelihood”(finalpieceofpuzzle)
Transla=ngStrengthofControlsintoLikelihood
QualitaFveRankingsuchas1. Thecausalfactorcanbeeliminatedthroughdesignandhigh
assurance.2. Theoccurrenceofthecausalfactorcanbereducedor
controlledthroughsystemdesign3. ThecausalfactorcanbedetectedandmiFgatedifitdoes
occurthroughsystemdesignorthroughoperaFonalprocedures
4. TheonlypotenFalcontrolsinvolvetrainingandprocedures.
MaybetoosimplisFc?– Couldincludehowthoroughlythecausalfactorhasbeenhandledwithineachcategory
– CombinaFonsofpossiblecontrols?
Transla=ngStrengthofControlsintoLikelihood(2)
• MaybeabletocomeupwithmoresophisFcatedproceduresforspecifictypesofsystems.
• Examplesinpaperonthistopicat:hgp://sunnyday.mit.edu/Risk-Matrix.pdf
ArchitecturaltradestudyforspaceexploraFonAirTrafficControlenhancements
Addi=onalConsidera=ons
• RiskalsoaffectedbyfactorsduringmanufacturingandoperaFons:
– Manufacturingcontrols
– Designedmaintainabilityandmaintenanceerrors
– Trainingprograms
– ChangesoverFmeinusageenvironment
– Consistencyandrigorofmanagementandoversight
– AssumpFonsduringdevelopmentaboutoperaFonalenvironment:howwellcommunicatedtousersandhowrigorouslyareenforcedduringoperaFons
– etc.
Addi=onalConsidera=ons(2)
• Includingthesefactorswillimproveriskassessment
• ShouldalsotrackfactorsandimproveriskassessmentoverFme
– Riskassessmentprocessneednotstopatdeployment
– Risk-baseddecisionsneededthroughoutlifecycled– CasFlho:AcFveSTPA
• IdenFfyleadingindicatorsofincreasingriskduringoperaFons
Conclusions
• Canprovideimprovedriskmatrixprocesses• Startfromhazards,notfailures,togetmorerealisFcassessmentsofrisk
• STPAandbegercausalanalysiscangreatlyimprovelikelihoodesFmates
• SuggesFonswereprovidedandotherpeopleshouldbeabletocreateevenbegerprocesses
• ButlimitedbytheuseoftheRiskMatrixandcurrentdefiniFonofrisk– AlternaFveistoimprovedefiniFonofriskanditsevaluaFon– SuggesFonsforthisgoalwillfollow(soon)