Improving the Risk Matrix (psas.scripts.mit.edu/home/wp-content/uploads/2019/...)

Improving the Risk Matrix Nancy Leveson MIT


Page 1

Improving the Risk Matrix
Nancy Leveson

MIT

Page 2

A Standard Version of the Risk Matrix

•  Used throughout the life cycle
•  Assumes Risk = f(severity, likelihood)

Page 3

Severity

•  Defined as a set of categories, such as
    Catastrophic: multiple deaths
    Critical: one death or multiple severe injuries
    Marginal: one severe injury or multiple minor injuries
    Negligible: one minor injury
•  Relatively straightforward but
    –  Worst case? Most likely? Credible? Predefined common events?
    –  How define credible? (blurs with likelihood)
    –  Design basis? (nuclear energy)
•  ARP 4761 example:
    –  Loss of deceleration capability
        •  Not annunciated during taxi: Major (Crew unable to stop a/c resulting in slow speed contact with terminal, aircraft, or vehicles)
        •  Annunciated during taxi: No safety effect (Crew steers a/c clear of any obstacles and calls for a tug or portable stairs)

Page 4

"Improved" Disembarkation Method

Page 5

Page 6

Likelihood

•  Example:
    Frequent: likely to occur frequently
    Probable: Will occur several times in the system’s life
    Occasional: Likely to occur sometime in the system’s life
    Remote: Unlikely to occur in system’s life, but possible
    Improbable: Extremely unlikely to occur
    Impossible: Equal to a probability of zero
•  More problematic than severity
    –  Historic events may not apply
        •  System, environment, or way used may change
        •  Software “failure” is always 1
    –  Sometimes associate with probability levels (can this be determined?)
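The standard form Risk = f(severity, likelihood) can be sketched as a simple lookup. The category names below come from the slides; the numeric ranks, the product rule, and the "high / medium / low" thresholds are assumptions made only for illustration, since each organization defines its own matrix cells.

```python
from enum import IntEnum


class Severity(IntEnum):
    NEGLIGIBLE = 1
    MARGINAL = 2
    CRITICAL = 3
    CATASTROPHIC = 4


class Likelihood(IntEnum):
    IMPOSSIBLE = 0
    IMPROBABLE = 1
    REMOTE = 2
    OCCASIONAL = 3
    PROBABLE = 4
    FREQUENT = 5


def risk_level(severity, likelihood):
    """Hypothetical f(severity, likelihood): bin the product of the two ranks."""
    if likelihood == Likelihood.IMPOSSIBLE:
        return "none"
    score = severity * likelihood  # ranges from 1 to 20
    if score >= 12:   # assumed threshold
        return "high"
    if score >= 6:    # assumed threshold
        return "medium"
    return "low"


print(risk_level(Severity.CATASTROPHIC, Likelihood.FREQUENT))
print(risk_level(Severity.CATASTROPHIC, Likelihood.IMPROBABLE))
```

Under these assumed thresholds a catastrophic but "improbable" item lands in the lowest non-zero bin, so the result depends entirely on how well the likelihood category was chosen.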

Page 7

How Accurate is the Risk Matrix?

•  Almost no scientific evaluation
    –  Two studies I know about, both had poor results (orders of magnitude different evaluations by experts)
•  Empirical (from practical use)
•  General technical limitations
    –  Mathematical and theoretical
    –  Heuristic Biases

Page 8

Empirical Evaluations and Practical Limitations

Caveats
•  Nothing available so only our own evaluations on real systems
•  Not criticizing individual engineers or companies
    –  They were following standard practices
    –  Our goal was to figure out how to improve what is done today
    –  Same flaws in hundreds of these I have seen in my career

Page 9

Empirical Evaluations (2)

•  Common problem: Assess risk of failures not hazards
    –  Loss of external communication or breaking piston nuts vs. aircraft instability or violation of min separation from terrain
    –  Reliability, not safety
    –  What about non-failures?
    –  Individual failures but not combinations of low-ranked failures (and usually assumptions that pilot will behave appropriately)
        •  Infeasible to consider all combinations
        •  Assumption of independence
        •  Affects accuracy of results
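A rough count shows why considering all combinations of failures is infeasible. The sketch below simply counts the distinct combinations of 2 up to `max_order` simultaneous failures among `n_failures` individually assessed failure modes; the function name and the sample sizes are illustrative assumptions, not figures from the talk.

```python
from math import comb


def combination_count(n_failures, max_order):
    """Number of distinct failure combinations of size 2..max_order."""
    return sum(comb(n_failures, k) for k in range(2, max_order + 1))


# 10 failure modes already give 45 pairs; the count grows quickly from there.
for n in (10, 100, 1000):
    print(n, combination_count(n, 2), combination_count(n, 3))
```

Each of those combinations would also need its own likelihood, and multiplying individual likelihoods is only valid if the failures really are independent.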

Page 10

Empirical Evaluations (3)

•  Assumptions about correct pilot reaction to failures (then blame them for the accidents)
    –  Pilot mental model is critical. Where is this in the risk assessment?
•  Unrealistic assumptions about hardware and software
    –  Redundancy as a mitigation:
        •  Doesn’t work for software or for design errors in hardware
        •  Software ONLY has design errors
    –  Virtually all software-related accidents stem from requirements errors, not implementation errors
        •  Redundancy and rigor of software development will not help here

Page 11

Empirical Evaluations (4)

•  We found items categorized as
    Severity = Catastrophic
    Likelihood = Low
   that had been involved in multiple accidents for those systems
•  Only improbable if ignore software requirements flaws, human behavior aspects, etc.
•  STPA found non-failure scenarios leading to catastrophic events that were omitted from official risk assessment
•  STPA identified realistic and relatively likely scenarios leading to all of the specific failures dismissed as improbable in official risk assessment.
•  Likelihood can differ significantly depending on external environment and operations in which a failure occurs.

Page 12

Technical Limitations

•  The use of the risk matrix itself has been shown to have mathematical and other limitations (see paper)
•  Most important stem from Heuristic Biases (Kahneman, Tversky, Slovic)
    –  Psychologists who studied how people actually do risk evaluations
    –  Humans, it turns out, are terrible at estimating risk

Page 13

Heuristic Biases (Tversky, Slovic, and Kahneman)

•  Confirmation bias (look for data that supports our beliefs)
•  Construct simple causal scenarios
    –  If none comes to mind, assume impossible
•  Tend to identify simple, dramatic events rather than events that are chronic or cumulative
•  Incomplete search for causes
    –  Once one cause identified and not compelling, then stop search
•  Defensive avoidance
    –  Downgrade accuracy or don’t take seriously
    –  Avoid topic that is stressful or conflicts with other goals

Page 14

Heuristic Biases

Can avoid by: Providing those responsible with better information, obtained through a structured process to generate scenarios.

That goal can be accomplished using more powerful hazard analysis techniques, such as STPA

Page 15

Potential Alternatives to the Risk Matrix

1.  Use hazards (not failures) and better information about potential causal scenarios
2.  Change basic definition of risk and how it is assessed (not covered in this talk)

Page 16

Use Hazards Rather than Failures

•  Relationship between individual failures and losses is not obvious.
    –  Assessing hazards is a more direct path to ultimate goal
    –  Component reliability is not equivalent to system safety
    –  Using hazards is traditional in system safety

Page 17

Example: Why Should Use Hazards

•  Helicopter Deice Function
•  Final SAR included a failure of APU resulting from chafing.
    –  Important because APU used when loss of one generator occurs during blade deicing
    –  But also another scenario identified by using STPA that could occur when APU has not failed

UCA: The flight crew does not switch the APU (Auxiliary Power Unit) generator power ON when either GEN1 or GEN2 are not supplying power to the helicopter and the blade de-ice system is required to prevent icing.

    –  Several causal scenarios and factors, but they are not in official SAR
    –  Need to be factored into any risk assessment

Page 18

Change Being Recommended

•  Start from a prioritized list of stakeholder-identified accidents or system losses.
•  Identify high-level system hazards leading to these losses
•  Assess severity and likelihood of hazards
•  Only consider failures that can lead to hazards (identified by STPA) along with the non-failure scenarios (again, STPA can identify them)
•  Consistent with MIL-STD-882 and most other safety standards
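One way to picture the recommended chain (prioritized losses, then hazards, then failure and non-failure scenarios) is as a small traceability structure. This is only a sketch: the class names, the priority scale, and the loss names are hypothetical, and the two scenarios are paraphrased from the flight control computer example elsewhere in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Loss:
    name: str
    priority: int  # 1 = highest stakeholder priority (hypothetical scale)


@dataclass
class Scenario:
    description: str
    involves_failure: bool


@dataclass
class Hazard:
    name: str
    losses: list  # Loss objects this hazard can lead to
    scenarios: list = field(default_factory=list)

    def governing_loss(self):
        """Severity is traced to the highest-priority loss the hazard can cause."""
        return min(self.losses, key=lambda loss: loss.priority)

    def non_failure_scenarios(self):
        """Scenarios that a failure-based assessment would never list."""
        return [s for s in self.scenarios if not s.involves_failure]


# Illustrative data; the loss names and priorities are invented for the example.
loss_of_life = Loss("Loss of life or serious injury", priority=1)
loss_of_aircraft = Loss("Loss of or damage to the aircraft", priority=2)

loss_of_control = Hazard(
    name="Loss of control of the aircraft",
    losses=[loss_of_aircraft, loss_of_life],
    scenarios=[
        Scenario("FCC hardware has failed or is operating in a degraded state", True),
        Scenario("Control logic requirement missing from the software specification", False),
    ],
)

print(loss_of_control.governing_loss().name)
print(len(loss_of_control.non_failure_scenarios()))
```

Keeping the link from each hazard back to its losses is what makes the severity assignment a lookup rather than a judgment call.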

Page 19

Likelihood as Strength of Potential Controls

•  Severity now easy because can be traced directly to list of accidents or mishaps
•  Heuristic biases lead to poor estimates of likelihood
•  Following a rigorous STPA will result in
    –  Reducing shortcuts and biases
    –  More full consideration of potential causal scenarios
•  Can be done early in development to identify where to place development effort
•  Maybe focus on component behavior because have historical failure information

Page 20

Example 1: Pilot’s use of flight controls

•  UCA: The Flight Crew does not deflect pedals sufficiently to counter torque from the main rotor, resulting in the Flight Crew losing control of the aircraft and coming into contact with an obstacle in the environment or the terrain.

One of causal scenarios:
•  Scenario 1: The Flight Crew is unaware that the pedals have not been deflected sufficiently to counter the torque from the main rotor.
•  The Flight Crew could have this flawed process model because:
    –  a) The flight instruments are malfunctioning and providing incorrect or insufficient feedback to the crew about the aircraft state during degraded visual conditions.
    –  b) The flight instruments are operating as intended, but providing insufficient feedback to the crew to apply the proper pedal inputs to control heading of the aircraft to avoid obstacles during degraded visual conditions.
    –  c) The Flight Crew has an incorrect mental model of how the FCS will execute their control inputs to control the aircraft and how the engine will respond to the environmental conditions.
    –  d) The Flight Crew is confused about the current mode of the aircraft automation and is thus unaware of the actual control laws that are governing the aircraft at this time.
    –  e) There is incorrect or insufficient control feedback.

Page 21

Example 1: Pilot’s use of flight controls (Cont’d)

•  Each causal factor used to generate requirements and design features to reduce their likelihood of occurring
•  Likelihood can be based on strength of potential controls
    –  Interface design (evaluated by human factors expert)
    –  Redundancy and fault tolerant design
    –  Training
    –  System design (hardware, software, interactions)
    –  Design of feedback
•  Still need a way to link these to likelihood (will come back to that)

Page 22

Example 2: Software

•  What we do now (rigor of development) makes no sense technically

UCA: One or more of the FCCs (flight control computers) command collective input to the hydraulic servos too long, resulting in an undesirable rotor RPM condition and potentially leading to the hazard of violating minimum separation from terrain or the hazard of losing control of the aircraft.

•  At least 5 causal scenarios why the FCCs might do this

Page 23

Example (2): Software

Scenario 1: The FCCs are unaware that the desired state has been achieved and continue to supply collective input.
    a) The FCCs are not receiving accurate position feedback from the main rotor servos.
    b) The FCCs are not receiving input from the ICUs to stop supplying swashplate input.

Scenario 2: The FCCs do not send the appropriate response to the aircraft for particular control inputs. This could happen if:
    a) The control logic does not follow intuitive guidelines that have been implemented in earlier aircraft, perhaps because requirements to do so were not included in the software requirements specification.
    b) The hardware on which the FCCs are implemented has failed or is operating in a degraded state.

Scenario 3: The FCCs do not provide feedback to the pilots to stop commanding collective increase when needed because the FADEC (engine controller) is supplying incorrect cues to the FCCs regarding engine conditions.

Scenario 4: The FCCs do not provide feedback to the pilots to stop commanding collective increase when needed because the FCCs are receiving inaccurate NR (rotor rpm) sensor information from the main rotor.

Scenario 5: The FCCs provide incorrect tactile cueing to the ICUs (inceptor control units) to properly place the collective to prevent low rotor RPM conditions.

Page 24

Example 2: Software (cont’d)

•  Scenarios used to identify appropriate FCC requirements and design constraints.
•  For example, for Scenario 1:
    –  1. The FCCs must perform median testing to determine if feedback received from the main rotor servos is inaccurate.
    –  2. The PRSVOFAULT caution must be presented to the Flight Crew if the FCCs lose communication with a main rotor servo.
    –  3. The EICAS must alert the Flight Crew if the FCCs do not get input from the ICU every x seconds.
•  Translate these into “likelihood” (final piece of puzzle)
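The slides do not define "median testing". A common reading, assumed here, is that redundant position feedback channels are compared against their median and any channel that deviates by more than a tolerance is flagged as suspect. The channel names, units, and tolerance below are hypothetical.

```python
from statistics import median


def suspect_channels(readings, tolerance):
    """Return the channels whose reading differs from the median by more than tolerance.

    readings: mapping of channel name -> reported servo position (assumed units)
    tolerance: largest deviation from the median still treated as agreement
    """
    mid = median(readings.values())
    return sorted(name for name, value in readings.items()
                  if abs(value - mid) > tolerance)


# Three redundant feedback channels, one of which disagrees with the others.
feedback = {"servo_a": 10.1, "servo_b": 10.0, "servo_c": 14.2}
print(suspect_channels(feedback, tolerance=0.5))
```

A check like this only covers the case where a minority of channels is wrong; it does nothing for a requirements flaw shared by every channel, which is the talk's point about redundancy.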

Page 25

Translating Strength of Controls into Likelihood

Qualitative Ranking such as
1.  The causal factor can be eliminated through design and high assurance.
2.  The occurrence of the causal factor can be reduced or controlled through system design
3.  The causal factor can be detected and mitigated if it does occur through system design or through operational procedures
4.  The only potential controls involve training and procedures.

Maybe too simplistic?
    –  Could include how thoroughly the causal factor has been handled within each category
    –  Combinations of possible controls?
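The four-level ranking can be sketched as an ordered enumeration. The level descriptions follow the slide; the two aggregation rules (a causal factor takes its strongest control, a hazard takes its weakest-controlled factor) are assumptions chosen for illustration, since the slide leaves the handling of combinations as an open question.

```python
from enum import IntEnum


class ControlStrength(IntEnum):
    ELIMINATED = 1          # eliminated through design and high assurance
    REDUCED = 2             # occurrence reduced or controlled through system design
    DETECTED_MITIGATED = 3  # detected and mitigated by design or operational procedures
    TRAINING_ONLY = 4       # only training and procedures are available


def factor_rank(controls):
    """Rank a causal factor by its strongest available control (assumed rule)."""
    if not controls:
        raise ValueError("causal factor has no controls, so it cannot be ranked")
    return min(controls)


def hazard_rank(factors):
    """Rank a hazard by its weakest-controlled causal factor (assumed rule)."""
    return max(factor_rank(controls) for controls in factors.values())


# Hypothetical assignment of controls to two causal factors from Scenario 1.
scenario_1 = {
    "inaccurate servo position feedback": [
        ControlStrength.DETECTED_MITIGATED,
        ControlStrength.TRAINING_ONLY,
    ],
    "no input from the ICUs": [ControlStrength.DETECTED_MITIGATED],
}
print(hazard_rank(scenario_1).name)
```

A lower number means a stronger control and therefore a lower assessed likelihood; refining the scheme would mean replacing `min` and `max` with rules that account for how thoroughly each factor is handled.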

Page 26

Translating Strength of Controls into Likelihood (2)

•  May be able to come up with more sophisticated procedures for specific types of systems.
•  Examples in paper on this topic at: http://sunnyday.mit.edu/Risk-Matrix.pdf
    Architectural trade study for space exploration
    Air Traffic Control enhancements

Page 27

Additional Considerations

•  Risk also affected by factors during manufacturing and operations:
    –  Manufacturing controls
    –  Designed maintainability and maintenance errors
    –  Training programs
    –  Changes over time in usage environment
    –  Consistency and rigor of management and oversight
    –  Assumptions during development about operational environment: how well communicated to users and how rigorously are enforced during operations
    –  etc.

Page 28

Additional Considerations (2)

•  Including these factors will improve risk assessment
•  Should also track factors and improve risk assessment over time
    –  Risk assessment process need not stop at deployment
    –  Risk-based decisions needed throughout life cycle
    –  Castilho: Active STPA
•  Identify leading indicators of increasing risk during operations

Page 29

Conclusions

•  Can provide improved risk matrix processes
•  Start from hazards, not failures, to get more realistic assessments of risk
•  STPA and better causal analysis can greatly improve likelihood estimates
•  Suggestions were provided and other people should be able to create even better processes
•  But limited by the use of the Risk Matrix and current definition of risk
    –  Alternative is to improve definition of risk and its evaluation
    –  Suggestions for this goal will follow (soon)