Assessing the Impact of Imperfect Diagnosis on Service Reliability: A Parsimonious Model Approach

Jesper Grønbæk (presenter), Hans-Peter Schwefel, Jens Kristian Kjærgård, Thomas S. Toftegaard

Networking and Security Group, Aalborg University, Denmark
Tieto IP Solutions, Denmark
Aarhus School of Engineering, University of Aarhus, Denmark
Forschungszentrum Telekommunikation Wien, Austria

European Dependable Computing Conference 2010 – Valencia, Spain, April 28, 2010



Page 2: Imperfect Diagnosis – Background and Motivation

• Network fault diagnosis
  • Dependable end-user service provisioning in Next Generation Network architectures
    • Dominated by wireless networks, mobility and varying traffic conditions
    • Challenged by unreliable observations and hidden network states
  • Imperfect diagnosis
• Modelling imperfect diagnosis
  • Goals of modelling
    • A. Determine best remediation actions
    • B. Determine best trade-off of imperfections
  • Assess properties of a given diagnosis component (function level modelling [1], system level simulation [2])
  • Light-weight models desirable for frequent model re-evaluations

Page 3: Imperfect Diagnosis – Example: Decentralized Fault Management Framework

• ODDR decentralized fault management framework [3][4] (Observation, Diagnosis, Decision and Remediation)
  • End-node driven fault management
  • Joint view on imperfect diagnosis and decisions (remediation, observation collection)
  • Operation in dynamic environment → frequent model re-evaluations
• Subsequent focus on trade-off of imperfections (best diagnosis settings)

Page 4: Background on Diagnosis Approaches – Definitions of Diagnosis Outcomes

• Diagnosis atomic view
  • Single observation
  • Two network states (Normal/Fault)
  • Discrete diagnosis steps (period T)
• Generic diagnosis (state estimation) definitions
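With two network states and a binary diagnosis estimate, each diagnosis step yields one of four outcomes (TN, FP, FN, TP), which is what the TNR/TPR parameters on later slides refer to. A minimal Python sketch of that mapping follows; the function name `classify_outcome` and the string labels are assumptions chosen for illustration, not notation from the slides.

```python
def classify_outcome(network_state: str, diagnosis: str) -> str:
    """Map (true network state, diagnosed state) to a diagnosis outcome.

    Both arguments are 'Normal' or 'Fault' (illustrative labels).
    """
    valid = {"Normal", "Fault"}
    if network_state not in valid or diagnosis not in valid:
        raise ValueError("states must be 'Normal' or 'Fault'")
    if network_state == "Normal":
        # Network is fine: a 'Fault' estimate is a false alarm (FP).
        return "TN" if diagnosis == "Normal" else "FP"
    # Network is faulty: a 'Normal' estimate is a missed fault (FN).
    return "TP" if diagnosis == "Fault" else "FN"
```

One outcome is produced per diagnosis period T, so a diagnosis run can be read as a sequence over these four labels.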

Page 5: Background on Diagnosis Approaches – Diagnosis Classes

• Two levels of complexity of diagnosis behaviour
  • One-shot¹: diagnosis estimate based on a single set of observations in time
    • No correlation of diagnosis estimates from diagnosis → simple model representation proposed in [3]
  • Over-time¹: diagnosis estimate based on new and old observations
    • Means to improve diagnosis estimates
    • Strong correlation added by diagnosis component
• Comparison
  • One-shot: threshold on round-trip time (RTT)
  • Over-time: α-count heuristic (Bondavalli et al. [1]) on one-shot estimates
  • Transient effects from network neglected
• Over-time has a highly transient phase, yet gives significant improvement
  • Identify best trade-off: reaction time & false alarms
  • Simple parameterization from steady-state behaviour is difficult

[Figure: comparison of one-shot and over-time diagnosis; annotation: 2000 repetitions]

¹ Terminology adapted from [5]
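The two diagnosis classes compared here can be sketched in a few lines of Python. The sketch assumes the usual α-count update from [1]: the counter is incremented by 1 on a 'Fault' estimate, multiplied by a decay factor k on a 'Normal' estimate, and an alarm is raised once it reaches a threshold αT. The function names, the RTT values and the 200 ms limit are hypothetical and not taken from the slides.

```python
def rtt_threshold(rtts, limit):
    """One-shot diagnosis: 1 ('Fault') if the RTT sample exceeds the limit."""
    return [1 if rtt > limit else 0 for rtt in rtts]


def alpha_count(estimates, k, alpha_t):
    """Over-time diagnosis on one-shot estimates (alpha-count sketch).

    Returns the 0-based index of the first step at which alpha >= alpha_t,
    or None if no alarm is raised.
    k: decay factor in [0, 1]; alpha_t: alarm threshold.
    """
    alpha = 0.0
    for step, estimate in enumerate(estimates):
        # Increment on a 'Fault' estimate, decay on a 'Normal' estimate.
        alpha = alpha + 1.0 if estimate else alpha * k
        if alpha >= alpha_t:
            return step
    return None


# Hypothetical RTT samples in milliseconds, one per diagnosis period.
samples = [40, 250, 45, 260, 270, 280]
estimates = rtt_threshold(samples, limit=200)
one_shot_alarm = alpha_count(estimates, k=0.5, alpha_t=1.0)
filtered_alarm = alpha_count(estimates, k=0.5, alpha_t=2.5)
```

With αT = 1 the filter alarms on the first 'Fault' estimate, so it behaves like the plain one-shot threshold. A larger αT suppresses the isolated spike at the cost of a later alarm, which is the reaction time versus false alarm trade-off named on the slide.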

Page 6: Parsimonious Diagnosis Model – Definition and Parameters

• Four-state Markov model presented in [3]
  • Controlled by geometric ON-OFF network state process (fault/repair occurrence) {pf, pr}
  • 2 free parameters {P(TN | Ns = Normal) = TNR = (1 − FPR), P(TP | Ns = Fault) = TPR = (1 − FNR)}
• Explore model capabilities
  • Remediation assumption: fail-over on network fault state diagnosis
  • 6 free parameters
  • Fixed {pf, pr} → 4 free parameters

[Figure: System Equations]
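For the one-shot case with two free parameters, a four-state transition matrix can be written down directly. The sketch below is an illustration under stated assumptions, not the system equations of the slide: it assumes that in each step the network first moves according to the ON-OFF process {pf, pr} and that the diagnosis estimate then depends only on the new network state through TNR and TPR. The state ordering and the function name are hypothetical.

```python
STATES = ("TN", "FP", "FN", "TP")  # TN/FP: network Normal; FN/TP: network Fault


def one_shot_transition_matrix(pf, pr, tnr, tpr):
    """4x4 row-stochastic matrix over STATES for one-shot diagnosis.

    pf: per-step fault occurrence probability, pr: per-step repair probability,
    tnr = P(TN | Normal), tpr = P(TP | Fault).
    """
    for p in (pf, pr, tnr, tpr):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # Network was Normal: stays Normal with 1 - pf, becomes Fault with pf.
    from_normal = [(1 - pf) * tnr, (1 - pf) * (1 - tnr), pf * (1 - tpr), pf * tpr]
    # Network was Fault: repaired with pr, stays Fault with 1 - pr.
    from_fault = [pr * tnr, pr * (1 - tnr), (1 - pr) * (1 - tpr), (1 - pr) * tpr]
    return [list(from_normal), list(from_normal), list(from_fault), list(from_fault)]


# Illustrative values only; they are not settings from the slides.
P = one_shot_transition_matrix(pf=0.03, pr=0.025, tnr=0.9, tpr=0.95)
```

In this special case rows that share a network state are identical, which reflects the absence of correlation between successive one-shot estimates. Letting those rows differ is what introduces additional free parameters.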

Page 7: Parsimonious Diagnosis Model – Diagnosis Metrics Definitions

• Diagnosis metrics
  • Proposed metrics (steady state)
    • Probability of Remediation on False Alarm (pRFA)
    • Mean Remediation Reaction Time (µRRT)
  • Note: two metrics, four free parameters
• Diagnosis trace
  • Start diagnosis in normal network state for a given set {pf, pr}
  • Observe until alarm is diagnosed
  • Perform M repetitions and derive O = #FA
    • pRFA = O/M
    • µRRT: mean time to remediation over all M repetitions
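The trace procedure above can be sketched as a small Monte Carlo experiment. The sketch assumes a one-shot diagnoser that alarms on the first 'Fault' estimate and an ON-OFF network that moves before each estimate; the function name, the `max_steps` guard and the seeding are illustrative assumptions rather than part of the original study.

```python
import random


def estimate_metrics(pf, pr, tnr, tpr, period, runs, seed=0, max_steps=100_000):
    """Estimate (pRFA, muRRT) from `runs` traces that start in the normal state."""
    rng = random.Random(seed)
    false_alarms = 0
    total_time = 0.0
    for _ in range(runs):
        fault = False  # every repetition starts in the normal network state
        for step in range(1, max_steps + 1):
            # ON-OFF network transition for this diagnosis period.
            if rng.random() < (pr if fault else pf):
                fault = not fault
            # One-shot estimate for the current network state.
            if fault:
                alarm = rng.random() < tpr
            else:
                alarm = rng.random() >= tnr
            if alarm:
                if not fault:
                    false_alarms += 1  # remediation triggered by a false alarm
                total_time += step * period
                break
        else:
            raise RuntimeError("no alarm within max_steps; check parameters")
    return false_alarms / runs, total_time / runs
```

Here `false_alarms` plays the role of O and `runs` the role of M, so the first return value is O/M and the second is the mean time to the first alarm in seconds.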

Page 8: Parsimonious Diagnosis Model – Diagnosis Metrics Equations

• Closed-form equations derived by linear algebraic approaches [6]
  • Probability of Remediation on False Alarm (pRFA) → probability of absorption
  • Mean Remediation Reaction Time (µRRT) → mean time to absorption
• Solving yields two linear equations

[Figure: Markov chain with labels "Initial state" and "Absorbing states"]
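The absorption view can be illustrated with standard absorbing Markov chain algebra. The sketch assumes that the alarm states FP and TP are absorbing, that TN and FN are transient, and that the chain starts in TN; it then uses the fundamental matrix N = (I − Q)⁻¹ of the transient part. The state ordering (TN, FP, FN, TP) and the function name are assumptions, and the code is not the closed-form result of the slide.

```python
def absorption_metrics(P, period):
    """Return (pRFA, muRRT) for a 4x4 matrix P over (TN, FP, FN, TP).

    FP and TP are treated as absorbing, so their rows in P are ignored.
    """
    TN, FP, FN, TP = range(4)
    # Entries of I - Q, where Q holds transitions among transient states (TN, FN).
    a, b = 1.0 - P[TN][TN], -P[TN][FN]
    c, d = -P[FN][TN], 1.0 - P[FN][FN]
    det = a * d - b * c
    if abs(det) < 1e-15:
        raise ValueError("I - Q is singular: an alarm is never reached")
    # First row of N = (I - Q)^-1: expected visits to TN and FN when starting in TN.
    visits_tn, visits_fn = d / det, -b / det
    # Probability of being absorbed in FP (remediation on a false alarm).
    p_rfa = visits_tn * P[TN][FP] + visits_fn * P[FN][FP]
    # Expected number of steps until absorption, scaled by the diagnosis period.
    mu_rrt = (visits_tn + visits_fn) * period
    return p_rfa, mu_rrt


# Toy chain: from TN, stay or raise a false alarm with equal probability.
toy = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]
toy_p_rfa, toy_mu_rrt = absorption_metrics(toy, period=0.4)
```

For the toy chain the fault state is never entered, so every alarm is false and the expected number of steps to the alarm is 2.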

Page 9: Parameterization by Diagnosis Metrics

• Underdetermined problem solved by heuristics
  • (MI) Minimize pFPTN and pTPFN
  • Minimize direct transitions TN→FP and FN→TP
• Behaviour in transient analysis
  • Initial study parameters: T = 0.4 s, mean normal period = 12.42 s, mean fault period = 15 s
  • Captures an initial higher probability of pRTA over all alarms (pRTA + pRFA)

[Figure: transient behaviour of pRFA, pRTA and the ratio pRTA / (pRFA + pRTA) for the two "minimize" heuristics]
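The study parameters can be turned into per-step probabilities of the ON-OFF process. The sketch below assumes geometrically distributed sojourn times with a mean of 1/p diagnosis steps, so that a mean period of D seconds corresponds to p = T/D; this conversion and the function name are assumptions, since the slide does not state how {pf, pr} were derived.

```python
def step_probability(period, mean_sojourn):
    """Per-step leave probability for a geometric sojourn with the given mean.

    period: diagnosis step length T in seconds.
    mean_sojourn: mean time spent in the state, in seconds.
    """
    if period <= 0 or mean_sojourn < period:
        raise ValueError("need 0 < period <= mean_sojourn")
    return period / mean_sojourn


T = 0.4                           # diagnosis period in seconds
pf = step_probability(T, 12.42)   # fault occurrence, roughly 0.0322 per step
pr = step_probability(T, 15.0)    # repair, roughly 0.0267 per step
```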

Page 10: Case: Time Constrained Data Transfer – Background

• QoS requirement: complete SCTP based file transfer within t_deadline seconds with the probability Ω
• Fault: congestion in operator infrastructure (occurrence and repair, ON-OFF model)
• Remediation: single fail-over from network A to network B
• Diagnosis: simple threshold based on RTT and α-count
• Decision: fail-over on network fault state diagnosis

Page 11: Case: Time Constrained Data Transfer – Policy Evaluation Model

• Policy Evaluation Discrete Time Markov Model (PEDTMC) [3]
  • State space: $S_{PE}$ = {Active network, Time progress, File progress, Network state, Diagnosis state}
  • $\Omega_{\mathrm{model}} = \sum_{r=1}^{m} S_{PE}^{ss}(r, n)$

[Figure: File Transfer Completion Time CDF]

Page 12: Model Sensitivity Analysis

• Model based sensitivity analysis on Ω
  • Vary µRRT and pRFA; t_deadline = 30 s and file size = 10 MByte
  • Compare to perfect diagnosis and no-fail-over policy
• Both metrics have a clear impact on Ω: µRRT → promptness, pRFA → correctness
  • Most sensitive to high pRFA → a wrong fail-over cannot be remediated
  • Can deliver significantly worse performance than no fail-over

[Figure: Ω sensitivity plot with reference levels "Perfect Diagnosis" and "No fail-over"]

Page 13: Reliability Evaluation Results – Background & Trade-off Results

• Study properties of α-count diagnosis component
  • α-count controlled by two parameters: k (forgetting factor), αT (threshold)
  • PEDTMC model based analysis
  • Simulation based analysis
    • System level simulation based on ns-2
    • Provides evaluation of Ω and traces of diagnosis performance
• Consider two settings of one-shot diagnosis:
  • γ0 = (TPR, TNR) = (0.983, 0.097)
  • γ1 = (TPR, TNR) = (0.953, 0.225)
• Trade-off options of α-count (obtained from single trace set, 2000 runs)
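Deriving trade-off options from one trace set can be sketched as follows: generate traces once, then replay the same traces through an α-count filter for each candidate (k, αT). The sketch assumes the α-count update from [1] (add 1 on a 'Fault' estimate, multiply by k otherwise) and a synthetic ON-OFF trace generator standing in for the ns-2 traces. All function names and the TNR/TPR, pf and pr values are hypothetical; only k = 0.92 and αT = 2.5 are settings reported in this talk.

```python
import random


def make_trace(pf, pr, tnr, tpr, steps, rng):
    """One synthetic trace of (fault, estimate) pairs, starting in the normal state."""
    trace, fault = [], False
    for _ in range(steps):
        if rng.random() < (pr if fault else pf):
            fault = not fault
        estimate = rng.random() < tpr if fault else rng.random() >= tnr
        trace.append((fault, estimate))
    return trace


def alarm_step(trace, k, alpha_t):
    """Return (1-based step, network fault flag) of the first alarm, or None."""
    alpha = 0.0
    for step, (fault, estimate) in enumerate(trace, start=1):
        alpha = alpha + 1.0 if estimate else alpha * k
        if alpha >= alpha_t:
            return step, fault
    return None


def trade_off(traces, k, alpha_t, period):
    """Estimate (pRFA, muRRT) over the traces in which an alarm occurs."""
    hits = [h for h in (alarm_step(t, k, alpha_t) for t in traces) if h is not None]
    if not hits:
        raise ValueError("no alarm in any trace")
    p_rfa = sum(1 for _, fault in hits if not fault) / len(hits)
    mu_rrt = period * sum(step for step, _ in hits) / len(hits)
    return p_rfa, mu_rrt


rng = random.Random(1)
traces = [make_trace(0.03, 0.025, 0.9, 0.95, 500, rng) for _ in range(200)]
options = {a_t: trade_off(traces, k=0.92, alpha_t=a_t, period=0.4)
           for a_t in (1.0, 2.5, 4.0)}
```

Because every threshold is evaluated on identical traces, a higher αT can only delay the alarm within a given trace. Whether it also lowers pRFA depends on the trace statistics, which is why the settings have to be compared empirically.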

Page 14: Reliability Evaluation Results – Background & Trade-off Results

• PEDTMC model based analysis
  • Simple threshold
    • γ0 performs better than γ1 (as shown in [3])
  • α-count
    • Overall leads to improvement → filtering out false alarms
    • Optimal settings exist
    • γ1: k = 0.92, αT = 2.5 leads to best results → obtainable reduction of pRFA without similar increase in µRRT
• Simulation based analysis
  • Consistent conclusions to model
  • Qualitative differences
    • Stochastic time model
    • Simplified data-transfer model

[Figures: Ω_simulation and Ω_model versus threshold αT, with the simple threshold as reference]

Page 15: Conclusion & Outlook

• Conclusions
  • Proposed parsimonious imperfect diagnosis model for light-weight assessment of best diagnosis component settings; also considering complex class of over-time diagnosis components
  • Defined representative imperfect diagnosis performance metrics and derived their closed-form equations in the model
  • Presented service reliability case and performed model based sensitivity analysis of reliability on imperfect diagnosis performance metrics
  • Used model to assess diagnosis performance properties of over-time diagnosis heuristic from literature and define best setting
  • Shown by system level simulation analysis that diagnosis model can capture essential imperfect diagnosis performance characteristics
• Outlook
  • Introduce more complex decision policies
    • Application state information → minimize remediation
    • Multiple fault diagnosis
    • Decisions to collect more information
  • Need to study diagnosis model behaviour after positive diagnosis and potentially extend

Page 16: Questions & Discussion

Page 17: References

[1] Threshold-based mechanisms to discriminate transient from intermittent faults. A. Bondavalli, S. Chiaradonna, F. Di Giandomenico, and F. Grandoni, IEEE Transactions on Computers, vol. 49, no. 3, pp. 230–245, 2000.

[2] Probabilistic Fault-Diagnosis in Mobile Networks Using Cross-Layer Observations. A. Nickelsen, J. Grønbæk, T. Renier, and H.-P. Schwefel, in Proceedings of AINA 09, pp. 225–232, 2009.

[3] Model based evaluation of policies for end-node driven fault recovery. J. Grønbæk, H.-P. Schwefel, and T. Toftegaard, Proc. DRCN 09, 2009.

[4] Towards self-adaptive reliable network services in highly-uncertain environments. A. Ceccarelli, J. Grønbæk, L. Montecchi, A. Bondavalli, and H.-P. Schwefel, to appear in Proceedings of WORNUS 10, May 2010.

[5] Hidden Markov Models as a Support for Diagnosis: Formalization of the Problem and Synthesis of the Solution. A. Daidone, F. Di Giandomenico, S. Chiaradonna, and A. Bondavalli, in 25th IEEE Symposium on Reliable Distributed Systems, 2006. SRDS’06, 2006, pp. 245–256.

[6] Queueing Theory – A Linear Algebraic Approach. L. Lipsky, 2nd ed. Springer, 2009.
