Modeling Input-Dependent Error Propagation in Programs Guanpeng (Justin) Li and Karthik Pattabiraman University of British Columbia

Post on 21-Mar-2020


Page 1: Modeling Input-Dependent Error Propagation in Programs · blogs.ubc.ca/karthik/files/2018/06/DSN18_vtrident_talk.pdf

Modeling Input-Dependent Error Propagation in Programs

Guanpeng (Justin) Li and Karthik Pattabiraman, University of British Columbia

Page 2

Soft Errors

[Figure: bit-flip illustration, 0001 becomes 0101]

• Error propagation in programs
  • Silent Data Corruption (SDC)
    • Incorrect program output
  • Crash
  • Benign
• Traditional solutions are too expensive
  • Hardware duplication
  • Circuit hardening

Researchers have expected modern software applications to tolerate hardware errors

Page 3

Bounding SDC Rate

[Diagram: Pool of Representative Inputs → for each input: Fault Injections → Evaluation of Program SDC Rate → Bound of Program SDC Rate]

Page 4

Fault Injection for Measurement of SDC Rate

[Diagram: One Program Input → Program Execution (artificially introduce a fault) → Observe Failure: SDC, Crash, or Benign]

Repeat for thousands of samples for the same input
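The measurement loop on this slide can be sketched in a few lines of Python. This is a toy illustration only, not LLFI or the authors' tooling: the function names `program` and `inject_campaign`, the lookup table, and the input values are all hypothetical. It flips one bit per run, compares the output with a fault-free (golden) run, and classifies the outcome.

```python
import random

TABLE = [3, 1, 4, 1, 5, 9, 2, 6]  # hypothetical lookup table used by the toy program


def program(values, fault=None):
    """Toy program under test: sums TABLE[v] for each v in values.

    fault is (step, bit): flip that bit of the value read at that step.
    fault=None gives the fault-free (golden) run.
    """
    acc = 0
    for step, v in enumerate(values):
        if fault is not None and fault[0] == step:
            v ^= 1 << fault[1]  # single bit flip in a 32-bit value
        acc += TABLE[v]  # an out-of-range index raises IndexError (our "crash")
    return acc


def inject_campaign(values, samples, seed=0):
    """Inject one random single-bit fault per run and classify each outcome."""
    rng = random.Random(seed)
    golden = program(values)
    counts = {"sdc": 0, "crash": 0, "benign": 0}
    for _ in range(samples):
        fault = (rng.randrange(len(values)), rng.randrange(32))
        try:
            out = program(values, fault)
        except IndexError:
            counts["crash"] += 1
            continue
        counts["sdc" if out != golden else "benign"] += 1
    return {kind: n / samples for kind, n in counts.items()}


if __name__ == "__main__":
    print(inject_campaign([1, 2, 3, 4], samples=3000))
```

Even in this toy, every sample costs one full program run, which is why the approach becomes expensive for long-running programs and many inputs.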

Page 5

Problems

• Fault Injection
  • Even one execution may take hours in a large program
  • Need to repeat for thousands of samples for one input
• SDC is highly input-dependent (shown later)
  • SDC rate changes if the program input is changed
  • Must repeat the whole fault injection campaign for each input
• Even worse…
  • Need to redo the whole evaluation every time the code is changed

Already time-consuming for only one program input

Bounding program SDC rate takes too much time

Impractical to integrate into the development cycle

Page 6

Our Goals & Contributions

Fast prediction of the bound on program SDC rate

1. Understand how different program inputs affect error propagation
2. Develop a fast model to bound the SDC rate of the program given multiple program inputs

[Figure: Accuracy vs. Performance for Fault Injection, Czek et al. / Folkesson et al., Trident, and vTrident]

Page 7

Challenges

• Fault injection approach is black-box
  • Don't know what happens during the execution of billions of instructions

Page 8

Approach

• Understand what to model
• Our analytical model: Trident
  • Closed formula of error propagation
• Key Insight:
  • Only some parts of the entire modeling are critical to bound program SDC rate
  • Remove the parts that are not sensitive to the change of input from Trident

vTrident: Volatility Prediction for Trident

Trident Insights

Trident: Three-level modeling
• Register-communication module
• Control-flow module
• Memory dependency module

SDC propagation = fs • fc • fm = x1 • x2 • x3 …

Given a program & different inputs

Page 9

What to model?

• Program SDC rate is an aggregation of both…
  • Instruction Execution
  • Instruction SDC rate

Program SDC Rate = f ( Inst Exec, Inst SDC Rate )

Example: ADD R0, R1, 4

                                      Input A      Input B
Variation of Instruction Execution    400 times    100 times    (easy to profile)
Variation of Instruction SDC Rate     40%          70%          (hard to model)

Program SDC rate changes!
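The talk does not spell out the function f. A minimal sketch, assuming f is an execution-count-weighted average of per-instruction SDC rates (an assumption for illustration, not the Trident formula), shows why both factors matter. The ADD numbers are from the slide; the second instruction and all names are hypothetical.

```python
def program_sdc_rate(profile):
    """Aggregate per-instruction data into one program-level SDC rate.

    profile: list of (exec_count, inst_sdc_rate) pairs, one per static instruction.
    Assumption: faults are spread uniformly over executed instructions, so the
    program rate is the execution-weighted mean of the instruction rates.
    """
    total_execs = sum(count for count, _ in profile)
    if total_execs == 0:
        raise ValueError("profile has no executed instructions")
    return sum(count * rate for count, rate in profile) / total_execs


# First entry: ADD R0, R1, 4 with the slide's numbers.
# Second entry: a made-up instruction that behaves the same under both inputs.
input_a = [(400, 0.40), (600, 0.10)]
input_b = [(100, 0.70), (600, 0.10)]

# Execution-only view of input B: counts change, SDC rates stay at input A's values.
input_b_exec_only = [(100, 0.40), (600, 0.10)]

if __name__ == "__main__":
    print("input A:", program_sdc_rate(input_a))
    print("input B:", program_sdc_rate(input_b))
    print("input B, execution counts only:", program_sdc_rate(input_b_exec_only))
```

Comparing the last two results shows the gap that opens when only the execution counts are updated and the per-instruction SDC rates are held fixed.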

Page 10

Prior Work

• Only models the variation of instruction execution…
  • Poor accuracy to quantify the variation of program SDC rate

[Figure: Variation of Program SDC Rate, Inst Exec model; error bar: 0.03%-0.55% at 95% confidence]

Need to model the variations of both instruction SDC rate and instruction execution!

Program SDC Rate = f ( Inst Exec, Inst SDC Rate )

Page 11

Variation of Instruction SDC Rate

One example: CMPGT R1, R0 (32-bit data width)

[Figure: propagation probability by register value. Input A: R0=512, R1=16; the 32 bits split 9 / 23; propagation prob. ~71%. Input B: R0=64, R1=16; the 32 bits split 5 / 27; propagation prob. ~84%.]

Program SDC Rate = f ( Inst Exec, Inst SDC Rate )
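To see why the same comparison behaves differently under different operand values, here is a small sketch under stated assumptions: registers are unsigned 32-bit values, the single bit flip lands in R1, and a fault "propagates" when it changes the outcome of R1 > R0. These assumptions and the function name are mine, not taken from the talk, so the sketch need not reproduce the slide's exact percentages.

```python
WIDTH = 32  # data width in bits, as on the slide


def bits_that_flip_cmpgt(r1, r0, width=WIDTH):
    """Count bit positions of r1 whose flip changes the result of (r1 > r0).

    Assumes unsigned values and a single bit flip in r1 only.
    """
    golden = r1 > r0
    changed = 0
    for bit in range(width):
        faulty = r1 ^ (1 << bit)
        if (faulty > r0) != golden:
            changed += 1
    return changed


if __name__ == "__main__":
    for label, r0, r1 in (("Input A", 512, 16), ("Input B", 64, 16)):
        n = bits_that_flip_cmpgt(r1, r0)
        print(f"{label}: {n} of {WIDTH} bits ({n / WIDTH:.1%})")
```

Under these assumptions Input A gives 23 of 32 bits (about 72%), in line with the slide's ~71%. Input B gives 26 of 32 bits (about 81%), slightly below the slide's ~84%, so the talk's model evidently differs from this sketch in some detail. The qualitative point is the same: a smaller R0 leaves more bit positions whose flip changes the branch outcome.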

Page 12

vTrident: Steps

• Bounding program SDC rates for given inputs

[Diagram: Pool of Program Inputs + Program under Test → vTrident (run each input) → Get Rankings and Variation of Program SDC Rate (Max - Min) → Measure SDC Rate of the reference (Ref) input → Derive Bounding]
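One way to read the steps above, sketched in Python. The transcript does not say exactly how the bound is anchored to the reference input, so the arithmetic here is an assumption: the predicted spread below and above the reference input is applied to the one SDC rate actually measured by fault injection. The function name, input names, and numbers are hypothetical.

```python
def derive_bounds(predicted, ref_input, measured_ref):
    """Derive a ranking, a variation, and an SDC-rate bound from model predictions.

    predicted: dict mapping input name -> model-predicted SDC rate
    ref_input: name of the one input whose SDC rate is measured by fault injection
    measured_ref: that measured SDC rate
    """
    ranking = sorted(predicted, key=predicted.get)  # lowest predicted rate first
    lowest = min(predicted.values())
    highest = max(predicted.values())
    variation = highest - lowest  # Max - Min

    # Assumed anchoring: shift the predicted spread onto the measured reference.
    lower = max(0.0, measured_ref - (predicted[ref_input] - lowest))
    upper = min(1.0, measured_ref + (highest - predicted[ref_input]))
    return ranking, variation, (lower, upper)


if __name__ == "__main__":
    predicted = {"in0": 0.10, "in1": 0.14, "in2": 0.08}  # made-up predictions
    ranking, variation, bounds = derive_bounds(predicted, "in0", measured_ref=0.12)
    print(ranking, variation, bounds)
```

Only one fault injection campaign (for the reference input) is needed in this reading; every other input is placed relative to it by the model.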

Page 13

Experimental Setup

• Comparison of the Bounding of SDC Rate
  • Accuracy and performance
  • Fault injection results derived by LLFI [1] as baseline
• Fault Model
  • Single bit-flip
  • One fault injected per program execution
• Benchmark
  • 9 open-source benchmarks from various domains taking numerical inputs
  • 10 inputs generated for each benchmark

[1] LLVM Fault Injector [DSN'14]

[Table: Benchmark, Application Domains]
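The talk reports error bars at 95% confidence for its fault injection results. How those were computed is not stated; a common choice for a sampled proportion is the normal-approximation margin of error, sketched below. The function name and the sample numbers are illustrative assumptions.

```python
import math


def margin_of_error(sdc_count, samples, z=1.96):
    """Half-width of the normal-approximation confidence interval for an SDC rate.

    z=1.96 corresponds to 95% confidence.
    """
    if samples <= 0:
        raise ValueError("samples must be positive")
    p = sdc_count / samples
    return z * math.sqrt(p * (1.0 - p) / samples)


if __name__ == "__main__":
    # Hypothetical campaign: 300 SDCs observed in 3,000 injected faults.
    print(f"SDC rate 10.0% +/- {margin_of_error(300, 3000):.2%}")
```

The margin shrinks with the square root of the sample count, so halving the error bar takes roughly four times as many injected faults.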

Page 14

Evaluation: Variation of Program SDC Rate

Variation of Program SDC Rate = Max - Min

• Methodology
  • Derived by vTrident, Inst Exec, fault injection
  • The closer to the fault injection result, the better the prediction

[Figure: Variation of Program SDC Rate for vTrident, Inst Exec, and fault injection; error bar: 0.03%-0.55% at 95% confidence]

vTrident is much better in predicting the variation of program SDC rate

Program SDC Rate = f ( Inst Exec, Inst SDC Rate )

Page 15

Evaluation: Bounding SDC Rate

• Methodology:
  • Ranking of SDC rates by Fault Injection and vTrident: average distance of 2.11
  • Bound as many of the measured SDC rates as possible with the predicted variation of program SDC rate

[Figure: Measured SDC Rate by Fault Injection; Bounding Derived by Inst-Exec; Bounding Derived by vTrident. Y-axis: SDC Rate; error bar: 0.03%-0.55% at 95% confidence]

Program SDC Rate = f ( Inst Exec, Inst SDC Rate )

vTrident bounds 79% of SDCs whereas the other model bounds merely 32% of SDCs
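The two metrics on this slide can be illustrated as follows. The transcript does not define "average distance" precisely; the sketch assumes it is the mean absolute difference between each input's position in the measured ranking and in the predicted ranking. The coverage metric counts how many measured SDC rates fall inside a derived bound. All names and numbers are hypothetical.

```python
def average_rank_distance(measured, predicted):
    """Mean absolute difference in rank position between two rankings of the same inputs."""
    rank_m = {name: i for i, name in enumerate(sorted(measured, key=measured.get))}
    rank_p = {name: i for i, name in enumerate(sorted(predicted, key=predicted.get))}
    return sum(abs(rank_m[name] - rank_p[name]) for name in measured) / len(measured)


def bounded_fraction(measured, lower, upper):
    """Fraction of measured SDC rates that fall within [lower, upper]."""
    inside = sum(1 for rate in measured.values() if lower <= rate <= upper)
    return inside / len(measured)


if __name__ == "__main__":
    measured = {"a": 0.10, "b": 0.20, "c": 0.15, "d": 0.30}   # made-up fault injection results
    predicted = {"a": 0.12, "b": 0.14, "c": 0.18, "d": 0.25}  # made-up model predictions
    print(average_rank_distance(measured, predicted))
    print(bounded_fraction(measured, lower=0.12, upper=0.25))
```

A lower rank distance means the model orders the inputs more like fault injection does; a higher bounded fraction means the derived bound covers more of the measured rates.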

Page 16

Evaluation: Performance

• Wall-Clock Time
  • Sample 3,000 faults with each input, 10 inputs in total for each benchmark
  • vTrident takes 2.6 hours, 8x faster than Trident, 37x faster than fault injection
• Memory Required (Peak)
  • vTrident: 14.97 GB
  • Trident: 4 out of 9 benchmarks require more than 32 GB of memory

vTrident is significantly faster than prior techniques and requires far fewer hardware resources

Page 17

vTrident in Practice

• Built as compiler module
• Fully automated
• Fast bounding of program SDC rate
• Integration into software development process

Now can be replaced by vTrident

Page 18

Summary

• Error propagation is highly input-dependent
  • Fault injections are too slow to bound program SDC rate given multiple inputs
  • Understanding input-dependent error propagation
• vTrident is a fast and accurate model to bound program SDC rate
• Open Source: Code available in the same repo as Trident
  • https://github.com/DependableSystemsLab/Trident

Guanpeng (Justin) Li, University of British Columbia

Page 19

Backup Slides

Page 20

Software Approach

[Figure: layers Device/Circuit Level, Architectural Level, Operating System Level, Application Level; labels: Soft Error, Impactful Errors, Protection Overhead, Increasing]

Page 21

vTrident: Methodology

• Modifying Trident
  • Simplify memory dependency modeling that is not sensitive to the variation
• Given inputs, vTrident…
  • Predicts the relative ranking of SDC rates
  • Determines the variation of program SDC rates

Page 22

Evaluation: Performance

• Wall-Clock Time
  • Sample 3,000 faults with each input, 10 inputs in total for each benchmark
  • vTrident takes 2.6 hours, 8x faster than Trident, 37x faster than fault injection
• Memory Required
  • Average Trace Size
    • vTrident: 0.13 MB
    • Trident: 28.13 GB
  • Peak Memory Consumption
    • vTrident: 14.97 GB
    • Trident: 4 out of 9 benchmarks require more than 32 GB of memory

vTrident is significantly faster than prior techniques and requires far fewer hardware resources