automated debugging in data intensive scalable computing...
TRANSCRIPT
Automated Debugging in Data-Intensive Scalable Computing Systems
Muhammad Ali Gulzar1, Matteo Interlandi3, Xueyuan Han2, Mingda Li1, Tyson Condie1, and Miryung Kim1
1 University of California, Los Angeles  2 Harvard University  3 Microsoft
Big Data Debugging in the Dark
Develop locally. Hope it works. Run in cloud. Bug! Guesswork.
Figure: Map and Reduce stages.
Motivating Example
• Alice writes a Spark program that identifies, for each state in the US, the delta between the minimum and the maximum snowfall reading for each day of any year and for any particular year.
ZipCode  Date        SnowFall
99504    01/01/1994  245mm
99504    01/01/1993  85mm
90031    02/01/1991  0mm
…        …           …
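A minimal sketch of the kind of program described above, written with plain Scala collections instead of Spark so it runs standalone. The names `SnowDelta`, `zipToState`, `toMm`, `split`, and `deltas` are illustrative assumptions, not the authors' code. The unit conversion is also an assumption: it treats every non-mm reading as feet, which is consistent with the 70in reading appearing as 21336 in the example data.

```scala
object SnowDelta {
  // Hypothetical lookup; the example data only implies 99504 -> AK and 90031 -> CA.
  val zipToState: Map[String, String] = Map("99504" -> "AK", "90031" -> "CA")

  // Assumed conversion: "mm" is kept as-is, any other unit is treated as feet.
  // This mishandles inches (70in becomes 70 * 304.8 = 21336).
  def toMm(reading: String): Double = {
    val r = reading.trim
    val unit = r.takeRight(2)
    val value = r.dropRight(2).toDouble
    if (unit == "mm") value else value * 304.8
  }

  // FlatMap step: one input line yields a (state, MM/DD) pair and a (state, year) pair.
  def split(line: String): List[((String, String), Double)] = {
    val t = line.split(",").map(_.trim)
    val state = zipToState(t(0))
    val date = t(1)
    val snow = toMm(t(2))
    val cut = date.lastIndexOf("/")
    List(
      ((state, date.substring(0, cut)), snow),
      ((state, date.substring(cut + 1)), snow)
    )
  }

  // GroupByKey + Map steps: delta between max and min reading per key.
  def deltas(lines: Seq[String]): Map[(String, String), Double] =
    lines.flatMap(split).groupBy(_._1).map { case (key, pairs) =>
      val readings = pairs.map(_._2)
      key -> (readings.max - readings.min)
    }
}
```

Calling `SnowDelta.deltas` on the seven example records gives a delta of roughly 21251 for both `("AK", "01/01")` and `("AK", "1993")`.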
Problem Definition
TextFile: 99504,01/01/1992,1ft; 99504,03/01/1992,0.1ft; 99504,01/01/1993,70in; 99504,03/01/1993,145mm; 99504,01/01/1994,245mm; 99504,01/01/1993,85mm; 90031,02/01/1991,0mm
FlatMap: AK,01/01,304.8; AK,1992,304.8; AK,03/01,30.5; AK,1992,30.5; AK,01/01,21336; AK,1993,21336; AK,03/01,145; AK,1993,145; AK,01/01,245; AK,1994,245; …
GroupByKey: AK,01/01,[304.8,21336,245,85]; AK,03/01,[30.5,145]; AK,1992,[304.8,30.5]; AK,1993,[21336,145,85]; AK,1994,[245]; CA,02/01,[0]; CA,1991,[0]
Map / Output: AK,01/01,21251; AK,03/01,114.5; AK,1992,274.3; AK,1993,21251; AK,1994,0; CA,02/01,0; CA,1991,0
Given a test function, the goal is to identify a minimum subset of the input that is able to reproduce the same test failure.
def test(key: String, delta: Float): Boolean = {
  delta < 6000
}
• Using a test function, a user can specify incorrect results.
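As a small illustration of how such a test function flags incorrect results, the sketch below applies it to the output rows of the running example. The object name `FaultyOutputs` and the string key layout are assumptions made for this example only.

```scala
object FaultyOutputs {
  // Test function from the slide: an output passes when its delta is below 6000.
  def test(key: String, delta: Float): Boolean = delta < 6000

  // Output rows of the running example, keyed as "state,day-or-year".
  val output: Seq[(String, Float)] = Seq(
    "AK,01/01" -> 21251f, "AK,03/01" -> 114.5f, "AK,1992" -> 274.3f,
    "AK,1993" -> 21251f, "AK,1994" -> 0f, "CA,02/01" -> 0f, "CA,1991" -> 0f)

  // Keys whose delta violates the test; these are the faulty outputs to debug.
  val faulty: Seq[String] = output.collect { case (k, d) if !test(k, d) => k }
}
```

Here `FaultyOutputs.faulty` contains the two keys with a delta of 21251.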
Existing Approach 1: Data Provenance for Spark
It over-approximates the scope of failure-inducing inputs, i.e., records in the faulty key-group are all marked as faulty.
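The over-approximation can be seen in a toy backward trace over the running example. This is a simplified sketch, not a real provenance system: `Rec`, `input`, and `trace` are hypothetical names, record ids simply number the seven input lines from 1 to 7, and readings are the converted values shown in the example.

```scala
object BackwardTrace {
  // One input record: id, state, day (MM/DD), year, reading in mm as shown in the example.
  case class Rec(id: Int, state: String, day: String, year: String, mm: Double)

  val input: Seq[Rec] = Seq(
    Rec(1, "AK", "01/01", "1992", 304.8),
    Rec(2, "AK", "03/01", "1992", 30.5),
    Rec(3, "AK", "01/01", "1993", 21336.0),
    Rec(4, "AK", "03/01", "1993", 145.0),
    Rec(5, "AK", "01/01", "1994", 245.0),
    Rec(6, "AK", "01/01", "1993", 85.0),
    Rec(7, "CA", "02/01", "1991", 0.0))

  // Lineage of an output key: every record that fed its key-group,
  // regardless of whether the record actually caused the failure.
  def trace(state: String, key: String): Set[Int] =
    input.filter(r => r.state == state && (r.day == key || r.year == key))
      .map(_.id).toSet
}
```

For the faulty output AK,01/01 the trace returns records 1, 3, 5, and 6, even though only record 3 carries the suspicious reading.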
Existing Approach 2: Delta Debugging
• Delta Debugging performs a systematic binary-search-like procedure on the input dataset using a test oracle function.
It does not prune input records known to be irrelevant because of the lack of semantic understanding of data-flow operators.
TextFile: 99504,01/01/1992,1ft; 99504,03/01/1992,0.1ft; 99504,01/01/1993,70in; 99504,03/01/1993,145mm; 99504,01/01/1994,245mm; 99504,01/01/1993,85mm; 90031,02/01/1991,0mm
FlatMap: AK,01/01,304.8; AK,1992,304.8; AK,03/01,30.5; AK,1992,30.5; AK,01/01,21336; AK,1993,21336; AK,03/01,145; AK,1993,145; AK,01/01,245; AK,1994,245; …
GroupByKey: AK,01/01,[304.8,21336,245,85]; AK,03/01,[30.5,145]; AK,1992,[304.8,30.5]; AK,1993,[21336,145,85]; AK,1994,[245]; CA,02/01,[0]; CA,1991,[0]
Map / Output: AK,01/01,21251; AK,03/01,114.5; AK,1992,274.3; AK,1993,21251; AK,1994,0; CA,02/01,0; CA,1991,0
Run 2
TextFile: 99504,01/01/1992,1ft; 99504,03/01/1992,0.1ft; 99504,01/01/1993,70in
FlatMap: AK,01/01,304.8; AK,1992,304.8; AK,03/01,30.5; AK,1992,30.5; AK,01/01,21336; AK,1993,21336
GroupByKey: AK,01/01,[304.8,21336]; AK,03/01,[30.5]; AK,1992,[304.8,30.5]; AK,1993,[21336]
Map / Output: AK,01/01,21031; AK,03/01,0; AK,1992,274.3; AK,1993,0
Run 3
TextFile: 99504,01/01/1992,1ft; 99504,03/01/1992,0.1ft; 99504,01/01/1993,70in
FlatMap: AK,01/01,304.8; AK,1992,304.8; AK,03/01,30.5; AK,1992,30.5
GroupByKey: AK,01/01,[304.8]; AK,03/01,[30.5]; AK,1992,[304.8,30.5]
Map / Output: AK,01/01,0; AK,03/01,0; AK,1992,274.3
Run 4
TextFile: 99504,01/01/1992,1ft; 99504,03/01/1992,0.1ft; 99504,01/01/1993,70in
FlatMap: AK,01/01,21336; AK,1993,21336
GroupByKey: AK,01/01,[21336]; AK,1993,[21336]
Map / Output: AK,01/01,0; AK,1993,0
Run 5
TextFile: 99504,01/01/1992,1ft; 99504,03/01/1992,0.1ft; 99504,01/01/1993,70in
FlatMap: AK,01/01,304.8; AK,1992,304.8
GroupByKey: AK,01/01,[304.8]; AK,1992,[304.8]
Map / Output: AK,01/01,0; AK,1992,0
Run 6
TextFile: 99504,01/01/1992,1ft; 99504,03/01/1992,0.1ft; 99504,01/01/1993,70in
FlatMap: AK,03/01,30.5; AK,1992,30.5
GroupByKey: AK,03/01,[30.5]; AK,1992,[30.5]
Map / Output: AK,03/01,0; AK,1992,0
Run 7
TextFile: 99504,01/01/1992,1ft; 99504,03/01/1992,0.1ft; 99504,01/01/1993,70in
FlatMap: AK,01/01,21336; AK,1993,21336
GroupByKey: AK,01/01,[21336]; AK,1993,[21336]
Map / Output: AK,01/01,0; AK,1993,0
Run 8
TextFile: 99504,01/01/1992,1ft; 99504,03/01/1992,0.1ft; 99504,01/01/1993,70in
FlatMap: AK,03/01,30.5; AK,1992,30.5; AK,01/01,21336; AK,1993,21336
GroupByKey: AK,01/01,[21336]; AK,03/01,[30.5]; AK,1992,[30.5]; AK,1993,[21336]
Map / Output: AK,01/01,0; AK,03/01,0; AK,1992,0; AK,1993,0
Run 9
TextFile: 99504,01/01/1992,1ft; 99504,03/01/1992,0.1ft; 99504,01/01/1993,70in
FlatMap: AK,01/01,304.8; AK,1992,304.8; AK,01/01,21336; AK,1993,21336
GroupByKey: AK,01/01,[304.8,21336]; AK,1992,[304.8]; AK,1993,[21336]
Map / Output: AK,01/01,21031; AK,1992,0; AK,1993,0
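The run sequence above follows the general shape of delta debugging's minimization loop: test chunks, then complements, then increase granularity. Below is a compact sketch of that loop in plain Scala. It is an illustrative simplification, not the implementation used in the talk; `DeltaDebug`, `ddmin`, and the `failing` oracle are names chosen for this example, and `failing(subset)` is assumed to return true when running the program on `subset` still violates the test.

```scala
object DeltaDebug {
  // Returns a smaller failing subset of `input`, assuming `failing(input)` is true.
  def ddmin[A](input: Vector[A])(failing: Vector[A] => Boolean): Vector[A] = {
    def loop(cur: Vector[A], n: Int): Vector[A] =
      if (cur.size < 2) cur
      else {
        // Split the current input into (up to) n chunks of equal size.
        val size = math.ceil(cur.size.toDouble / n).toInt
        val chunks = cur.grouped(size).toVector
        // Each complement is the current input with one chunk removed.
        val complements =
          chunks.indices.toVector.map(i => chunks.patch(i, Nil, 1).flatten)
        chunks.find(failing) match {
          case Some(c) => loop(c, 2)                    // a chunk alone fails
          case None =>
            complements.find(failing) match {
              case Some(c) => loop(c, math.max(n - 1, 2)) // a complement fails
              case None if n < cur.size =>
                loop(cur, math.min(2 * n, cur.size))      // increase granularity
              case None => cur                            // cannot reduce further
            }
        }
      }
    loop(input, 2)
  }
}
```

With three records numbered 1 to 3 and an oracle that fails only when records 1 and 3 are both present, `DeltaDebug.ddmin(Vector(1, 2, 3))(s => s.contains(1) && s.contains(3))` narrows the input to records 1 and 3. Note that this sketch re-tests the single-record subset containing record 3, which is the kind of redundancy memoization targets.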
Automated Debugging in DISC with BigSift
Input: A Spark Program, A Test Function
Output: Minimum Fault-Inducing Input Records
Data Provenance + Delta Debugging
• Test Predicate Pushdown
• Prioritizing Backward Traces
• Bitmap-based Test Memoization
Optimization 1: Test Predicate Pushdown
• Observation: During backward tracing, data provenance traces through all partitions even though only a few partitions contain faulty intermediate data.
If applicable, BigSift pushes down the test function to test the output of combiners in order to isolate the faulty partitions.
Figure: placement of the test without test pushdown vs. with test pushdown.
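A rough sketch of the idea, under a strong simplifying assumption: the fault is already visible in a partition's partial (combiner-style) aggregate, here a per-partition max minus min. A fault that only appears once partial results from several partitions are merged would not be caught this way, which is one reading of the "if applicable" caveat. The names `TestPushdown` and `faultyPartitions`, and the partition layout, are hypothetical.

```scala
object TestPushdown {
  // Test from the running example, applied here to a partial (per-partition) delta.
  def test(delta: Double): Boolean = delta < 6000

  // Returns indices of partitions whose partial aggregate already violates the test.
  // Each inner vector holds the readings (in mm) for one key within one partition.
  def faultyPartitions(partitions: Vector[Vector[Double]]): Vector[Int] =
    partitions.zipWithIndex.collect {
      case (readings, i) if readings.nonEmpty && !test(readings.max - readings.min) => i
    }
}
```

For example, `TestPushdown.faultyPartitions(Vector(Vector(304.8, 21336.0), Vector(245.0, 85.0)))` shortlists only partition 0, so backward tracing would not need to visit partition 1.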
Optimization 2: Prioritizing Backward Traces
• Observation: The same faulty input record may contribute to multiple faulty outputs due to operators such as Join or FlatMap.
In case of multiple faulty outputs, BigSift overlaps two backward traces to minimize the scope of fault-inducing input records.
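Overlapping traces can be pictured as a set intersection. In the sketch below, the two trace sets are the input record ids (numbering the seven example input lines from 1 to 7) that feed the two faulty outputs of the running example. `OverlapTraces` and `overlap` are illustrative names, and the sketch assumes both faulty outputs share a common cause.

```scala
object OverlapTraces {
  // Backward traces of the two faulty outputs in the running example.
  val traceDay: Set[Int] = Set(1, 3, 5, 6) // inputs behind AK,01/01
  val traceYear: Set[Int] = Set(3, 4, 6)   // inputs behind AK,1993

  // Records common to all traces; assumes `traces` is non-empty.
  def overlap(traces: Seq[Set[Int]]): Set[Int] = traces.reduce(_ intersect _)
}
```

Here the overlap is records 3 and 6, a smaller starting point for further fault isolation than either trace alone.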
Optimization 3: Bitmap-Based Test Memoization
• Observation: Delta debugging may try running a program on the same subset of input redundantly.
We use a bitmap-based test memoization technique to avoid redundant testing of the same input dataset.
• BigSift leverages a bitmap to compactly encode the offsets of the original input to refer to an input subset.
Figure: input data, its bitmap (0 1 0 1 0 0 0 0 1 1), and the memoized test outcome (✔ / 𝗫).
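One way to picture this is a cache keyed by a bit set of input offsets. The sketch below uses the Scala standard library's immutable `BitSet` as the key; `BitmapMemo`, `MemoizedTest`, and the `runs` counter are hypothetical names for illustration, and the real system's encoding may differ.

```scala
import scala.collection.immutable.BitSet
import scala.collection.mutable

object BitmapMemo {
  // Wraps an expensive test so that each distinct input subset is run at most once.
  final class MemoizedTest(runTest: BitSet => Boolean) {
    private val cache = mutable.Map.empty[BitSet, Boolean]
    var runs = 0 // number of times the underlying test was actually executed

    // `offsets` are positions of the selected records in the original input.
    def apply(offsets: Iterable[Int]): Boolean = {
      val key = BitSet(offsets.toSeq: _*)
      cache.getOrElseUpdate(key, { runs += 1; runTest(key) })
    }
  }
}
```

Asking for the same subset twice executes the underlying test once; the second call is answered from the cache.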
Evaluation Questions
• RQ1: How much improvement in the debugging time does BigSift provide in comparison to delta debugging?
• RQ2: How long is the debugging time of BigSift in comparison to the original running time of a job?
• RQ3: How much improvement in the precision of fault-inducing input records does BigSift provide in comparison to data provenance?
RQ1: Performance Improvement over Delta Debugging
                            Running Time (sec)   Debugging Time (sec)
Subject Program     Fault   Original Job         DD        BigSift   Improvement
Movie Histogram     Code    56.2                 232.8     17.3      13.5X
Inverted Index      Code    107.7                584.2     13.4      43.6X
Rating Histogram    Code    40.3                 263.4     16.6      15.9X
Sequence Count      Code    356.0                13772.1   208.8     66.0X
Rating Frequency    Code    77.5                 437.9     14.9      29.5X
College Student     Data    53.1                 235.3     31.8      7.4X
Weather Analysis    Data    238.5                999.1     89.9      11.1X
Transit Analysis    Code    45.5                 375.8     20.2      18.6X
BigSift provides up to a 66X speedup in isolating the precise fault-inducing input records, in comparison to the baseline DD.
RQ2: Debugging Time vs. Original Job Time
On average, BigSift takes 62% less time to debug a single faulty output than the time taken for a single run on the entire data.
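The headline numbers can be recomputed from the timing table. The sketch below copies the reported times and derives the DD-to-BigSift speedup per program and the average reduction of BigSift debugging time relative to the original job time. The averaging method (mean of per-program reductions) is an assumption that happens to match the reported 62%; `EvalArithmetic` and its field names are illustrative.

```scala
object EvalArithmetic {
  // (program, original job time, DD time, BigSift time), all in seconds, from the table.
  val rows: Seq[(String, Double, Double, Double)] = Seq(
    ("Movie Histogram", 56.2, 232.8, 17.3),
    ("Inverted Index", 107.7, 584.2, 13.4),
    ("Rating Histogram", 40.3, 263.4, 16.6),
    ("Sequence Count", 356.0, 13772.1, 208.8),
    ("Rating Frequency", 77.5, 437.9, 14.9),
    ("College Student", 53.1, 235.3, 31.8),
    ("Weather Analysis", 238.5, 999.1, 89.9),
    ("Transit Analysis", 45.5, 375.8, 20.2))

  // Speedup of BigSift over delta debugging for each program.
  val speedups: Seq[(String, Double)] =
    rows.map { case (program, _, dd, bigSift) => program -> dd / bigSift }

  val maxSpeedup: Double = speedups.map(_._2).max

  // Mean, over programs, of how much shorter BigSift debugging is than the original job.
  val avgReduction: Double =
    rows.map { case (_, original, _, bigSift) => 1 - bigSift / original }.sum / rows.size
}
```

Under these assumptions `maxSpeedup` comes out close to 66 (Sequence Count) and `avgReduction` close to 0.62.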
Figure (SequenceCount): number of fault-inducing input records (log scale, 1 to 1E+09) against fault localization time (s, 0 to 14000) for Delta Debugging, BigSift, Test Driven Data Provenance, and Data Provenance.
RQ3: Fault Localizability over Data Provenance
Figure: number of fault-inducing input records (log scale, 1 to 100000000) per subject program (Movie Histogram, Inverted Index, Rating Histogram, Sequence Count, Rating Frequency, College Students, Weather Analysis) for Data Provenance, Test Driven Data Provenance, and BigSift & DD. Bar labels as extracted: 143796, 6487290, 520904, 234115800, 15003060, 2554788, 350, 2, 1350, 15 13, 1 1 1 1 1 12.
BigSift leverages DD after DP to continue fault isolation, achieving several orders of magnitude (10^3 to 10^7) better precision.
Conclusion
• BigSift is the first piece of work in automated debugging of big data analytics in DISC.
• BigSift provides 10^3X to 10^7X more precision than data provenance in terms of fault localizability.
• It provides up to a 66X speedup in debugging time over baseline Delta Debugging.
• In our evaluation, we observed that, on average, BigSift finds the faulty input in 62% less time than the original job execution.
Questions?