Automated Debugging In Data Intensive Scalable Computing Systems

Muhammad Ali Gulzar (1), Matteo Interlandi (3), Xueyuan Han (2), Mingda Li (1), Tyson Condie (1), and Miryung Kim (1)

(1) University of California, Los Angeles
(2) Harvard University
(3) Microsoft

Page 1: Automated Debugging In Data Intensive Scalable Computing ...web.cs.ucla.edu/~gulzar/assets/pdf/socc2017-bigsift...Muhammad Ali Gulzar1,Matteo Interlandi3, XueyuanHan2, MingdaLi1, Tyson


Big Data Debugging in the Dark

• Develop locally
• Hope it works
• Run in cloud
• Bug!
• Guesswork

[Figure: Map and Reduce stages, with callouts numbered 1 to 5]

Motivating Example

• Alice writes a Spark program that identifies, for each state in the US, the delta between the minimum and the maximum snowfall reading for each day of any year and for any particular year.

ZipCode   Date         SnowFall
99504     01/01/1994   245mm
99504     01/01/1993   85mm
90031     02/01/1991   0mm
…         …            …
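The program described above can be sketched on plain Scala collections, without a cluster. This is a minimal sketch, not the authors' code: `SnowfallSketch`, `zipToState`, `toMillimeters`, `flatMapStep`, and `deltas` are hypothetical names, and the zip-to-state map covers only the sample rows. The unit handling is an assumption inferred from the sample values on the dataflow slides, where `1ft` appears as 304.8 and `70in` appears as 21336 (70 × 304.8): every reading that is not in `mm` is treated as feet.

```scala
object SnowfallSketch {
  // Hypothetical lookup; only the two zip codes from the sample rows.
  val zipToState: Map[String, String] = Map("99504" -> "AK", "90031" -> "CA")

  // Assumption: any reading not in mm is treated as feet (1 ft = 304.8 mm).
  // This reproduces the 21336 shown for "70in" on the dataflow slides.
  def toMillimeters(reading: String): Double = {
    val r = reading.trim
    if (r.endsWith("mm")) r.dropRight(2).toDouble
    else r.dropRight(2).toDouble * 304.8
  }

  // FlatMap step: one record keyed by (state, day) and one keyed by (state, year).
  def flatMapStep(line: String): List[((String, String), Double)] = {
    val Array(zip, date, snow) = line.split(",").map(_.trim)
    val state = zipToState(zip)
    val mm = toMillimeters(snow)
    val day = date.substring(0, date.lastIndexOf("/"))
    val year = date.substring(date.lastIndexOf("/") + 1)
    List(((state, day), mm), ((state, year), mm))
  }

  // GroupByKey + Map steps: delta between the max and min reading per key.
  def deltas(lines: Seq[String]): Map[(String, String), Double] =
    lines.flatMap(flatMapStep)
      .groupBy(_._1)
      .map { case (key, records) =>
        val values = records.map(_._2)
        key -> (values.max - values.min)
      }
}
```

For the three sample rows in the table, `SnowfallSketch.deltas` gives a delta of 160.0 for key (AK, 01/01), since 245 − 85 = 160, and 0.0 for every key that has a single reading.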

Problem Definition

TextFile: 99504,01/01/1992,1ft; 99504,03/01/1992,0.1ft; 99504,01/01/1993,70in; 99504,03/01/1993,145mm; 99504,01/01/1994,245mm; 99504,01/01/1993,85mm; 90031,02/01/1991,0mm
FlatMap: AK,01/01,304.8; AK,1992,304.8; AK,03/01,30.5; AK,1992,30.5; AK,01/01,21336; AK,1993,21336; AK,03/01,145; AK,1993,145; AK,01/01,245; AK,1994,245; …
GroupByKey: AK,01/01,[304.8,21336,245,85]; AK,03/01,[30.5,145]; AK,1992,[304.8,30.5]; AK,1993,[21336,145,85]; AK,1994,[245]; CA,02/01,[0]; CA,1991,[0]
Map (Output): AK,01/01,21251; AK,03/01,114.5; AK,1992,274.3; AK,1993,21251; AK,1994,0; CA,02/01,0; CA,1991,0

• Using a test function, a user can specify incorrect results.

def test(key: String, delta: Float): Boolean = {
  delta < 6000
}

Given a test function, the goal is to identify a minimum subset of the input that is able to reproduce the same test failure.
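To make the role of the test function concrete, the sketch below applies it to the output rows listed on this slide and collects the keys for which it returns false. The object name and the encoding of each key as a single "state,period" string are assumptions made for illustration; the slide only fixes the signature `test(key: String, delta: Float)`.

```scala
object TestOracleSketch {
  // The test function from the slide: an output passes if its delta is below 6000.
  def test(key: String, delta: Float): Boolean = delta < 6000

  // Output rows as listed on the slide; the "state,period" key encoding is assumed.
  val output: Seq[(String, Float)] = Seq(
    "AK,01/01" -> 21251.0f, "AK,03/01" -> 114.5f, "AK,1992" -> 274.3f,
    "AK,1993" -> 21251.0f, "AK,1994" -> 0.0f, "CA,02/01" -> 0.0f, "CA,1991" -> 0.0f
  )

  // Outputs for which the test returns false are the faulty ones to debug.
  val faulty: Seq[String] = output.collect { case (k, d) if !test(k, d) => k }
}
```

Here `TestOracleSketch.faulty` contains "AK,01/01" and "AK,1993", the two outputs whose delta of 21251 is not below 6000.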

Existing Approach 1: Data Provenance for Spark

[Figure: the same TextFile, FlatMap, GroupByKey, Map, Output dataflow and values as on the Problem Definition slide]

It over-approximates the scope of failure-inducing inputs, i.e., records in the faulty key-group are all marked as faulty.
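The over-approximation can be illustrated with a small backward-tracing sketch. It is not the provenance system used in the talk: `ProvenanceSketch` and `backwardTrace` are hypothetical names, each intermediate record is simply tagged with the 1-based position of the input line that produced it, and the records hidden behind the "…" on the slide are reconstructed here from the input and the grouped values (an assumption).

```scala
object ProvenanceSketch {
  // (key, value in mm, 1-based position of the originating input line).
  // The last four entries are reconstructed from the input and grouped values.
  val tagged: Seq[(String, Double, Int)] = Seq(
    ("AK,01/01", 304.8, 1), ("AK,1992", 304.8, 1),
    ("AK,03/01", 30.5, 2), ("AK,1992", 30.5, 2),
    ("AK,01/01", 21336.0, 3), ("AK,1993", 21336.0, 3),
    ("AK,03/01", 145.0, 4), ("AK,1993", 145.0, 4),
    ("AK,01/01", 245.0, 5), ("AK,1994", 245.0, 5),
    ("AK,01/01", 85.0, 6), ("AK,1993", 85.0, 6),
    ("CA,02/01", 0.0, 7), ("CA,1991", 0.0, 7)
  )

  // Backward trace: every input line that contributed to the faulty key-group.
  def backwardTrace(faultyKey: String): Set[Int] =
    tagged.collect { case (k, _, id) if k == faultyKey => id }.toSet
}
```

Tracing the faulty output "AK,01/01" returns input lines 1, 3, 5, and 6, that is, the whole key-group, although only line 3 carries the suspicious value 21336.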

Existing Approach 2: Delta Debugging

• Delta Debugging performs a systematic binary-search-like procedure on the input dataset using a test oracle function.

[Figure: the same dataflow and values as on the Problem Definition slide, with callouts 1 and 2]

It does not prune input records known to be irrelevant because of the lack of semantic understanding of data-flow operators.
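The search procedure can be sketched as a generic `ddmin`-style routine over a vector of input records. This is a minimal sketch of the classic delta debugging minimization, not the implementation evaluated in the talk; `DeltaDebugSketch`, `split`, `ddmin`, and `fails` are hypothetical names, and `fails(subset)` is assumed to rerun the job on `subset` and report whether the test function still flags an output. The result is 1-minimal (removing any single record makes the failure disappear), which is not guaranteed to be the globally smallest failing subset.

```scala
object DeltaDebugSketch {
  // Split xs into n contiguous chunks of near-equal size (assumes 1 <= n <= xs.length).
  def split[A](xs: Vector[A], n: Int): Vector[Vector[A]] =
    (0 until n).toVector.map(i => xs.slice(i * xs.length / n, (i + 1) * xs.length / n))

  // Shrinks `input` while `fails` stays true; `fails(input)` is assumed to be true.
  def ddmin[A](input: Vector[A])(fails: Vector[A] => Boolean): Vector[A] = {
    def loop(current: Vector[A], n: Int): Vector[A] =
      if (current.length < 2) current
      else {
        val chunks = split(current, n)
        // Complement i is everything except chunk i.
        val complements =
          chunks.indices.map(i => (chunks.take(i) ++ chunks.drop(i + 1)).flatten)
        chunks.find(fails) match {
          case Some(chunk) => loop(chunk, 2) // reduce to a failing chunk
          case None =>
            complements.find(fails) match {
              case Some(rest) => loop(rest, math.max(n - 1, 2)) // reduce to a failing complement
              case None if n < current.length =>
                loop(current, math.min(current.length, 2 * n)) // refine granularity
              case None => current // single-record chunks and nothing removable
            }
        }
      }
    loop(input, 2)
  }
}
```

For example, `DeltaDebugSketch.ddmin((1 to 8).toVector)(s => s.contains(3) && s.contains(7))` returns `Vector(3, 7)`. Every call to `fails` corresponds to one rerun of the job, which is why the number of runs matters for large inputs.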

Existing Approach 2: Delta Debugging (Run 2)

TextFile: 99504,01/01/1992,1ft; 99504,03/01/1992,0.1ft; 99504,01/01/1993,70in
FlatMap: AK,01/01,304.8; AK,1992,304.8; AK,03/01,30.5; AK,1992,30.5; AK,01/01,21336; AK,1993,21336
GroupByKey: AK,01/01,[304.8,21336]; AK,03/01,[30.5]; AK,1992,[304.8,30.5]; AK,1993,[21336]
Map (Output): AK,01/01,21031; AK,03/01,0; AK,1992,274.3; AK,1993,0

[Figure callouts: 1, 2]

Existing Approach 2: Delta Debugging (Run 3)

TextFile: 99504,01/01/1992,1ft; 99504,03/01/1992,0.1ft; 99504,01/01/1993,70in
FlatMap: AK,01/01,304.8; AK,1992,304.8; AK,03/01,30.5; AK,1992,30.5
GroupByKey: AK,01/01,[304.8]; AK,03/01,[30.5]; AK,1992,[304.8,30.5]
Map (Output): AK,01/01,0; AK,03/01,0; AK,1992,274.3

Existing Approach 2: Delta Debugging (Run 4)

TextFile: 99504,01/01/1992,1ft; 99504,03/01/1992,0.1ft; 99504,01/01/1993,70in
FlatMap: AK,01/01,21336; AK,1993,21336
GroupByKey: AK,01/01,[21336]; AK,1993,[21336]
Map (Output): AK,01/01,0; AK,1993,0

Existing Approach 2: Delta Debugging (Run 5)

TextFile: 99504,01/01/1992,1ft; 99504,03/01/1992,0.1ft; 99504,01/01/1993,70in
FlatMap: AK,01/01,304.8; AK,1992,304.8
GroupByKey: AK,01/01,[304.8]; AK,1992,[304.8]
Map (Output): AK,01/01,0; AK,1992,0

Existing Approach 2: Delta Debugging (Run 6)

TextFile: 99504,01/01/1992,1ft; 99504,03/01/1992,0.1ft; 99504,01/01/1993,70in
FlatMap: AK,03/01,30.5; AK,1992,30.5
GroupByKey: AK,03/01,[30.5]; AK,1992,[30.5]
Map (Output): AK,03/01,0; AK,1992,0

Existing Approach 2: Delta Debugging (Run 7)

TextFile: 99504,01/01/1992,1ft; 99504,03/01/1992,0.1ft; 99504,01/01/1993,70in
FlatMap: AK,01/01,21336; AK,1993,21336
GroupByKey: AK,01/01,[21336]; AK,1993,[21336]
Map (Output): AK,01/01,0; AK,1993,0

Existing Approach 2: Delta Debugging (Run 8)

TextFile: 99504,01/01/1992,1ft; 99504,03/01/1992,0.1ft; 99504,01/01/1993,70in
FlatMap: AK,03/01,30.5; AK,1992,30.5; AK,01/01,21336; AK,1993,21336
GroupByKey: AK,01/01,[21336]; AK,03/01,[30.5]; AK,1992,[30.5]; AK,1993,[21336]
Map (Output): AK,01/01,0; AK,03/01,0; AK,1992,0; AK,1993,0

Existing Approach 2: Delta Debugging (Run 9)

TextFile: 99504,01/01/1992,1ft; 99504,03/01/1992,0.1ft; 99504,01/01/1993,70in
FlatMap: AK,01/01,304.8; AK,1992,304.8; AK,01/01,21336; AK,1993,21336
GroupByKey: AK,01/01,[304.8,21336]; AK,1992,[304.8]; AK,1993,[21336]
Map (Output): AK,01/01,21031; AK,1992,0; AK,1993,0

Automated Debugging in DISC with BigSift

Input: A Spark Program, A Test Function
Output: Minimum Fault-Inducing Input Records

Data Provenance + Delta Debugging, with three optimizations:
• Test Predicate Pushdown
• Prioritizing Backward Traces
• Bitmap based Test Memoization

Optimization 1: Test Predicate Pushdown

• Observation: During backward tracing, data provenance traces through all partitions even though only a few partitions contain faulty intermediate data.

If applicable, BigSift pushes down the test function to test the output of combiners in order to isolate the faulty partitions.

[Figure: two panels, "Without Test Pushdown" and "With Test Pushdown", showing where Test is applied]
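A rough sketch of the idea follows. The slide does not state when pushdown is applicable, so this example makes its own assumptions: the aggregation is a maximum, whose partial (combiner) results can be tested with the same predicate as the final result, and the test `value < 6000` is a hypothetical stand-in rather than the talk's test function. `PushdownSketch`, `combine`, and `faultyPartitions` are illustrative names.

```scala
object PushdownSketch {
  // Hypothetical test on an aggregated value (not the talk's test function).
  def test(value: Double): Boolean = value < 6000

  // Combiner: the partial aggregate computed inside one partition (assumed: max).
  def combine(partition: Seq[Double]): Double = partition.max

  // Indices of partitions whose combiner output already fails the test;
  // only these would need to be traced backward.
  def faultyPartitions(partitions: Seq[Seq[Double]]): Seq[Int] =
    partitions.zipWithIndex.collect {
      case (p, i) if p.nonEmpty && !test(combine(p)) => i
    }
}
```

With the partitions `Seq(Seq(304.8, 30.5), Seq(21336.0, 145.0), Seq(245.0, 85.0, 0.0))`, only partition 1 is reported, so the other two can be skipped during backward tracing.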

Optimization 2: Prioritizing Backward Traces

• Observation: The same faulty input record may contribute to multiple faulty outputs due to operators such as Join or FlatMap.

In case of multiple faulty outputs, BigSift overlaps two backward traces to minimize the scope of fault-inducing input records.

[Figure: the same TextFile, FlatMap, GroupByKey, Map, Output dataflow and values as on the Problem Definition slide]
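Overlapping two traces amounts to a set intersection over input record positions. In the sketch below, `TraceOverlapSketch`, `narrowedScope`, and `stillFails` are hypothetical names; falling back to the smaller trace when the overlap is empty or no longer reproduces the failure is an assumption of this sketch, since the slide does not describe that case.

```scala
object TraceOverlapSketch {
  // a, b: backward traces of two faulty outputs, as sets of input line positions.
  // stillFails: reruns the job on a candidate subset and applies the test function.
  def narrowedScope(a: Set[Int], b: Set[Int])(stillFails: Set[Int] => Boolean): Set[Int] = {
    val common = a intersect b
    if (common.nonEmpty && stillFails(common)) common // overlap reproduces the failure
    else if (a.size <= b.size) a                      // assumed fallback: smaller trace
    else b
  }
}
```

In the snowfall example, the faulty outputs AK,01/01 and AK,1993 trace back to input lines {1, 3, 5, 6} and {3, 4, 6}. Their overlap {3, 6} contains the readings 21336 and 85, which still yield a delta of 21251 for both keys, so the search can continue on two records instead of four.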

Optimization 3: Bitmap Based Test Memoization

• Observation: Delta debugging may try running a program on the same subset of input redundantly.

• BigSift leverages a bitmap to compactly encode the offsets of the original input to refer to an input subset.

We use a bitmap based test memoization technique to avoid redundant testing of the same input dataset.

[Figure: Input Data mapped to a Bitmap (bits as extracted: 0 1 0 1 0 0 0 0 1 1) and a Test Outcome (𝗫)]
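A minimal sketch of the memoization follows, assuming Scala's immutable `BitSet` as the bitmap and a mutable map as the cache. `BitmapMemoSketch`, `MemoizedTest`, and `executions` are hypothetical names, and the actual bitmap implementation used by BigSift is not stated on the slide.

```scala
import scala.collection.immutable.BitSet
import scala.collection.mutable

object BitmapMemoSketch {
  // Wraps a test over input offsets so each distinct subset is run at most once.
  class MemoizedTest(runTest: BitSet => Boolean) {
    private val cache = mutable.Map.empty[BitSet, Boolean]
    var executions = 0 // how many times the underlying test actually ran

    def apply(offsets: Iterable[Int]): Boolean = {
      // Bit i is set exactly when input record i belongs to the subset.
      val bitmap = BitSet(offsets.toSeq: _*)
      cache.getOrElseUpdate(bitmap, { executions += 1; runTest(bitmap) })
    }
  }
}
```

In the delta debugging walkthrough, Run 4 and Run 7 show identical FlatMap records, so a cache keyed by the subset's bitmap would answer the second of those runs without executing the job again.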

Evaluation Questions

• RQ1: How much improvement in the debugging time does BigSift provide in comparison to delta debugging?

• RQ2: How long is the debugging time of BigSift in comparison to the original running time of a job?

• RQ3: How much improvement in the precision of fault-inducing input records does BigSift provide in comparison to data provenance?

RQ1: Performance Improvement over Delta Debugging

                           Running Time (sec)   Debugging Time (sec)
Subject Program    Fault   Original Job         DD        BigSift   Improvement
Movie Histogram    Code    56.2                 232.8     17.3      13.5X
Inverted Index     Code    107.7                584.2     13.4      43.6X
Rating Histogram   Code    40.3                 263.4     16.6      15.9X
Sequence Count     Code    356.0                13772.1   208.8     66.0X
Rating Frequency   Code    77.5                 437.9     14.9      29.5X
College Student    Data    53.1                 235.3     31.8      7.4X
Weather Analysis   Data    238.5                999.1     89.9      11.1X
Transit Analysis   Code    45.5                 375.8     20.2      18.6X

BigSift provides up to a 66X speedup in isolating the precise fault-inducing input records, in comparison to the baseline DD.
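The Improvement column appears to be the DD debugging time divided by the BigSift debugging time. The sketch below recomputes it from the table; `SpeedupSketch` and its members are hypothetical names, and because the inputs are the rounded table values, a recomputed ratio can differ from the printed one in the last digit.

```scala
object SpeedupSketch {
  // (program, DD seconds, BigSift seconds), copied from the table.
  val rows: Seq[(String, Double, Double)] = Seq(
    ("Movie Histogram", 232.8, 17.3), ("Inverted Index", 584.2, 13.4),
    ("Rating Histogram", 263.4, 16.6), ("Sequence Count", 13772.1, 208.8),
    ("Rating Frequency", 437.9, 14.9), ("College Student", 235.3, 31.8),
    ("Weather Analysis", 999.1, 89.9), ("Transit Analysis", 375.8, 20.2)
  )

  // Speedup of BigSift over delta debugging.
  def speedup(dd: Double, bigSift: Double): Double = dd / bigSift

  // Program with the largest speedup and its value.
  val best: (String, Double) =
    rows.map { case (p, dd, bs) => p -> speedup(dd, bs) }.maxBy(_._2)
}
```

`SpeedupSketch.best` identifies Sequence Count, with 13772.1 / 208.8 ≈ 66, which matches the "up to 66X" claim.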

RQ2: Debugging Time vs. Original Job Time

                           Running Time (sec)   Debugging Time (sec)
Subject Program    Fault   Original Job         DD        BigSift   Improvement
Movie Histogram    Code    56.2                 232.8     17.3      13.5X
Inverted Index     Code    107.7                584.2     13.4      43.6X
Rating Histogram   Code    40.3                 263.4     16.6      15.9X
Sequence Count     Code    356.0                13772.1   208.8     66.0X
Rating Frequency   Code    77.5                 437.9     14.9      29.5X
College Student    Data    53.1                 235.3     31.8      7.4X
Weather Analysis   Data    238.5                999.1     89.9      11.1X
Transit Analysis   Code    45.5                 375.8     20.2      18.6X

On average, BigSift takes 62% less time to debug a single faulty output than the time taken for a single run on the entire data.
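The slide does not say how the 62% figure was computed. One reading that is consistent with the table is the mean, over the eight programs, of the per-program reduction 1 − BigSift time / original job time. The sketch below checks that reading; `TimeReductionSketch` and its members are hypothetical names.

```scala
object TimeReductionSketch {
  // (program, original job seconds, BigSift debugging seconds), copied from the table.
  val rows: Seq[(String, Double, Double)] = Seq(
    ("Movie Histogram", 56.2, 17.3), ("Inverted Index", 107.7, 13.4),
    ("Rating Histogram", 40.3, 16.6), ("Sequence Count", 356.0, 208.8),
    ("Rating Frequency", 77.5, 14.9), ("College Student", 53.1, 31.8),
    ("Weather Analysis", 238.5, 89.9), ("Transit Analysis", 45.5, 20.2)
  )

  // Fraction of the original job time saved when debugging with BigSift.
  def reduction(original: Double, bigSift: Double): Double = 1.0 - bigSift / original

  // Unweighted mean of the per-program reductions.
  val meanReduction: Double =
    rows.map { case (_, orig, bs) => reduction(orig, bs) }.sum / rows.size
}
```

Under this assumption `meanReduction` comes out at roughly 0.62, in line with the 62% stated on the slide.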

RQ2: Debugging Time vs. Original Job Time

[Figure: SequenceCount. x-axis: Fault Localization Time (s), 0 to 14000; y-axis: # of fault-inducing input records, log scale, 1 to 1E+09; series: Delta Debugging, BigSift, Test Driven Data Provenance, Data Provenance]

RQ3: Fault Localizability over Data Provenance

[Figure: bar chart. y-axis: # of fault-inducing input records, log scale, 1 to 100000000; x-axis: Movie Histogram, Inverted Index, Rating Histogram, Sequence Count, Rating Frequency, College Students, Weather Analysis; series: Data Provenance, Test Driven Data Provenance, BigSift & DD. Data labels as extracted, without a recoverable mapping to bars: 143796, 6487290, 520904, 234115800, 15003060, 2554788, 350, 2, 1350, 15, 13, 1, 1, 1, 1, 1, 12]

BigSift leverages DD after DP to continue fault isolation, achieving several orders of magnitude (10^3 to 10^7) better precision.

Conclusion

• BigSift is the first piece of work in automated debugging of big data analytics in DISC.

• BigSift provides 10^3X to 10^7X more precision than data provenance in terms of fault localizability.

• It provides up to a 66X speedup in debugging time over baseline Delta Debugging.

• In our evaluation, we observed that, on average, BigSift finds the faulty input in 62% less time than the original job execution time.


Questions?