Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)
Post on 02-Jul-2015
Yida Tao, Jindae Kim, Sunghun Kim
Dept. of CSE, The Hong Kong University of Science and Technology
Chang Xu
State Key Lab for Novel Software Technology, Nanjing University
Automatic Program Repair
• Promising research progress
  • ClearView [1]: Prevent all 10 Firefox exploits
  • GenProg [2]: Fix 55/105 real bugs
[1] Automatically Patching Errors in Deployed Software. Perkins et al. SOSP’09
[2] A systematic study of automated program repair: fixing 55 out of 105 bugs for $8 each. Le Goues et al. ICSE’12
“It won't get your bug patched any quicker. You’ll just have shifted the coders' attention away from their own app's bugs, and onto the repair tool’s bugs.”
- Slashdot discussion: http://science.slashdot.org/story/09/10/29/2248246/Fixing-Bugs-But-Bypassing-the-Source-Code
#what-could-possibly-go-wrong
• Blackbox repair
• Increasing maintenance cost
• Vulnerable to attack
- Slashdot discussion: http://science.slashdot.org/story/09/10/29/2248246/Fixing-Bugs-But-Bypassing-the-Source-Code
- A human study of patch maintainability. ISSTA’12
- Automatic patch generation learned from human-written patches. ICSE’13
#program-out-of-control
Use automatically generated patches as debugging aids
Our Human Study
• Investigate the usefulness of generated patches as debugging aids
• Discuss the impact of patch quality on debugging performance
• Explore practitioners’ feedback on adopting automatic program repair
Methodology
[Diagram: Bugs, Participants, Debugging aid; arrows labeled “Debug” and “is given to”]
Debugging aid
• Low-quality generated patch
• High-quality generated patch
• Buggy method location
Participants
95 Participants
• Grad: 44 CS graduate students
• Engr: 28 industrial software engineers
• MTurk: 23 Amazon Mechanical Turk workers
44 Graduate students
• Between-group design
• Onsite setting
  • Eclipse IDE
  • Supervised session
• Three groups of 14, 15, and 15 students, each given one type of debugging aid (low-quality generated patch, high-quality generated patch, or buggy method location)
Remote participants (28 Engr + 23 MTurk)
• Within-group design
• Online debugging system
• Debugging aids: low-quality generated patch, high-quality generated patch, buggy method location
Bug Selection Criteria
• Real bugs
• The bug has accepted patches written by developers
• Proper number of bugs
• The bug has generated patches of different quality
Automatic patch generation learned from human-written patches. Kim et al. ICSE’13
Avg. ranking from 85 devs and students
Auto-generated patch A: High-Quality Patch (avg. ranking 1.6)
    for (int i = 0; i < parenCount; i++) {
        SubString sub = (SubString) parens.get(i);
        if (sub != null) {
            args[i+1] = sub.toString();
        }
    }
Auto-generated patch B: Low-Quality Patch (avg. ranking 2.8)
    for (int i = 0; i < parenCount; i++) {
        SubString sub = (SubString) parens.get(i);
        args[parenCount+1] = new Integer(reImpl.leftContext.length);
    }
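A minimal sketch of what the null check in auto-generated patch A buys, assuming the failure is a null entry in parens (the slides do not state the root cause). PatchSketch, fillArgs, and the use of String in place of SubString are hypothetical stand-ins for illustration, not Rhino code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the loop shown in the slide. Only parenCount,
// parens, and args come from the slide; everything else is an assumption.
final class PatchSketch {

    // Copies each captured group into args[i + 1], skipping null groups
    // the way auto-generated patch A does.
    static String[] fillArgs(List<String> parens) {
        int parenCount = parens.size();
        String[] args = new String[parenCount + 2];
        for (int i = 0; i < parenCount; i++) {
            String sub = parens.get(i);
            if (sub != null) {               // guard added by patch A
                args[i + 1] = sub.toString();
            }
        }
        return args;
    }

    public static void main(String[] unused) {
        // A null entry models a group that has no value (assumed failure case).
        String[] args = fillArgs(Arrays.asList("a", null, "c"));
        System.out.println(Arrays.toString(args)); // prints [null, a, null, c, null]
    }
}
```

Without the guard, calling sub.toString() on the null entry would throw a NullPointerException. Patch B, as shown in the slide, adds no such guard.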
Participants submit 337 patches as their debugging outcome
# submitted patches w.r.t. debugging aid
• Location: 109
• LowQ: 112
• HighQ: 116
# submitted patches w.r.t. bugs
• Bug1: 66
• Bug2: 74
• Bug3: 59
• Bug4: 76
• Bug5: 62
Evaluation of debugging performance
Patch Correctness
• Passing test cases
• Matching the semantics of original accepted patches
• 3 evaluators
Debugging Time
• Eclipse Plug-in
• Website Timer
Multiple Regression Analysis
• Independent variables
  • Debugging aids
  • Bugs
  • Participant types
  • Programming experience
correctness = α0 + α1 ∙ x1 + α2 ∙ x2 + α3 ∙ x3 + α4 ∙ x4
debugging time = β0 + β1 ∙ x1 + β2 ∙ x2 + β3 ∙ x3 + β4 ∙ x4
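One way to read the two models, under two assumptions the slides do not state: x1 to x4 stand for the four independent variables in the order listed, and a categorical variable such as the debugging aid is dummy-coded against the buggy method location as baseline. Under that reading, the single term α1 ∙ x1 expands into one 0/1 indicator per non-baseline aid:

```latex
\begin{align*}
\alpha_1 \cdot x_1 \;&\longrightarrow\;
  \alpha_{\mathrm{LowQ}} \cdot d_{\mathrm{LowQ}}
  + \alpha_{\mathrm{HighQ}} \cdot d_{\mathrm{HighQ}},
  \qquad d_{\mathrm{LowQ}},\, d_{\mathrm{HighQ}} \in \{0, 1\} \\
\text{correctness} \;&=\; \alpha_0
  + \alpha_{\mathrm{LowQ}} \cdot d_{\mathrm{LowQ}}
  + \alpha_{\mathrm{HighQ}} \cdot d_{\mathrm{HighQ}}
  + \alpha_2 \cdot x_2 + \alpha_3 \cdot x_3 + \alpha_4 \cdot x_4
\end{align*}
```

With this coding, the sign of a coefficient on a debugging aid would indicate whether that aid is associated with better or worse correctness than the location baseline, with the other variables held fixed. The bug and participant-type terms would expand in the same way. The debugging time model would follow the same pattern with β in place of α.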
Post-study Survey
• Helpfulness of debugging aids
• Difficulty of bugs
• Opinions on using generated patches as debugging aids
Results
1. High-quality patches significantly improve debugging correctness
% of correct patches
• Location: 48%
• LowQ: 33%
• HighQ: 71%
Positive Coefficient = 1.25
p-value = 0.00 < 0.05
2. Low-quality patches can undermine debugging correctness
% of correct patches: 33% (LowQ) vs. 48% (Location)
Negative Coefficient = -0.55
p-value = 0.09
3. High-quality patches are more useful for difficult bugs
Bugs
• Bug1: Math-280
• Bug2: Rhino-114493
• Bug3: Rhino-192226
• Bug4: Rhino-217379
• Bug5: Rhino-76683
[Figure: bug difficulty for Bug1 to Bug5]
[Figure: % of correct patches for Bug1 to Bug5, by Location, LowQ, HighQ]
4. The type of debugging aid does not affect debugging time
[Figure: debugging time (min) for Location, LowQ, HighQ]
5. Other factors’ impact on debugging performance
• Difficult bugs significantly slow down debugging
• Engr and MTurk are more likely to debug correctly
• Novices tend to benefit more from HighQ patches
6. Participants consider high-quality generated patches much more helpful than low-quality patches
[Figure: helpfulness of debugging aids, rated Very helpful, Helpful, Medium, Slightly Helpful, or Not Helpful, for low-quality and high-quality generated patches]
Mann-Whitney U test, p-value = 0.001
Feedback
Quick starting point
• Point to the buggy area
• Brainstorm
“They would seem to be useful in helping find various ideas around fixing the issue, even if the patch isn’t always correct on its own.”
Confusing, incomplete, misleading
• Wrong lead, especially for novices
• Require further human perfection
“Generated patches would be good at recognizing obvious problems”
“…but may not recognize more involved defects.”
“Generated patches simplify the problem”
“…but they may over-simplify it by not addressing the root cause.”
“I would use generated patches as debugging aids, as they provide extra diagnostic information”
“…along with access to standard debugging tools.”
Threats to Validity
• Bugs and generated patches may not be representative
• Quality measure of generated patches may not generalize
• May not generalize to domain experts
• Possibility of blindly reusing generated patches
  • Remove patches that are submitted in less than 1 minute
Takeaway
• Auto-generated patches can be useful as debugging aids
  • Participants fix bugs more correctly with auto-generated patches
• Quality control is required
  • Participants’ debugging correctness is compromised with low-quality generated patches
• Maximize the benefits
  • Difficult bugs
  • Novice developers