130614 sebastiano panichella - mining source code descriptions from developers communications
DESCRIPTION
Software mining, source code, developers, e-mails
TRANSCRIPT
![Page 1: 130614 sebastiano panichella - mining source code descriptions from developers communications](https://reader034.vdocument.in/reader034/viewer/2022042816/55985ae91a28ab5a768b4635/html5/thumbnails/1.jpg)
Mining Source Code Descriptions from Developer Communications
Sebastiano Panichella, Jairo Aponte, Massimiliano Di Penta, Andrian Marcus, Gerardo Canfora
![Page 4: 130614 sebastiano panichella - mining source code descriptions from developers communications](https://reader034.vdocument.in/reader034/viewer/2022042816/55985ae91a28ab5a768b4635/html5/thumbnails/4.jpg)
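Context: Software Project
[Diagram: the Developer must understand the Source Code for Program Comprehension during Maintenance Tasks, and this understanding is marked as difficult. Documentation (class diagram, sequence diagram) describes the Source Code and is a second artifact the Developer draws understanding from.]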
![Page 5: 130614 sebastiano panichella - mining source code descriptions from developers communications](https://reader034.vdocument.in/reader034/viewer/2022042816/55985ae91a28ab5a768b4635/html5/thumbnails/5.jpg)
Context: Software Project
Coming back to the reality...
[Diagram: only the Developer and the Source Code remain; understanding the code for Program Comprehension during Maintenance Tasks is marked as difficult.]
![Page 7: 130614 sebastiano panichella - mining source code descriptions from developers communications](https://reader034.vdocument.in/reader034/viewer/2022042816/55985ae91a28ab5a768b4635/html5/thumbnails/7.jpg)
In such situations developers need to infer knowledge from:
• the source code itself
• source code descriptions in external artifacts
Idea: we argue that messages exchanged among contributors/developers are a useful source of information to help understand source code.
Example of a developer message:
..................................................
When call the method IndexSplitter.split(File destDir, String[] segs) from the Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index.
..................................................
CLASS: IndexSplitter METHOD: split
![Page 8: 130614 sebastiano panichella - mining source code descriptions from developers communications](https://reader034.vdocument.in/reader034/viewer/2022042816/55985ae91a28ab5a768b4635/html5/thumbnails/8.jpg)
A Five-Step Approach for Mining Method Descriptions
![Page 10: 130614 sebastiano panichella - mining source code descriptions from developers communications](https://reader034.vdocument.in/reader034/viewer/2022042816/55985ae91a28ab5a768b4635/html5/thumbnails/10.jpg)
Step 1: Downloading emails/bug reports and tracing them onto classes
Two heuristics:
• The discussion contains a fully-qualified class name (e.g., org.apache.lucene.analysis.MappingCharFilter), or the email contains a file name (e.g., MappingCharFilter.java).
• For bug reports, we complement the above heuristic by matching the bug ID of each closed bug to the commit notes, thereby tracing the bug report to the files changed in that commit.
Developer Discussion:
When call the method IndexSplitter.split(File destDir, String[] segs) from the Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index.

    public void split(File destDir, String[] segs) throws IOException {
        destDir.mkdirs();
        FSDirectory destFSDir = FSDirectory.open(destDir);
        SegmentInfos destInfos = new SegmentInfos();
        ...
    }

If some of the segments of the index already has this name this results either to impossibility to create new segment or in crating of an corrupted segment.
Traced to: CLASS: IndexSplitter
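The Step 1 heuristics can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class `ClassTracer`, its method names, the token-boundary regex, the map of commit notes to changed files, and the bug ID `BUG-42` are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch of the Step 1 tracing heuristics.
class ClassTracer {

    /** Returns the known classes mentioned by fully-qualified name or by file name. */
    static Set<String> traceToClasses(String discussion, List<String> fullyQualifiedNames) {
        Set<String> traced = new LinkedHashSet<>();
        for (String fqn : fullyQualifiedNames) {
            String simpleName = fqn.substring(fqn.lastIndexOf('.') + 1);
            // Heuristic 1: the fully-qualified class name appears in the text.
            boolean hasFqn = containsToken(discussion, fqn);
            // Heuristic 2: the source file name appears in the text.
            boolean hasFile = containsToken(discussion, simpleName + ".java");
            if (hasFqn || hasFile) traced.add(fqn);
        }
        return traced;
    }

    /** Bug-report heuristic: files changed by commits whose note mentions the bug ID. */
    static Set<String> filesForBug(String bugId, Map<String, List<String>> filesByCommitNote) {
        Set<String> files = new LinkedHashSet<>();
        for (Map.Entry<String, List<String>> e : filesByCommitNote.entrySet()) {
            if (containsToken(e.getKey(), bugId)) files.addAll(e.getValue());
        }
        return files;
    }

    // Matches the token only when it is not embedded in a longer identifier.
    private static boolean containsToken(String text, String token) {
        Pattern p = Pattern.compile("(?<![\\w.])" + Pattern.quote(token) + "(?!\\w)");
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        List<String> known = Arrays.asList(
                "org.apache.lucene.analysis.MappingCharFilter",
                "org.apache.lucene.index.IndexSplitter");
        System.out.println(traceToClasses("Patch touches MappingCharFilter.java only.", known));
    }
}
```

The boundary check keeps `MappingCharFilter.java` from matching inside a longer identifier; whether the original tool applied such a check is not stated on the slide.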
![Page 11: 130614 sebastiano panichella - mining source code descriptions from developers communications](https://reader034.vdocument.in/reader034/viewer/2022042816/55985ae91a28ab5a768b4635/html5/thumbnails/11.jpg)
Step 2: Extracting paragraphs
Two heuristics:
• We consider as paragraphs the text sections separated by one or more blank lines.
• We prune source code fragments and/or stack traces from the paragraphs, using an approach inspired by the work of Bacchelli et al.
Developer Discussion:
PAR1: When call the method IndexSplitter.split(File destDir, String[] segs) from the Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index.

PAR2 (source code fragment):
    public void split(File destDir, String[] segs) throws IOException {
        destDir.mkdirs();
        FSDirectory destFSDir = FSDirectory.open(destDir);
        SegmentInfos destInfos = new SegmentInfos();
        ...
    }

PAR3: If some of the segments of the index already has this name this results either to impossibility to create new segment or in crating of an corrupted segment.
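A rough sketch of the two Step 2 heuristics is shown below. The blank-line split follows the slide directly. The code and stack-trace detector is only a crude stand-in chosen for illustration (a majority of lines ending like Java statements or looking like stack frames); it is not the approach of Bacchelli et al., and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of paragraph extraction and code pruning.
class ParagraphExtractor {

    /** Splits a message into paragraphs separated by one or more blank lines. */
    static List<String> splitParagraphs(String message) {
        List<String> paragraphs = new ArrayList<>();
        for (String chunk : message.split("\\R\\s*\\R")) {
            String trimmed = chunk.trim();
            if (!trimmed.isEmpty()) paragraphs.add(trimmed);
        }
        return paragraphs;
    }

    /** Crude stand-in detector: most lines look like Java statements or stack frames. */
    static boolean looksLikeCodeOrTrace(String paragraph) {
        String[] lines = paragraph.split("\\R");
        int codeLike = 0;
        for (String line : lines) {
            String t = line.trim();
            boolean statement = t.endsWith(";") || t.endsWith("{") || t.endsWith("}");
            boolean traceFrame = t.matches("at\\s+[\\w.$<>]+\\(.*\\)");
            if (statement || traceFrame) codeLike++;
        }
        return codeLike * 2 > lines.length;
    }

    /** Keeps only the natural-language paragraphs of a message. */
    static List<String> textParagraphs(String message) {
        List<String> kept = new ArrayList<>();
        for (String p : splitParagraphs(message)) {
            if (!looksLikeCodeOrTrace(p)) kept.add(p);
        }
        return kept;
    }

    public static void main(String[] args) {
        String message = "The split method is broken.\n\n"
                + "public void split(File d) {\n    d.mkdirs();\n}\n\n"
                + "This corrupts the segment.";
        System.out.println(textParagraphs(message));
    }
}
```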
![Page 13: 130614 sebastiano panichella - mining source code descriptions from developers communications](https://reader034.vdocument.in/reader034/viewer/2022042816/55985ae91a28ab5a768b4635/html5/thumbnails/13.jpg)
Step 3: Tracing paragraphs onto methods
A paragraph is traced onto a method only if it satisfies the following two conditions:
A) the paragraph contains the keyword “method”;
B) the method name is followed by an open parenthesis, i.e., we match “foo(”.
Developer Discussion (PAR1):
When call the method IndexSplitter.split(File destDir, String[] segs) from the Lucene cotrib directory it creates an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index.
CLASS: IndexSplitter
METHOD: split( (conditions A and B both hold)
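Conditions A and B can be checked with a few lines of code. The sketch below assumes a case-insensitive substring match for the keyword “method” and a simple identifier-boundary check before the method name; neither detail is specified on the slide, and `MethodTracer` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical sketch of the Step 3 conditions.
class MethodTracer {

    /** Returns the methods (of the already traced class) that the paragraph refers to. */
    static List<String> traceToMethods(String paragraph, List<String> methodNames) {
        List<String> traced = new ArrayList<>();
        // Condition A: the paragraph must contain the keyword "method".
        if (!paragraph.toLowerCase(Locale.ROOT).contains("method")) return traced;
        for (String name : methodNames) {
            // Condition B: the method name is directly followed by "(", e.g. "split(".
            Pattern call = Pattern.compile("(?<![\\w$])" + Pattern.quote(name) + "\\(");
            if (call.matcher(paragraph).find()) traced.add(name);
        }
        return traced;
    }

    public static void main(String[] args) {
        String par1 = "When call the method IndexSplitter.split(File destDir, String[] segs) "
                + "from the Lucene cotrib directory it creates an index with wrong data.";
        System.out.println(traceToMethods(par1, Arrays.asList("split", "main")));
    }
}
```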
![Page 17: 130614 sebastiano panichella - mining source code descriptions from developers communications](https://reader034.vdocument.in/reader034/viewer/2022042816/55985ae91a28ab5a768b4635/html5/thumbnails/17.jpg)
Step 4: Heuristic-based Filtering
We defined a set of heuristics that assign each paragraph a score, to further filter the paragraphs associated with methods:
a) Method parameters: s1 = % of parameters mentioned in the paragraph, a value between 0 and 1 (1 if the method does not have parameters).
b) Syntactic descriptions (mentioning return values): s2 = 1 if the paragraph contains the keyword “return”, 0 otherwise (1 if the method is void).
c) Overriding/Overloading: s3 = 1 if any of the “overload” or “override” keywords appears in the paragraph, 0 otherwise.
d) Method invocations: s4 = 1 if any of the “call” or “execute” keywords appears in the paragraph, 0 otherwise.
Example:
..........................
Problem seems to come from MainMethodeSearchEngine in org.eclipse.jdt.internal.ui.launcher The Method searchMainMethods ,there's a call to addSubTypes(List, IProgressMonitor, IJavaSearchScope) Method if includesSubtypes flag is ON. This method add all types sub-types as soon as the given scope encloses them without testing if sub-types have a main! After return IType[] before the excecution
..........................
CLASS: MainMethodSearchEngine
METHOD: searchMainMethods(IProgressMonitor, IJavaSearchScope, boolean)
% parameter = 100% -> s1 = 1
SCORE = s2 + s3 + s4 = 1 (“return”) + 0 + 1 (“call”) = 2
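The four scores could be computed along these lines. This is a sketch under assumptions: parameters are passed as plain strings and matched by substring, keywords are matched case-insensitively as substrings (so “called” also counts for “call”), and `ParagraphScorer` and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the Step 4 scores s1..s4.
class ParagraphScorer {

    /** s1: fraction of the method's parameters mentioned in the paragraph; 1 if it has none. */
    static double s1(String paragraph, List<String> parameters) {
        if (parameters.isEmpty()) return 1.0;
        long mentioned = parameters.stream().filter(paragraph::contains).count();
        return (double) mentioned / parameters.size();
    }

    /** s2: 1 if the paragraph contains "return" or the method is void; 0 otherwise. */
    static int s2(String paragraph, boolean isVoid) {
        return (isVoid || containsAny(paragraph, "return")) ? 1 : 0;
    }

    /** s3: 1 if "overload" or "override" appears; 0 otherwise. */
    static int s3(String paragraph) {
        return containsAny(paragraph, "overload", "override") ? 1 : 0;
    }

    /** s4: 1 if "call" or "execute" appears; 0 otherwise. */
    static int s4(String paragraph) {
        return containsAny(paragraph, "call", "execute") ? 1 : 0;
    }

    private static boolean containsAny(String paragraph, String... keywords) {
        String lower = paragraph.toLowerCase(Locale.ROOT);
        for (String k : keywords) {
            if (lower.contains(k)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String p = "The Method searchMainMethods, there's a call to addSubTypes(List, "
                + "IProgressMonitor, IJavaSearchScope). After return IType[] ...";
        List<String> params = Arrays.asList("IProgressMonitor", "IJavaSearchScope");
        System.out.println(s1(p, params) + " " + (s2(p, false) + s3(p) + s4(p)));
    }
}
```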
![Page 18: 130614 sebastiano panichella - mining source code descriptions from developers communications](https://reader034.vdocument.in/reader034/viewer/2022042816/55985ae91a28ab5a768b4635/html5/thumbnails/18.jpg)
Step 4: Heuristic-based Filtering
We selected paragraphs that have:
1. s1 ≥ thP = 0.5
2. s2 + s3 + s4 ≥ thH = 1
For the example paragraph: s1 = 1 ≥ 0.5 and s2 + s3 + s4 = 1 + 0 + 1 = 2 ≥ 1 -> OK
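The selection rule is a conjunction of the two threshold checks. A minimal sketch, with the thresholds taken from the slide and the class name `ParagraphSelector` chosen only for illustration:

```java
// Hypothetical sketch of the Step 4 selection rule.
class ParagraphSelector {
    static final double TH_P = 0.5; // thP: minimum share of parameters mentioned
    static final int TH_H = 1;      // thH: minimum sum of the keyword heuristics

    /** A paragraph is kept only if both threshold conditions hold. */
    static boolean isSelected(double s1, int s2, int s3, int s4) {
        return s1 >= TH_P && (s2 + s3 + s4) >= TH_H;
    }

    public static void main(String[] args) {
        // Scores of the example paragraph: s1 = 1, s2 = 1, s3 = 0, s4 = 1.
        System.out.println(isSelected(1.0, 1, 0, 1)); // prints "true"
    }
}
```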
![Page 20: 130614 sebastiano panichella - mining source code descriptions from developers communications](https://reader034.vdocument.in/reader034/viewer/2022042816/55985ae91a28ab5a768b4635/html5/thumbnails/20.jpg)
Step 5: Similarity-based Filtering
We rank the filtered paragraphs by their textual similarity with the method they are likely describing.
Removing:
- English stop words
- Programming language keywords
Using:
- Camel Case splitting on the remaining words
- Vector Space Model

| METHOD | PARAGRAPH | SCORE | Similarity |
|---|---|---|---|
| Method_3 | Paragraph_4 | 2.5 | 96.1% |
| Method_1 | Paragraph_1 | 2.5 | 95.6% |
| Method_2 | Paragraph_2 | 1.5 | 97.4% |
| Method_3 | Paragraph_3 | 1.5 | 86.2% |
| Method_1 | Paragraph_3 | 1.5 | 79.0% |
| Method_3 | Paragraph_2 | 1.5 | 77.5% |
| Method_2 | Paragraph_4 | 1.5 | 64.3% |
| Method_2 | Paragraph_3 | 1.3 | 83.2% |
| Method_3 | Paragraph_1 | 1.3 | 73.9% |
| Method_2 | Paragraph_1 | 1.3 | 68.7% |
| Method_1 | Paragraph_4 | 1.3 | 53.6% |

Similarity threshold: th ≥ 0.80
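The similarity computation can be sketched as a cosine between term vectors. The slide does not say which term weighting the Vector Space Model uses, so this example assumes raw term frequencies; the stop-word and keyword list is a tiny illustrative sample, and `SimilarityRanker` and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of Step 5: preprocessing plus cosine similarity.
class SimilarityRanker {
    static final double TH = 0.80; // similarity threshold from the slide

    // Illustrative sample only; real stop-word and keyword lists are much longer.
    static final Set<String> STOP = new HashSet<>(Arrays.asList(
            "the", "a", "an", "of", "to", "in", "is", "it",
            "public", "void", "new", "return", "throws", "static"));

    /** Splits camel case, lower-cases, and drops stop words and language keywords. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        String spaced = text.replaceAll("(?<=[a-z0-9])(?=[A-Z])", " ");
        for (String token : spaced.split("[^A-Za-z0-9]+")) {
            String term = token.toLowerCase(Locale.ROOT);
            if (term.isEmpty() || STOP.contains(term)) continue;
            tf.merge(term, 1, Integer::sum);
        }
        return tf;
    }

    /** Cosine similarity between two term-frequency vectors. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            normA += e.getValue() * e.getValue();
        }
        for (int v : b.values()) normB += v * v;
        return (normA == 0 || normB == 0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    static double similarity(String paragraph, String methodSource) {
        return cosine(termFrequencies(paragraph), termFrequencies(methodSource));
    }

    /** A (method, paragraph) pair is kept when its similarity reaches the threshold. */
    static boolean isKept(double similarity) {
        return similarity >= TH;
    }

    public static void main(String[] args) {
        double sim = similarity("split creates the index in destDir",
                "public void split(File destDir) { destDir.mkdirs(); }");
        System.out.println(sim + " kept=" + isKept(sim));
    }
}
```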
![Page 21: 130614 sebastiano panichella - mining source code descriptions from developers communications](https://reader034.vdocument.in/reader034/viewer/2022042816/55985ae91a28ab5a768b4635/html5/thumbnails/21.jpg)
Empirical Study
• Goal: analyze source code descriptions in developer discussions
• Purpose: investigate how developer discussions describe methods of Java source code
• Quality focus: finding good method descriptions in messages exchanged among contributors/developers
• Context: bug reports and mailing lists of two Java projects, Apache Lucene and Eclipse
![Page 22: 130614 sebastiano panichella - mining source code descriptions from developers communications](https://reader034.vdocument.in/reader034/viewer/2022042816/55985ae91a28ab5a768b4635/html5/thumbnails/22.jpg)
Context
![Page 23: 130614 sebastiano panichella - mining source code descriptions from developers communications](https://reader034.vdocument.in/reader034/viewer/2022042816/55985ae91a28ab5a768b4635/html5/thumbnails/23.jpg)
Research Questions
RQ1 (method coverage): How many methods from the analyzed software systems are described by the paragraphs identified by the proposed approach?
RQ2 (precision): How precise is the proposed approach in identifying method descriptions?
RQ3 (missing descriptions): How many potentially good method descriptions are missed by the approach?
![Page 24: 130614 sebastiano panichella - mining source code descriptions from developers communications](https://reader034.vdocument.in/reader034/viewer/2022042816/55985ae91a28ab5a768b4635/html5/thumbnails/24.jpg)
RQ1: How many methods from the analyzed software systems are described by the paragraphs identified by the proposed approach?
![Page 27: 130614 sebastiano panichella - mining source code descriptions from developers communications](https://reader034.vdocument.in/reader034/viewer/2022042816/55985ae91a28ab5a768b4635/html5/thumbnails/27.jpg)
RQ2: How precise is the proposed approach in identifying method descriptions?
We sampled 250 descriptions from each project
![Page 30: 130614 sebastiano panichella - mining source code descriptions from developers communications](https://reader034.vdocument.in/reader034/viewer/2022042816/55985ae91a28ab5a768b4635/html5/thumbnails/30.jpg)
RQ3: How many potentially good method descriptions are missed by the approach?
We sampled 100 descriptions from each project.
TABLE III: The analysis of a sample of 100 paragraphs traced to methods, but not satisfying the Step 4 heuristic

| System | True Negatives | False Negatives |
|---|---|---|
| Eclipse | 78 | 22 |
| Apache Lucene | 67 | 33 |
![Page 31: 130614 sebastiano panichella - mining source code descriptions from developers communications](https://reader034.vdocument.in/reader034/viewer/2022042816/55985ae91a28ab5a768b4635/html5/thumbnails/31.jpg)
Conclusion