user-perceived source code quality estimation based on static analysis metrics

1

User-Perceived Source Code Quality Estimation based on Static Analysis Metrics

Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis

Electrical and Computer Engineering Dept., Aristotle University of Thessaloniki

Intelligent Systems & Software Engineering Labgroup, Information Processing Laboratory

Thessaloniki, Greece

Email: {mpapamic, thdiaman}@issel.ee.auth.gr, [email protected]


IEEE International Conference on Software Quality, Reliability & Security – QRS 2016

2 Outline

- The concept of user-perceived quality.
- Research objectives.
- Key implementation points.
- The designed system.
- Evaluation.
- Conclusion and future work.



3 Why evaluate code quality?

Various open source software projects. Numerous online software repositories.

Source Code Quality Evaluation


Code Reuse

Is a software component suitable for reuse?


4 User-Perceived source code quality


Idea: Use of software components popularity as a quality indicator.

But: Popularity cannot be used as a sole quality criterion.

- Is based on current trends.
- Depends on the programming language.

Popularity + Static Analysis Metrics + Recommended Coding Practices → Measure of quality


5 The idea: User-Perceived Quality Estimation


Idea: Use of software components popularity as ground truth – GitHub number of stars

Tools Used: Use of static analysis metrics and violations of “good” coding practices

Proposed System: Apply machine learning techniques for estimating user-perceived source code quality

Static Analysis → Quality Evaluation Models → Quality Score


6 Key implementation points


Qualitative evaluation of the selected repositories.

Training set formation. Target set formation. Quality estimation models.


7 Training dataset


Figure: Top 100 GitHub repositories (24930 files) → Metrics Report and Violations Report → Training Dataset and Qualitative Evaluation.


8 Selected repositories qualitative evaluation



Percentage (%) of files containing severe violations, per PMD ruleset

PMD Ruleset         Priority 1   Priority 2
Unused Code         0.0%         0.0%
Basic               0.015%       0.337%
Braces              0.0%         0.0%
Comments            0.0%         0.0%
Naming              14.11%       0.45%
Clone               0.0%         0.0%
CodeSize            0.0%         0.0%
Controversial       1.75%        1.58%
Coupling            0.0%         0.0%
Design              3.37%        3.9%
Empty               0.0%         0.0%
Finalizers          0.0%         0.0%
Optimizations       0.0%         0.0%
Strict Exception    4.99%        0.0%
Strings             0.0%         0.06%
Unnecessary         0.0%         0.0%

Very small percentage of files contain severe violations
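A minimal sketch of how per-ruleset percentages like the ones on this slide could be computed. The `(file_path, ruleset, priority)` record layout and the function name are assumptions made for illustration; they are not PMD's report format or the authors' tooling.

```python
from collections import defaultdict


def severe_violation_percentages(violations, total_files, priority):
    """Percentage of files with at least one violation of `priority`, per ruleset.

    `violations` is an iterable of (file_path, ruleset, priority) tuples.
    This record layout is hypothetical, chosen only for the example.
    """
    files_by_ruleset = defaultdict(set)
    for file_path, ruleset, prio in violations:
        if prio == priority:
            # A set ensures each file is counted once per ruleset.
            files_by_ruleset[ruleset].add(file_path)
    return {
        ruleset: 100.0 * len(files) / total_files
        for ruleset, files in files_by_ruleset.items()
    }


# Made-up sample data: four files in total, two of them with Priority 1 Naming violations.
sample = [
    ("A.java", "Naming", 1),
    ("A.java", "Naming", 1),  # second hit in the same file, counted once
    ("B.java", "Design", 2),
    ("C.java", "Naming", 1),
]
result = severe_violation_percentages(sample, total_files=4, priority=1)
```

Counting files rather than individual violations matches the slide's wording ("files containing severe violations"), so a file with many hits weighs the same as a file with one.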

9 Target set formation

Use of GitHub stars as ground truth.

But:

- GitHub stars per repository (NOT per file).
- Every source code file is of different importance.
- Big differences in the number of files between repositories.

Figure labels: 10000; x stars, y stars, z stars; Dependency Analysis.



10 Target set formation

For the i-th file of the j-th repository, the target is formulated from the following terms:

- Smoothing factor.
- A base score given to all files in the same repository.
- Added value according to the significance of the source code file.



11 Quality Evaluation Models


ANNs Model

- Input: The values of 73 static analysis metrics.
- Output: User-perceived source code quality estimation.
- Applicable only for source code files that exceed a minimum quality threshold.

SVMs - One Class Classifier

- Used to rule out low quality code.

Static Analysis → One Class Classifier → (Accepted) → ANNs Model → Quality Estimation
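The two-stage flow on this slide can be sketched as a simple gate: a file's metrics reach the scoring model only if the one-class classifier accepts them. The function name and the two stand-in models are hypothetical placeholders, not the trained SVM or ANN from the presentation.

```python
from typing import Callable, Optional, Sequence


def estimate_quality(
    metrics: Sequence[float],
    is_accepted: Callable[[Sequence[float]], bool],
    score_model: Callable[[Sequence[float]], float],
) -> Optional[float]:
    """Return a quality score, or None when the file is rejected by the gate."""
    if not is_accepted(metrics):
        # Rejected files never reach the scoring model.
        return None
    return score_model(metrics)


# Hypothetical stand-ins, used only to make the sketch runnable.
def toy_gate(metrics: Sequence[float]) -> bool:
    return metrics[0] < 100.0


def toy_score(metrics: Sequence[float]) -> float:
    return sum(metrics) / len(metrics)


rejected = estimate_quality([200.0, 1.0], toy_gate, toy_score)
accepted = estimate_quality([10.0, 20.0], toy_gate, toy_score)
```

Returning `None` for rejected files keeps "no estimate" distinct from a low score, which mirrors the slide's point that the ANN is applicable only above the quality threshold.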


12 ANNs Model


- Two-layer feedforward network.
- Levenberg-Marquardt algorithm (LMA) for adjusting the weights and the biases.
- (Training, Validation, Test) = (70%, 15%, 15%).
- Applicable only for source code files that exceed minimum quality threshold.
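As an illustration of the (70%, 15%, 15%) partition, the following sketch shuffles sample indices and cuts them into three disjoint sets. It covers only the data split, not the network or the Levenberg-Marquardt training; the function name, the seed and the rounding rule are assumptions.

```python
import random


def split_indices(n, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly partition range(n) into training, validation and test indices."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


train, val, test = split_indices(100)
```

Assigning the remainder to the test set guarantees that every sample lands in exactly one subset even when the fractions do not divide the sample count evenly.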


13 SVMs One Class Classifier


- Used to rule out low quality code.
- Gaussian radial basis kernel function.
- Training involved the use of 7 metrics:
  - Average Block Depth
  - Average Cyclomatic Complexity
  - Average Depth of Inheritance Hierarchy
  - Average Lines of Code Per Method
  - Comments Ratio
  - Distance
  - Lines Of Code
- (nu, gamma, tolerance) = (0.1, 0.01, 0.01)
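For reference, a Gaussian radial basis kernel is commonly written as k(x, y) = exp(-gamma * ||x - y||^2). The sketch below evaluates it with gamma = 0.01, assuming the slide's gamma follows this parametrization; the input vectors are made-up values, not real metric vectors.

```python
import math


def rbf_kernel(x, y, gamma=0.01):
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Identical vectors give the maximum similarity of 1.0.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0])
# Distant vectors (squared distance 2500) give a similarity close to 0.
far = rbf_kernel([0.0, 0.0], [30.0, 40.0])
```

With a kernel of this shape, similarity decays with distance from the training samples, which is what lets a one-class classifier flag files whose metrics lie far from the high quality training set.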


1124 false-positives

14 System Evaluation


Results validation:

- Quantitative: Using PMD
- Qualitative: Examination of a representative sample of files and their metrics

Evaluation on three main axes:

1. The system's ability to distinguish high quality source code files.
2. The effectiveness of the model for estimating the quality of files exceeding a quality threshold.
3. The accuracy of predicting the popularity of Java repositories given their source code files.


15 System Evaluation


Repositories selected:

- 8 random typical GitHub projects chosen independently.
- Lines-of-code-per-file ratio around 100, including also several extreme cases.
- Both human and auto-generated code.

The auto-generated projects are expected to be of high quality:

- Follow all coding conventions.
- Are architecturally and functionally complete.


16 Evaluation – One Class Classifier



The percentage of the rejected files that contained severe violations is very high

17 Evaluation – ANNs Model



The quality score reflects the characteristics of the repositories

18 Evaluation – Popularity Prediction



19 Conclusions and future work


Conclusions:

- Reliable determination of the area of high quality source code based on static analysis metrics.
- Effective user-perceived source code quality estimation.

Future Work:

- Further investigation of the response of our model in different scenarios.
- Expansion of the ground truth coverage by using more metrics.
- Application of feature selection techniques in order to drop overlapping metrics.


20

Thank you!