The CRASH Report - 2014 (CAST Research on Application Software Health): Summary of Key Findings


DESCRIPTION

The 2014 CRASH Report finds that applications built with either Agile or Waterfall methods alone are more susceptible to security, reliability, performance, and cost issues.

TRANSCRIPT

Page 1: CRASH Report 2014

The CRASH Report - 2014 (CAST Research on Application Software Health)

Summary of Key Findings

Page 2: CRASH Report 2014


Contents

1. Introduction to CRASH .............................................. 4

2. The CRASH Sample .................................................... 4

3. Structural Quality Measurement ................................ 5

4. Observations on the Full CRASH Sample ................... 6

5. Source, Shore, and Number of Users ........................... 9

6. CMMI Maturity Level ................................................ 9

7. Development Method ............................................... 11

8. Summary of Factors Affecting Health Factor Scores ...... 13

Authors ......................................................................... 14


Page 3: CRASH Report 2014


Executive Summary

CRASH reports highlight trends in five structural quality characteristics, or health factors: Robustness, Performance, Security, Changeability, and Transferability. The data reported here are from the Appmarq benchmarking repository maintained by CAST, comprising 1316 applications submitted by 212 organizations from 12 industry sectors located primarily in the United States, Europe, and India. These applications totaled approximately 706 million lines of code.

Statistical analysis found that:

• Applications suffering from violations of good architectural and coding practice that make them less robust are also likely to be less secure.

• With minor exceptions, the health factor scores have little relation to application size.

• CMMI Level 1 organizations produced applications with substantially lower structural quality on all health factors than applications developed in CMMI Level 2 or Level 3 organizations.

• Across all health factors, a mix of Agile and Waterfall methods produced higher scores than either Agile or Waterfall methods alone.

• The choice to develop applications in-house versus outsourced had no effect on health factor scores, while the choice to develop applications onshore versus offshore had very small effects on Changeability and Robustness.


Page 4: CRASH Report 2014


1. Introduction to CRASH

This is the third biennial report produced by CAST on global trends in the structural quality of business application software. Structural quality refers to the engineering soundness of the architecture and coding of an application, rather than to the correctness with which it implements the customer's functional requirements. These reports highlight trends in five structural quality characteristics, or health factors: Robustness, Security, Performance, Transferability, and Changeability. Structural quality is measured as violations of rules representing good architectural and coding practice in each of these five areas.

Evaluating an application for violations of structural quality rules is critical because such violations are difficult to detect through standard testing. Structural quality flaws are the defects most likely to cause operational problems such as outages, performance degradation, unauthorized access, or data corruption. CRASH reports provide an objective, empirical foundation for discussing the structural quality of software applications throughout industry and government. This report provides a brief summary of the important results from the full 2014 CRASH Report.

2. The CRASH Sample

The CRASH Report data are drawn from the Appmarq benchmarking repository maintained by CAST, comprising 1316 applications submitted by 212 organizations for the analysis and measurement of their structural quality characteristics. These applications totaled approximately 706 MLOC (million lines of code). The organizations are located primarily in the United States, Europe, and India. The sample includes 565 applications written primarily in Java-EE, 280 in COBOL, 127 in .NET, 77 in ABAP, 59 in Oracle Forms, 33 in Oracle ERP, 39 in C, 28 in C++, 24 in ASP, and 84 written in a mix of languages.

The sample is widely distributed across size categories and appears representative of the types of applications in business use. However, the applications usually submitted for structural analysis and measurement tend to be business critical systems, so we do not claim that this sample is statistically representative of all the world's business applications. Rather, it appears most representative of the mission critical subset of the custom application portfolio. The smallest application accepted into the CRASH sample contains 10 KLOC (kilo, or thousand, lines of code). Within the CRASH sample, 28% of the applications are less than 50 KLOC, 33% contain between 50 KLOC and 200 KLOC, 29% contain between 201 KLOC and 1 MLOC, and 11% are over 1 MLOC, including 20 applications over 5 MLOC. At least one application in each language contained over 1 MLOC, with Java-EE, COBOL, and C having applications over 10 MLOC.

There are 12 industry sectors represented in the 212 organizations that submitted applications to the Appmarq repository. Of these, 421 applications came from financial services firms, 314 from insurance, 187 from telecom, 169 from manufacturing, 56 from utilities, 56 from government agencies, 48

706 million lines of code

1316 custom applications

212 organizations

12 industry sectors

11% of the applications are over a million LOC


Page 5: CRASH Report 2014


from retail, 41 from IT consulting, 40 from business service providers, 32 from independent software vendors, 22 from energy, and the remainder from a mix of other business sectors. Java-EE applications accounted for at least one-third of the applications in every industry segment except insurance. Several strong associations were observed between industry sectors and languages. For instance, the preponderance of COBOL applications was in financial services and insurance. ABAP applications were observed primarily in manufacturing, while C applications were most prominent in telecom and utilities.

3. Structural Quality Measurement

The following terms will be used throughout this report.

Structural Quality: The non-functional quality of a software application that indicates how well the code is written from an engineering perspective. It is sometimes referred to as technical quality or internal quality, and represents the extent to which the application is free from violations of good architectural or coding practice.

Structural Quality Health Factors: The CRASH data include five structural quality characteristics, which will be called health factors in this report. Scores for these health factors are computed on a scale of 1 (high risk) to 4 (low risk) by analyzing the application to detect violations of over 1200 good architectural and coding practices. Scoring is based on an algorithm that evaluates the number of times a violation occurred compared to the number of opportunities where it could have occurred, weighted by the severity of the violation and its relevance to each individual health factor; a minimal sketch of this style of scoring follows the list below. The quality characteristics analyzed in the CRASH Report are:

• Robustness: The stability and resiliency of an application and the likelihood of introducing defects when modifying it.

• Performance: The efficiency of the software with respect to processing time and resources used.

• Security: An application's ability to prevent unauthorized intrusions.

• Changeability: An application's ability to be easily and quickly modified.

• Transferability: The ease with which a new team can understand an application and become productive working on it.

• Total Quality Index: A composite score computed by aggregating scores from the five health factors listed above.
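The report does not disclose CAST's exact scoring formula, but the description above (violation counts over opportunity counts, weighted by severity and relevance) is enough to sketch the shape of such a score. The sketch below is a minimal illustration under that reading; the rules, weights, relevance values, and counts are all hypothetical, not CAST's actual rule set.

```python
# Minimal sketch of a weighted violation-density score, assuming the
# description above: each rule's violation density (violations divided by
# opportunities) is weighted by severity and by its relevance to a health
# factor, then mapped onto the 1 (high risk) to 4 (low risk) scale.
# All rules, weights, and counts here are hypothetical.

RULES = [
    # (rule id, severity weight, {health factor: relevance}, violations, opportunities)
    ("avoid-sql-injection", 9, {"Security": 1.0, "Robustness": 0.3},    4, 250),
    ("close-resources",     7, {"Robustness": 1.0, "Performance": 0.5}, 12, 800),
    ("avoid-nested-loops",  5, {"Performance": 1.0},                    30, 600),
]

HEALTH_FACTORS = ["Robustness", "Performance", "Security",
                  "Changeability", "Transferability"]

def health_factor_score(factor: str) -> float:
    """Score one health factor on the 1 (high risk) to 4 (low risk) scale."""
    weighted_density = 0.0
    total_weight = 0.0
    for _rule, severity, relevance, violations, opportunities in RULES:
        rel = relevance.get(factor, 0.0)
        if rel == 0.0 or opportunities == 0:
            continue
        weighted_density += severity * rel * (violations / opportunities)
        total_weight += severity * rel
    if total_weight == 0.0:
        return 4.0  # no applicable rules: treat as low risk
    density = weighted_density / total_weight  # 0 = clean, 1 = worst
    return 4.0 - 3.0 * min(density, 1.0)       # map onto the 1..4 scale

def total_quality_index() -> float:
    """Composite score: aggregate (here, average) of the five health factors."""
    scores = [health_factor_score(f) for f in HEALTH_FACTORS]
    return sum(scores) / len(scores)

for f in HEALTH_FACTORS:
    print(f"{f:15s} {health_factor_score(f):.2f}")
print(f"{'TQI':15s} {total_quality_index():.2f}")
```

Because the composite here averages the five factor scores, its spread is necessarily narrower than that of its components, which is consistent with the behavior of the Total Quality Index described in Section 4.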

Violation: A structure in the source code that is inconsistent with good architectural or coding practice and has proven in the past to cause problems that affect either the cost or risk of an application.

Technical Debt: Technical debt represents the effort required to fix violations of good architectural and coding practices that remain in the code when an application is released. Technical debt is calculated only on violations that the organization intends to remediate. Like financial debts, technical debts incur interest in the form of extra costs accruing for a violation until it is remediated, such as the extra effort required to modify the code or inefficient use of hardware or network resources.
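Read literally, this definition makes technical debt a sum of remediation effort over the violations an organization intends to fix. A minimal sketch under that reading, with invented effort figures and an assumed hourly rate:

```python
# Hypothetical illustration of the technical-debt definition above:
# debt = remediation effort for the violations the organization intends
# to fix, costed at an assumed burdened hourly rate. All numbers are
# invented for the example.

violations = [
    # (rule id, hours to fix, intend to remediate?)
    ("avoid-sql-injection", 4.0, True),
    ("close-resources",     1.5, True),
    ("long-method",         2.0, False),  # accepted risk: excluded from debt
]

HOURLY_RATE = 75.0  # assumed cost per developer hour

debt_hours = sum(hours for _rule, hours, fix in violations if fix)
print(f"Technical debt: {debt_hours:.1f} hours "
      f"(~${debt_hours * HOURLY_RATE:,.0f})")
```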

Java-EE applications accounted for at least 1/3 of the applications in every industry segment except insurance


Structural quality: the extent to which the application is free from violations of good architectural or coding practice.

Page 6: CRASH Report 2014


4. Observations on the Full CRASH Sample

The distribution of scores for each of the health factors is presented in Figure 1. The distributions for Robustness, Security, and Changeability are negatively skewed, indicating that the preponderance of scores is in the upper range. Approximately 75% of scores for the operational risk factors of Robustness, Performance, and Security are above 3.0, compared to the lower distributions for the cost-related health factors of Changeability and Transferability. Among possible explanations are that fewer violations related to operational risk are released from development, or that these violations are prioritized for remediation over the cost-related factors of Transferability and Changeability. Since the Total Quality Index is a composite of the five health factor scores, its distribution and descriptive statistics tend toward a mean among the statistics for its five component health factors, except that its range and standard deviation are less than those of its components. Thus, the Total Quality Index exhibits less variation and is less affected by outliers or extreme scores.

Within these data Security was strongly correlated with Robustness, which means that violations of good architectural and coding practice that reduce an application's Robustness are also very likely to be accompanied by the types of violations that make it less secure. Performance showed little correlation with the other health factors, contradicting the long-standing belief that changes which affect Performance positively will affect other software attributes negatively.
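The report does not state which correlation statistic was used. The sketch below illustrates the kind of analysis described, computing Pearson correlations between per-application health factor scores; the data are randomly generated stand-ins, constructed so that Security tracks Robustness while Performance varies independently.

```python
# Illustration of the correlation analysis described above, on invented
# data: Pearson correlation between per-application health factor scores.
# The real CRASH analysis uses the Appmarq repository; scores here are random.
import numpy as np

rng = np.random.default_rng(0)
n_apps = 200

# Hypothetical generator: Security tracks Robustness (shared violations),
# while Performance is driven by an independent component.
robustness = rng.normal(3.2, 0.4, n_apps)
security = 0.8 * robustness + rng.normal(0.6, 0.2, n_apps)
performance = rng.normal(3.1, 0.4, n_apps)

print(f"Robustness vs Security:    r = {np.corrcoef(robustness, security)[0, 1]:.2f}")
print(f"Robustness vs Performance: r = {np.corrcoef(robustness, performance)[0, 1]:.2f}")
```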

Figure 1. Distributions of Health Factor scores for full 2014 sample

Architectural and coding flaws that reduce an application's Robustness are often accompanied by flaws that make it less Secure.



Page 8: CRASH Report 2014


Finally, the five health factor scores have little or no relation to size when analyzed across the entire CRASH sample. The only exceptions to this observation were negative relationships between size and Robustness in Java-EE and between size and Security in COBOL. As shown in Figure 2, the Security scores for COBOL applications decline as size increases. Although there are COBOL applications with lower Security scores in all size ranges, the decline in Security scores is dramatic for COBOL applications over 3 million lines of code.

The following sections will report on how various demographic factors affected structural quality in the CRASH sample. Since different numbers of violations were defined for each health factor in each language, demographic effects cannot be easily compared across applications written in different languages. Only the large sample of Java-EE applications contains a sufficient number of applications in each category of the various demographic variables to make statistically valid inferences from the data.

Figure 2. Scatterplot of Security scores with size in COBOL

The five health factor scores have little or no relation to size, with two language-specific exceptions.


Page 9: CRASH Report 2014


5. Source, Shore, and Number of Users

Of the 501 Java-EE applications that reported sourcing information, 224 were developed in-house, while 277 were outsourced. There were no significant differences in average size, measured in lines of code, between in-house and outsourced applications, and no statistically significant differences between sourcing choices on any health factor scores in the Java-EE sample. Although there were no mean differences on these health factors, there was substantial variation within each sourcing category, suggesting that factors other than application source might be affecting the health factor scores.

In the Java-EE sample, 387 applications were developed onshore while 114 were developed offshore. There were no statistically significant differences in scores for Performance, Security, or Transferability, or in the size of applications developed onshore or offshore. The only significant differences based on shoring choice indicated that applications developed onshore were slightly more changeable and robust. Although statistically significant, these differences were so small, accounting for less than 2% of the variation in the scores, that they have little practical significance.
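"Accounting for less than 2% of the variation" is an effect-size statement; one conventional measure is eta squared from a one-way ANOVA, the between-group sum of squares divided by the total sum of squares. The sketch below illustrates the calculation on invented onshore and offshore scores (the group sizes match the report; the score distributions are assumptions):

```python
# Eta-squared: share of total score variance explained by group membership
# (here, onshore vs offshore). Data are invented for illustration; the
# point is that a "significant" difference can still explain <2% of variance.
import numpy as np

rng = np.random.default_rng(1)
onshore = rng.normal(3.25, 0.40, 387)   # group sizes match the report
offshore = rng.normal(3.15, 0.40, 114)  # means and spreads are assumptions

def eta_squared(*groups):
    all_scores = np.concatenate(groups)
    grand_mean = all_scores.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_total = ((all_scores - grand_mean) ** 2).sum()
    return ss_between / ss_total

print(f"eta^2 = {eta_squared(onshore, offshore):.3f}")  # typically ~0.01 here
```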

In the Java-EE sample, 50 applications were reported to serve under 500 users, 37 applications served 500 to 5000 users, and 101 applications served more than 5000 users. Significant differences were found for all health factors based on the number of users served by the application. Across all health factors, the significant differences were accounted for by health factor scores for applications serving more than 5000 users being higher than scores for those serving 5000 or fewer users. Applications serving more than 5000 users are typically customer facing applications. It is therefore not surprising that greater effort would be focused on the structural quality of these applications, considering their risk to the business if they suffer operational problems or are difficult to maintain.

6. CMMI Maturity Level

In the Java-EE sample, 23 applications were developed in CMMI Level 1 organizations, 26 were developed in CMMI Level 2 organizations, and 32 were developed in CMMI Level 3 organizations. There were not enough CMMI Level 4 or Level 5 organizations in the sample to provide valid comparisons beyond CMMI Level 3. There were no significant differences in size, as measured in lines of code, between the applications developed in organizations at any of the three CMMI levels.

Figure 3 displays the distributions of scores for applications developed in CMMI Level 1, 2, and 3 organizations for each health factor. Significant differences were observed among applications developed at different CMMI maturity levels on all health factors. The strongest effects were observed for Robustness, Security, and Changeability, accounting for between 20% and 28% of the variation in the scores. The statistically significant impact of CMMI maturity level on Performance and Transferability was not as strong, but still accounted for between 11% and 12% of the variation in scores.

There were no statistically significant differences between sourcing choices on any health factor scores.


Health factor scores for applications serving more than 5000 users are higher than scores for those serving 5000 or fewer users.

Page 10: CRASH Report 2014


Figure 3. Health factor distributions for CMMI Level 1, 2, and 3 applications

Applications developed by Level 1 organizations have significantly lower health factor scores than those developed in CMMI Level 2 or Level 3 organizations.

[Figure 3 shows box plots of Robustness, Performance, Security, Changeability, Transferability, and Total Quality Index scores (2.0 to 4.0) plotted separately for Level 1, Level 2, and Level 3 applications.]

Page 11: CRASH Report 2014


Further statistical analysis confirmed that the significant mean differences observed on each health factor resulted from applications developed by Level 1 organizations having significantly lower health factor scores than those developed in CMMI Level 2 or Level 3 organizations. No statistically significant differences were observed between the scores on any of the health factors for CMMI Level 2 and Level 3 organizations.

These results are not surprising, since the change from CMMI Level 1 to Level 2 involves controlling the most common impediments to successful software engineering practice, such as unachievable commitments and volatile requirements. With these problems managed, developers are able to perform their work in a more orderly and professional manner, resulting in fewer mistakes during development. This change will have a significant impact on the structural quality of the software. The growth from CMMI Level 2 to Level 3 is focused more on achieving an economy of scale from standardizing development practices, so it is not surprising that health factor scores were similar between these two levels. Nevertheless, these data offer definitive proof that process improvements can have strong positive effects on the structural quality of IT applications.

7. Development Method

In the Java-EE sample, 57 applications reported using Agile methods, 60 applications reported using Waterfall methods, 46 applications reported using a mix of Agile and Waterfall methods, and 21 projects reported using no method. Figure 4 displays the distributions of scores for applications using different development methods. Significant differences were observed among development methods on all health factors. The strongest differences between development methods were observed for Robustness and Changeability, where they accounted for 14% to 15% of the variation in scores. Smaller but significant differences were observed for Security (9%) and for Performance and Transferability (5% to 6%). Additional statistical analyses confirmed that these differences were accounted for by higher health factor scores for the mix of Agile and Waterfall compared to scores for Agile or Waterfall approaches used separately, or for no method. Scores for Agile and Waterfall methods did not differ significantly from each other on any of the health factors.

These results indicate that for large business critical applications, the mix of Agile and Waterfall methods produces greater structural quality than the other development methods, although for Performance and Transferability these differences are not large. The superiority of the Agile/Waterfall mix suggests that for these types of applications, the greater emphasis on up-front design leads to better scores for the Robustness, Changeability, and Security of the application, and to a smaller extent for its Performance and Transferability.


Health factor scores for the mix of Agile and Waterfall are higher than for Agile or Waterfall approaches used separately.

Page 12: CRASH Report 2014


Figure 4. Health factor distributions for development methods

Scores for Agile and Waterfall methods did not differ significantly from each other on any of the health factors.

[Figure 4 shows box plots of Performance, Changeability, Robustness, Security, Transferability, and Total Quality Index scores (2.0 to 4.0) plotted separately for Agile, Mix, None, and Waterfall.]

Page 13: CRASH Report 2014


8. Summary of Factors Affecting Health Factor Scores

The strongest impacts on structural quality among all the demographic factors were for process maturity. CMMI Level 1 organizations produced applications with substantially lower scores on all health factors than applications developed in CMMI Level 2 or Level 3 organizations. The impact of development method was not as great as that of process maturity, but it still affected all health factors. The mix of Agile and Waterfall methods produced higher scores than either Agile or Waterfall methods used alone, suggesting that for business critical applications, the value of agile and iterative methods is enhanced by the up-front architectural and design activity that characterizes Waterfall methods. The choice to develop applications in-house versus outsourcing them, or onshore versus offshore, had little to no significant effect on health factor scores.


These results provide definitive evidence for the value of process maturity and of achieving the right mix of Agile and Waterfall methods in developing business critical applications. Structural quality on large business critical applications was best achieved when impediments to disciplined software engineering practices were removed and early design activity was integrated with short-cycle releases. The full CRASH Report will include benchmark data for each language on each health factor, as well as data on the most frequently violated rules of good architectural and coding practice for each language.


Page 14: CRASH Report 2014


Authors

Dr. Bill Curtis, Senior Vice President and Chief Scientist

Dr. Bill Curtis is best known for leading development of the Capability Maturity Model (CMM). Prior to joining CAST, Bill was a co-founder of TeraQuest, the global leader in CMM-based services. Earlier he directed the Software Process Program at the Software Engineering Institute (SEI) at Carnegie Mellon University. He also directed research at MCC, at ITT's Programming Technology Center, in GE Space Division, and at the University of Washington. He is a Fellow of the Institute of Electrical and Electronics Engineers for his contributions to software process improvement and measurement.

Lev Lesokhin, Executive Vice President, Strategy and Analytics

Lev Lesokhin is responsible for CAST's market development, strategy, thought leadership, and product marketing. He has a passion for customer success, building the ecosystem, and advancing the state of the art in business technology. Lev comes to CAST from SAP's Global SME organization. Prior to SAP, Lev was a leader in the research team at the Corporate Executive Board, a consultant at McKinsey, and a member of technical staff at the MITRE Corporation. Lev holds an MBA from the Sloan School of Management at MIT and a B.S.E.E. from Rensselaer Polytechnic Institute.

Alexandra Szynkarski, Product Marketing Manager

Alexandra Szynkarski is the product manager for CAST Highlight and a research assistant in CAST Research Labs. Her research interests include comparative analysis of application technical quality across technologies and industry verticals, as well as measuring technical debt. Alexandra received an MS in international business administration from the Institut d'Administration des Entreprises in Nice, France.

Stanislas Duthoit, Research Associate

Stanislas Duthoit is a research associate in CAST Research Labs. His interests include structural quality benchmarks and measuring software performance trends in the global application development community. Stanislas holds an MSc in Civil Systems and a Certificate in Management of Technology from UC Berkeley.