HOW TO ACCURATELY ESTIMATE THE SIZE
AND EFFORT OF YOUR SOFTWARE TESTING
A QASymphony White Paper
Dr. Vu Nguyen
1 QASymphony
Introduction
Software testing is a vital part of any software development project. New software applications and updates to existing software must be tested before release to ensure that they are fit for purpose. Quality assurance is the ultimate goal; sadly, QA is rarely afforded the time or budget required for a zero-defect release.
Despite the fact that the testing process typically amounts to between 10% and 25% of the overall project effort, QA departments are often starved of the time and resources required to do the job right. The roots of this problem can typically be traced back to inadequate estimations at the outset.
There are numerous methodologies that can be employed to estimate the effort of an overall software development project, but they invariably fail to take into account a number of testing-specific requirements. The result is often an inaccurate measurement of the budget needed to test efficiently and meet the schedule. This creates tension between developers and the QA team, as management applies pressure to meet unrealistic milestones.
In this white paper we will examine the difficulties inherent in estimating the overall scope of testing for a given project. We will then go on to highlight an easily applicable method for arriving at accurate estimates of the size and effort of the software testing required for any software development.
The problem with inaccurate estimates
The pressure to ship new software products on time, or roll out new features via updates, has never been greater. For any software development to meet its schedule, the original planning and estimation must be as accurate as possible. Since this data is often used to secure investment, for long term planning, and when bidding for projects, serious inaccuracies can cause untold problems down the line.
QA departments are rarely given an input at the outset and may be asked to sign off on unrealistic estimates when the project is already underway. The end result is invariably a failure to deliver the required standard of software on time. Either the release date has to be pushed back, as the schedule is extended to allow for more testing, or the software is released with a lot of bugs.
When QA departments are asked to provide estimates they tend to rely on experience and gut feeling, because there is no concrete method of measuring the test size and effort required. Inaccurate estimates at the outset lead to customer frustration down the line, increased costs, and a negative impact on your company’s reputation.
What can testing groups do in the face of weekly builds? How do they explain the rising resource costs as regression testing kicks in and grows in scope with each new build? They need a method of estimating accurately.
The challenge of estimating accurately
A number of factors impact software testing, making it difficult to arrive at accurate estimates.
• You can’t just consider the overall number of people on the QA team; you have to think about their individual capabilities. New team members will not have the same capacity as seasoned pros.
• Every piece of software is different. You have to take into account the requirements of the technology employed, and the processes and environments involved.
• It is hard to quantify software quality and apply equal standards to attributes like functionality, reliability, usability, efficiency, and scalability.
• There are many different types of testing activities, so a comprehensive test plan has a lot of bases to cover.
• Software requirements may be inaccurate, inadequate, or subject to change during the development process.
Introducing qEstimation
The qEstimation process can be applied for any project and it is designed to be accessible for all members of the QA team. It empowers QA professionals to identify problems with the schedule at the outset and to provide management with a solid analysis of the time and resources required for successful software testing. The more this process is used, the more accurate it will become. It is also flexible and scalable, so you can adapt it to your needs.
By using the qEstimation tool, testing groups can quickly establish the resources needed for a given project and calculate the velocity of the test team. There's actually no need to understand why it works: you can simply plug in the pertinent numbers and, within three iterations, as the data is fed back into the system, you'll get accurate estimates of the effort required going forward. Any independent test team can adopt this tool and see concrete results within three to six weeks, making it possible to accurately report on the velocity of the test team and the resources needed to meet the schedule.
How does it work? An overview
To estimate the size of software testing we start with a Test Case Point Analysis (TCPA). Four elements are taken into account: checkpoint, precondition, test data, and test type (we will go into these in more detail in the next section). This allows us to establish the scope of the test cycle and to estimate the effort needed to complete it. As the test cases are completed, the actual effort and time required is fed back into the model, as you can see in this diagram.
Using this method ensures that your estimates will grow more and more accurate as you feed the actual data from testing back in and analyze any inaccuracies. It enables you to build a repository of experience and draw on that knowledge to make informed estimates for new test cycles.
Calculating Test Case Point Analysis (TCPA)
Just as Function Point Analysis (FPA) is based on requirements and Use Case Point Analysis (UCPA) on use cases, TCPA is based on test cases, the primary driver of testing effort, and uses them as its core input. The first step is to use your test cases to calculate the Test Case Point (TCP) count. As mentioned before, there are four factors to consider: checkpoint, precondition, test data, and test type. The first three relate to the overall size of the test case; the fourth, test type, acts as a weighting mechanism that takes into account the varying complexity of different test types.
Checkpoint – an individual condition at which the tester can verify that the software behaves as expected. A test case may have one or several checkpoints. Each checkpoint is counted as one TCP.
Precondition – what is the overhead in terms of preparation before the test case can be executed? This is broken into four possible levels of complexity, as highlighted in the table below.
Complexity Level – Description
None – The precondition is not applicable or not important for executing the test case, or the precondition is simply reused from the previous test case to continue the current one.
Low – The condition for executing the test case is available, with some simple modifications required, or some simple set-up steps are needed.
Medium – Some explicit preparation is needed to execute the test case: the condition for executing is available but requires additional modifications, or some additional set-up steps are needed.
High – Heavy hardware and/or software configuration is needed to execute the test case.
Test data – this is the data that is required to execute the test case and it may need to be generated by test scripts or sourced from previous tests. Once again, there are four possible levels of complexity as detailed in this table.
Complexity Level – Description
None – No test data preparation is needed.
Low – Simple test data is needed and can be created during test case execution, or the test case uses a slightly modified version of existing test data and requires little or no effort to modify it.
Medium – Test data is deliberately prepared in advance, with extra effort to ensure its completeness, comprehensiveness, and consistency.
High – Test data is prepared in advance with considerable effort to ensure its completeness, comprehensiveness, and consistency. This could include using support tools to generate data, a database to store and manage test data, or scripts to generate test data.
The complexity levels of the precondition and test data also have to be assigned TCP values, so here is a possible set of weighting values. These are values you'll want to fine-tune for your own specific project.
Complexity Level – TCP for Precondition – TCP for Test Data
None – 0 – 0
Low – 1 – 1
Medium – 3 – 3
High – 5 – 6
Test type – since there are many different kinds of tests with varying degrees of complexity you’ll want to weight them accordingly. The actual weights you apply will depend on your own approach, but this table can serve as an example. Using the user interface and functional testing as your baseline, you can add other types of test as needed and weight them accordingly; this is not an exhaustive list.
Type of Test – Description – Weight
User interface and functional testing – Considered the baseline. – 1.0
API – Verifies the accuracy of the interfaces in providing services. – 1.22
Database – Tests the accuracy of database scripts, data integrity, and/or data migration. – 1.36
Security – Tests how well the system handles hacking attacks and unauthorized or unauthenticated access. – 1.39
Installation – Tests full, partial, or upgrade install/uninstall processes of the software. – 1.09
Networking – Tests the communications among entities via networks. – 1.27
Algorithm and computation – Verifies algorithms and computations designed and implemented in the system. – 1.38
Usability testing – Tests the friendliness, ease of use, and other usability attributes of the system. – 1.12
Performance (manual) – Verifies whether the system meets performance requirements, assuming the test is done manually. – 1.33
Recovery testing – Verifies the accuracy of the recovery process for recovering from system crashes and other errors. – 1.07
Compatibility testing – Tests whether the software is compatible with other elements of a system with which it should operate, e.g., browsers, operating systems, or hardware. – 1.01
How do you establish weights?
There’s a chance that the QA manager will want to take responsibility for determining the weights that should be applied for the precondition complexity, test data complexity, and test types, but we recommend widening the net and polling the opinions of experienced testers and software engineers. Ensure that you explain the TCPA system fully and then issue a survey. The results can be used to establish an average.
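Averaging the survey responses is straightforward; as a small illustration, assuming each respondent proposes a weight per test type (the votes below are purely hypothetical):

```python
# Hypothetical sketch: each surveyed tester proposes a weight per test
# type, and the team adopts the per-type average. All values illustrative.
from statistics import mean

survey = {
    "api":      [1.15, 1.20, 1.25, 1.28],
    "security": [1.35, 1.40, 1.38, 1.43],
}

agreed_weights = {t: round(mean(votes), 2) for t, votes in survey.items()}
print(agreed_weights)
```

The same averaging can be applied to the surveyed TCP values for precondition and test data complexity.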
How do you estimate the effort?
The longer a team works together and the more projects they work on, the easier it becomes to estimate their average work rate. You'll need to have an idea of how many test cases can be completed per hour, per worker. For the qEstimation system to work, you need to understand the relationship between tester hours and the TCP count.
We recommend two methods for estimating the effort required to complete your test cases. The one you choose will depend largely on the amount of historical data you can draw on. In simple terms, the more test cases you complete, the more accurately you can determine how long a new test case will take to complete.
The first method uses a productivity index: you draw on the results of past tests, or devise an experiment to produce useful data. If you have previous test cycles, or data from completed projects that were similar, you'll be able to make fairly accurate estimates of the effort required.
The second method is linear regression analysis. It allows you to establish estimation ranges and attach a confidence level to the estimate, but it can only be employed if you have five or more similar test cycles or projects to draw on.
Once you can work out a ratio for how long a tester takes to complete a TCP then you can accurately assess the resources required to complete the test cycle in the time frame specified.
A learning model
As each test cycle is completed, you need to compare the actual results with the estimates and account for any inaccuracies. This information can be used to make the qEstimation model more accurate in the future. It is fast to apply, and it grows more accurate as the pool of data it can draw on grows deeper. It also provides a consistent structure for analysis and estimation that any tester can employ, so it reduces the burden on experienced testers, who may otherwise overestimate the effort required in order to avoid missing the schedule.
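The feedback loop can be sketched as follows; the data structures and numbers are illustrative, not part of the qEstimation specification. Each completed cycle's actuals are recorded and folded into subsequent estimates:

```python
# Minimal sketch of the learning loop: after each cycle, fold the actual
# effort back into the productivity index so later estimates improve.
# Field names and figures are illustrative, not part of qEstimation.

completed = []  # (tcp_count, actual_hours) for each finished cycle

def record_cycle(tcp_count, actual_hours):
    """Feed a finished cycle's actuals back into the history."""
    completed.append((tcp_count, actual_hours))

def estimate_hours(tcp_count):
    """Estimate effort for a new cycle from all recorded history."""
    total_tcp = sum(t for t, _ in completed)
    total_hours = sum(h for _, h in completed)
    return tcp_count * total_hours / total_tcp

record_cycle(120, 100)            # cycle 1 actuals
record_cycle(200, 155)            # cycle 2 actuals, fed back in
print(round(estimate_hours(150)))  # estimate for the next cycle, in hours
```

Each call to `record_cycle` deepens the pool of data, so the ratio used by `estimate_hours` converges on the team's real velocity over time.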
Reap the rewards
The ability to accurately estimate the size and effort of your software testing provides a number of obvious benefits. For working out budgets and staffing levels, this kind of information is vital. If your test team is presented with an unrealistic schedule, you can highlight the shortfall with concrete data based on past performance. The end result is an achievable schedule and better-quality software on release day. The qEstimation tool doesn't require company-wide adoption: any independent test team can plug in the relevant data and arm themselves with accurate estimates of the resources they require and the velocity of the test team.
It is a well-established rule in software testing that the earlier a defect is found, the cheaper it is to fix. The same principle can be applied to test estimation: accuracy at the start will save you a great deal of time and resources down the line.