Reliability and Quality: Predicting Post-release Defects Using Pre-release Field Testing Results


DESCRIPTION

Paper: Predicting Post-release Defects Using Pre-release Field Testing Results. Authors: Foutse Khomh, Brian Chan, Ying Zou, Anand Sinha, and Dave Dietz. Session: Research Track Session 9: Reliability and Quality.

TRANSCRIPT

PREDICTING POST-RELEASE DEFECTS USING PRE-RELEASE FIELD TESTING RESULTS

Foutse Khomh, Brian Chan, Ying Zou
Anand Sinha, Dave Dietz

FIELD TESTING CYCLE

Field testing is important to improve the quality of an application before release.

MEAN TIME BETWEEN FAILURES (MTBF)

Mean Time Between Failures (MTBF) is frequently used to gauge the reliability of an application. Applications with a low MTBF are undesirable, since they tend to have a higher number of defects.
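As a minimal sketch (not the authors' implementation), MTBF can be computed from per-user usage time and failure counts; the `user_logs` data below is hypothetical:

```python
from statistics import mean

def mtbf_hours(usage_hours: float, failure_count: int) -> float:
    """Mean Time Between Failures: active usage time divided by
    the number of failures observed during that time."""
    if failure_count == 0:
        return float("inf")  # no failures observed in the field test
    return usage_hours / failure_count

# Hypothetical field-testing log: (total usage hours, failures) per user.
user_logs = [(120.0, 3), (80.0, 1), (200.0, 0), (45.0, 2)]

per_user = [mtbf_hours(h, f) for h, f in user_logs if f > 0]
print(f"Average MTBF over users with failures: {mean(per_user):.1f} h")
```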

AVERAGE USAGE TIME (AVT)

• AVT is the average time that a user actively uses the application.
• The AVT can be longer than the period of field testing.

A longer AVT indicates that an application is reliable and that users tend to use it longer.
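Correspondingly, a sketch of AVT as a plain average of per-user active-usage time (the `usage_hours` values are hypothetical):

```python
from statistics import mean

# Hypothetical active-usage durations (hours) per field-testing user.
usage_hours = [14.5, 22.0, 8.75, 30.0, 17.25]

avt = mean(usage_hours)  # Average Usage Time across all testers
print(f"AVT: {avt:.2f} h")
```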

PROBLEM STATEMENT

• MTBF and AVT cannot capture the whole pattern of failure occurrences in the field testing of an application.

The reliability of versions A and B (the running example in the following slides) is very different.

METRICS

We propose three metrics that capture additional patterns of failure occurrences:

• TTFF: the average length of usage time before the occurrence of the first failure,
• FAR: the failure accumulation rating, which gauges the spread of failures to the majority of users, and
• OFR: the overall failure ratio, which captures daily rates of failures.

AVERAGE TIME TO FIRST FAILURE (TTFF)

[Figure: percentage of users reporting their first failure on each day (days 1-14) of field testing, for Version A and Version B.]

TTFF produces high scores for applications where the majority of users experience the first failure late. In the running example, TTFF_A = 6.11 and TTFF_B = 3.56.
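A minimal sketch of a TTFF-style computation, assuming per-user logs record the day of the first failure (the paper's exact formula may differ; `first_failure_day` is a hypothetical structure):

```python
from statistics import mean

# Hypothetical logs: 1-based day of field testing on which each user's
# first failure occurred; None if the user reported no failure.
first_failure_day = {"u1": 3, "u2": 7, "u3": None, "u4": 10, "u5": 5}

# One simple reading of TTFF: average usage time (here, in days) before
# the first failure, over the users who reported at least one failure.
days = [d for d in first_failure_day.values() if d is not None]
print(f"TTFF: {mean(days):.2f} days")  # higher = first failures come later
```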

FAILURE ACCUMULATION RATING (FAR)

[Figure: percentage of users reporting each number of unique failures (1-14), for Version A and Version B.]

The FAR metric produces high scores for applications where the majority of users report a very low number of failures. In the running example, FAR_A = 6.97 and FAR_B = 4.97.
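The deck does not reproduce the FAR formula itself, so the sketch below only builds the distribution underlying the plot: how many unique failures each user accumulates, and what fraction of users falls at each count (all data hypothetical):

```python
from collections import Counter

# Hypothetical logs: the set of unique failure IDs each user observed.
failures_per_user = {
    "u1": {"F1"}, "u2": {"F1", "F3"}, "u3": set(),
    "u4": {"F2"}, "u5": {"F1", "F2", "F4"},
}

# For each failure count k, the fraction of users who accumulated
# exactly k unique failures -- the x/y pairs of the FAR plot.
counts = Counter(len(ids) for ids in failures_per_user.values())
n_users = len(failures_per_user)
for k in sorted(counts):
    print(f"{k} unique failure(s): {counts[k] / n_users:.0%} of users")
```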

OVERALL FAILURE RATING (OFR)

[Figure: percentage of users reporting failures on each day (days 1-14) of field testing, for Version A and Version B.]

The OFR metric produces high scores for applications with fewer users reporting failures overall. In the running example, OFR_A = 0.93 and OFR_B = 0.78.
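Again only as an assumption-laden sketch (the formula itself is not in the deck): one way to turn daily failure rates into a score where fewer reporting users yields a higher rating:

```python
# Hypothetical daily data: fraction of users reporting a failure on
# each day of a 14-day field test.
daily_fraction_reporting = [
    0.02, 0.05, 0.08, 0.06, 0.04, 0.03, 0.05,
    0.07, 0.04, 0.03, 0.02, 0.04, 0.03, 0.02,
]

# One simple rating: 1 minus the mean daily failure-reporting rate,
# so fewer users reporting failures pushes the score toward 1.
ofr = 1.0 - sum(daily_fraction_reporting) / len(daily_fraction_reporting)
print(f"OFR: {ofr:.2f}")
```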

CASE STUDY

We analyze 18 versions of an enterprise software application.

• Overall, 2,546 users were involved in the field testing.
• The testing period lasted 30 days.

SPEARMAN CORRELATION OF THE METRICS

       TTFF   FAR    OFR    AVT    MTBF
TTFF   1.00   0.09  -0.08  -0.31  -0.08
FAR    0.09   1.00   0.07   0.33  -0.24
OFR   -0.08   0.07   1.00   0.39  -0.54
AVT   -0.31   0.33   0.39   1.00  -0.30
MTBF  -0.08  -0.24  -0.54  -0.30   1.00
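Such a matrix can be reproduced with SciPy, assuming the per-version metric values are available as columns (random placeholder data below):

```python
import numpy as np
from scipy.stats import spearmanr

# Placeholder data: 18 versions x 5 metrics (TTFF, FAR, OFR, AVT, MTBF).
rng = np.random.default_rng(0)
data = rng.random((18, 5))

rho, _ = spearmanr(data)  # 5x5 Spearman rank-correlation matrix
for name, row in zip(["TTFF", "FAR", "OFR", "AVT", "MTBF"], rho):
    print(name, np.round(row, 2))
```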

INDEPENDENCE AMONG PROPOSED METRICS

[Figure: loadings of TTFF, FAR, OFR, and MTBF on the first four principal components (PC1-PC4); loadings range from -1 to 1.]
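A sketch of the principal component analysis behind such a loading chart, using scikit-learn (a tooling assumption; the deck does not say what was used):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Placeholder data: 18 versions x 4 metrics (TTFF, FAR, OFR, MTBF).
rng = np.random.default_rng(1)
X = rng.random((18, 4))

# Standardize, then inspect how each metric loads on each component;
# metrics that load on different components carry independent signal.
pca = PCA(n_components=4).fit(StandardScaler().fit_transform(X))
for i, component in enumerate(pca.components_, start=1):
    print(f"PC{i} loadings:", np.round(component, 2))
```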

PREDICTIVE POWER FOR POST-RELEASE DEFECTS

[Figure: marginal R-square of each metric (TTFF, FAR, OFR, AVT, MTBF) in predicting post-release defects over 6-month, 1-year, and 2-year windows.]
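One common way to obtain a marginal R-square is to compare a regression model's fit with and without the metric of interest; the sketch below illustrates that idea on placeholder data (the paper's exact modeling setup is not detailed in the deck):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder data: 18 versions x 5 metrics, plus defect counts.
rng = np.random.default_rng(2)
names = ["TTFF", "FAR", "OFR", "AVT", "MTBF"]
X = rng.random((18, 5))
y = X @ np.array([2.0, 1.0, 3.0, 0.5, 1.5]) + rng.normal(0, 0.2, 18)

full_r2 = LinearRegression().fit(X, y).score(X, y)
for i, name in enumerate(names):
    X_wo = np.delete(X, i, axis=1)  # refit without this metric
    r2 = LinearRegression().fit(X_wo, y).score(X_wo, y)
    print(f"Marginal R-square of {name}: {full_r2 - r2:.3f}")
```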

PRECISION OF PREDICTIONS WITH ALL FIVE METRICS

[Figure: precision (%) of the predictions as a function of the number of testing days (5 to 30), for 6-month, 1-year, and 2-year post-release windows.]

CONCLUSION

• TTFF, FAR, and OFR complement the traditional MTBF and AVT in predicting the number of post-release defects.
• They provide faster predictions of the number of post-release defects, with good precision within just 5 days of a pre-release testing period.
• It takes MTBF up to 25 days to predict the number of post-release defects.
