
TRANSCRIPT

Page 1: Bad metric, bad!

NOTICE: Proprietary and Confidential

This material is proprietary to Centric Consulting, LLC. It contains trade secrets and information which is solely the property of Centric Consulting, LLC. This material is solely for the Client’s internal use. This material shall not be used, reproduced, copied, disclosed, transmitted, in whole or in part, without the express consent of Centric Consulting, LLC.

© 2013 Centric Consulting, LLC. All rights reserved

Bad Metric. Bad!

Teaching an old dog, nothing new

Page 2: Bad metric, bad!
Page 3: Bad metric, bad!
Page 4: Bad metric, bad!

What are some typical metrics that you measure?

Page 5: Bad metric, bad!

Other Examples of Software Testing Metrics

Test Cases

• Test Case Counts by Execution Status

• Test Case Percentages by Execution Status

• Test Case Execution Status Trend

• Test Case Status Planned vs Executed

• Test Case Coverage

• Test Case Status vs Coverage

• Test Case First Run Failure Counts

• Test Case Re-Run Counts

Automation extras

• Automation Index (Percent Automatable)

• Automation Progress

• Automation Test Coverage

Page 6: Bad metric, bad!

More Examples of Software Testing Metrics

Defects

• Defect Counts by Status

• Defect Counts by Priority

• Defect Status Trend

• Defect Density

• Defect Removal Efficiency

• Defect Leakage

• Average Defect Response Time

Other

• Requirements Volatility Index

• Testing Process Efficiency
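
Several of the defect measures above (Defect Density, Defect Removal Efficiency, Defect Leakage) are ratios with commonly cited textbook definitions. As an illustrative sketch only, here is how they are typically computed; all of the counts and the code size below are invented for the example and are not from this deck:

# Illustrative only: hypothetical counts, using commonly cited formulas.
defects_found_in_test = 48      # defects found during the test phases
defects_found_in_prod = 6       # defects that escaped to production
size_kloc = 120.0               # delivered size in thousands of lines of code

# Defect Density: defects per unit of size (here, per KLOC)
defect_density = defects_found_in_test / size_kloc

# Defect Removal Efficiency: share of all known defects removed before release
dre = defects_found_in_test / (defects_found_in_test + defects_found_in_prod)

# Defect Leakage: share of all known defects that escaped to the next phase
defect_leakage = defects_found_in_prod / (defects_found_in_test + defects_found_in_prod)

print(f"Density: {defect_density:.2f}/KLOC, DRE: {dre:.0%}, Leakage: {defect_leakage:.0%}")

Each of these is only as trustworthy as the counts that feed it, which is exactly the problem the rest of the deck picks apart.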

Page 7: Bad metric, bad!

Common Themes

• Counts

• Metric (Counts/Counts)

• Trends
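
To make the three themes concrete, here is a minimal sketch of a count, a metric (a count over a count), and a trend; the daily execution numbers are made up purely for illustration:

# Hypothetical daily execution results: (tests executed, defects found) per day
daily = [(20, 3), (25, 2), (18, 5), (30, 4), (22, 6)]

# Count: a raw tally
total_defects = sum(d for _, d in daily)

# Metric (count over count): defects found per test executed
defect_rate = total_defects / sum(t for t, _ in daily)

# Trend: how the per-day metric moves over time
daily_rate = [d / t for t, d in daily]
direction = "rising" if daily_rate[-1] > daily_rate[0] else "falling"

print(total_defects, round(defect_rate, 3), direction)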

Page 8: Bad metric, bad!

Other Examples of Software Testing Metrics

Test Cases

• Test Case Counts by Execution Status – Count

• Test Case Percentages by Execution Status – Count

• Test Case Execution Status Trend – Trend

• Test Case Executed vs Planned – Metric and Trend

• Test Case Coverage – Metric

• Test Case Status vs Coverage – Metric

• Test Case First Run Failure Counts – Count

• Test Case Re-Run Counts – Count

Automation extras

• Automation Index (Percent Automatable) – Metric

• Automation Progress – Count

• Automation Test Coverage – Metric

Page 9: Bad metric, bad!

More Examples of Software Testing Metrics

Defects

• Defect Counts by Status – Count

• Defect Counts by Priority – Count

• Defect Status Trend – Trend

• Defect Density – Metric

• Defect Removal Efficiency – Metric

• Defect Leakage – Metric

• Average Defect Response Time – Trend

Other

• Requirements Volatility Index – Metric

• Testing Process Efficiency – Metric

Page 10: Bad metric, bad!

The Problems We Typically Face

They Fail to Communicate

• Present data instead of information

• Offer no interpretation, leaving users to draw their own conclusions

They Are Often Inaccurate

• The act of measuring lacks consistency

• The measures themselves have inherent variability

• No one reports a margin of error

They Do Not Measure a Control

• Decisions can’t be made on the number alone

• The measurement isn’t a lever to introduce change

They Are Not Tied to Organizational Objectives

• No threshold set for the desired goal

• No action or consequence if it is not achieved

Page 11: Bad metric, bad!

Counting

Page 12: Bad metric, bad!

Counting

Page 13: Bad metric, bad!
Page 14: Bad metric, bad!

Exercise #1

1. Need 3 volunteers

2. Assume 1 scoop equals 1 day’s worth of testing effort

3. Hershey Kisses and Tootsie Rolls are tests; Starbursts are bugs

4. Take a scoop

5. How many tests did you execute?

6. Based on how many tests you ran, how many more scoops do you need to execute the rest (there are 180 total)?

Page 15: Bad metric, bad!

Exercise #1 Questions

• Was the same scoop used? Were the results the same?

• Was there variability in the number of tests run in each scoop?
  • Is that typical in testing?

• Was there variability in the estimate of the number of tests left?
  • Is this similar to guessing how much time or effort is left in a test cycle?

• Are these numbers reliable? Are they repeatable?
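
The variability these questions point at is easy to reproduce in a quick simulation. The sketch below is only a toy model: the 180-test bowl comes from the exercise, but the scoop-size range is an assumption, and the point is just how much the naive "scoops remaining" extrapolation swings from volunteer to volunteer:

import random

# Assumed bowl: 180 tests total (per the exercise), scoop size varies randomly.
TOTAL_TESTS = 180

def run_trial(seed):
    random.seed(seed)
    tests_in_scoop = random.randint(15, 35)   # assumed scoop-to-scoop variability
    # Naive extrapolation: remaining tests divided by this scoop's yield
    scoops_remaining = (TOTAL_TESTS - tests_in_scoop) / tests_in_scoop
    return tests_in_scoop, round(scoops_remaining, 1)

for seed in range(3):  # three volunteers, same bowl
    print(run_trial(seed))
# The same bowl yields very different "scoops remaining" estimates.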

Page 16: Bad metric, bad!

Exercise #2

1. Need 3 volunteers

2. Assume 1 scoop equals 1 day’s worth of testing effort

3. Hershey Kisses and Tootsie Rolls are tests; Starbursts are bugs (red are severe)

4. Take a scoop

5. How many tests did you execute?

6. How many defects did you find?

7. Based on how many tests you ran, how many more scoops do you need to execute the rest?

8. Based on how much effort you put in, how many more scoops do you need to find the rest of the defects?

Page 17: Bad metric, bad!

Exercise #2 Questions

• Was the same scoop used? Were the results the same?

• With an estimate of the number of tests remaining, is it reasonable to estimate the number of defects that will be found?
  • Do people ask you to guess this type of information?

• If you know how many tests (Starbursts) are left and how many man-hours you will use (scoop size), can you estimate how many scoops are needed to execute all tests (find all Starbursts)?
  • Is it accurate? Is it close enough?

• Are these numbers reliable? Are they repeatable?

• Does encountering defects (M&M’s) reveal anything about the overall quality (how many M&M’s exist, or what it’ll take to find them)?

Page 18: Bad metric, bad!

Challenges with Counting

Label does not equal content

Inherent variability

Not evenly spaced

Lacks reference for context

Lack of consistency

Page 19: Bad metric, bad!

Metrics (Measure over Measure)

Page 20: Bad metric, bad!

Sampling

Target Population

Matched Samples

Independent Samples

Random Sampling

Simple Random Sampling

Stratified Sampling

Cluster Sampling

Quota Sampling

Spatial Sampling

Sampling Variability

Standard Error

Bias

Precision

For each population there are many possible samples. A sample statistic gives information about a corresponding population parameter.
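
As a refresher on what sampling variability and standard error look like in practice, here is a minimal sketch; the population of per-module defect counts and the sample sizes are invented for illustration:

import random
import statistics

random.seed(1)
# Hypothetical population: per-module defect counts (not from the deck)
population = [random.randint(0, 12) for _ in range(500)]

sample_means = []
for _ in range(200):                        # many possible samples...
    sample = random.sample(population, 30)  # ...via simple random sampling, n = 30
    sample_means.append(statistics.mean(sample))

# Sampling variability: the spread of the sample statistic itself
standard_error = statistics.stdev(sample_means)
print(round(statistics.mean(population), 2), round(standard_error, 2))

The sample mean lands near the population mean on average, but any single sample can miss it by roughly the standard error, which is the margin no one reports.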

Page 21: Bad metric, bad!

Sampling in Testing

Does testing use sampling?

Consider that in most corporate environments:

• We never test the entire application

• It is not realistically possible to find every defect

• So, does testing use sampling?

Page 22: Bad metric, bad!

Ponder this as we discuss the next section…

Is Testing a Methodical Defect Searching Activity?

Page 23: Bad metric, bad!

Sampling

Remember, we can’t test everything: not enough time, people, or budget.

So, which sample approach better approximates an actual measure (e.g. dots per sq. inch)?

5.25 dots/sq. in. vs. 6.5 dots/sq. in.

Page 24: Bad metric, bad!

Ponder this as we discuss the next section…

Is Testing a Methodical Defect Searching Activity?

Page 25: Bad metric, bad!

Sampling

Which sample approach better approximates an actual measure (e.g. dots per sq. inch)?

• What is more accurate, random or methodical searching?

5.25 dots/sq. in. vs. 6.5 dots/sq. in.

4.95 dots/sq. in. vs. 6.3 dots/sq. in.

There are actually 6.6 dots/sq. in.
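
One way to explore the random-versus-methodical question is to simulate a field of dots and estimate its density both ways. The sketch below is only a toy: the field size, grid spacing, and number of samples are assumptions, with the true density set to the 6.6 dots/sq. in. from the slide:

import random

random.seed(2)
FIELD = 10.0                                   # 10 x 10 "inches"
TRUE_DENSITY = 6.6                             # dots per square inch, as on the slide
dots = [(random.uniform(0, FIELD), random.uniform(0, FIELD))
        for _ in range(int(TRUE_DENSITY * FIELD * FIELD))]

def density_in(x0, y0, size):
    """Dots per square inch inside one size-by-size sample square."""
    hits = sum(1 for x, y in dots if x0 <= x < x0 + size and y0 <= y < y0 + size)
    return hits / (size * size)

# Methodical: a fixed grid of 1-inch sample squares
grid = [density_in(gx, gy, 1.0) for gx in range(0, 10, 3) for gy in range(0, 10, 3)]

# Random: the same number of randomly placed 1-inch squares
rand = [density_in(random.uniform(0, 9), random.uniform(0, 9), 1.0)
        for _ in range(len(grid))]

print(sum(grid) / len(grid), sum(rand) / len(rand), TRUE_DENSITY)

Either approach can land close to the true value; what matters is how many squares you sample and how the dots are actually distributed, not the searching style alone.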

Page 26: Bad metric, bad!

Exercise #3

Page 27: Bad metric, bad!

Exercise #3

1. Need 3 volunteers

2. Assume 1 scoop equals 1 day’s worth of testing effort

3. Hershey Kisses and Tootsie Rolls are tests; Starbursts are bugs (red are severe)

4. Each volunteer grabs 1 scoop of candy

5. How many (total) tests did you execute?

6. How many (total) defects did you find?

7. Log results

8. Repeat 2 more times

Page 28: Bad metric, bad!

Exercise #3 Questions

• Does this graph represent anything useful?

• Does a trend line help or mean anything?

• Is it possible or reasonable to estimate the # of defects you’ll see based on the number of tests, from even 9 samples?

• Compare scoop 1 to scoop 9 – does any scoop seem to be a reasonable estimate?

Page 29: Bad metric, bad!

Challenges with Metrics (Measure over Measure)

Implied Derivations and Forecasting

Counts over Counts

Denominator Rules

Implies Velocity

Measure over Measure
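
"Denominator Rules" is easy to demonstrate with a single made-up data set: the same defect count divided by two defensible denominators tells two different stories. All numbers below are invented:

# Same numerator, two defensible denominators, two different messages.
defects_found = 12
tests_executed = 60      # what actually ran
tests_planned = 200      # what the plan said would run

per_executed = defects_found / tests_executed   # 0.20 defects per executed test
per_planned = defects_found / tests_planned     # 0.06 defects per planned test

print(f"{per_executed:.2f} vs {per_planned:.2f}: the denominator rules the story")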

Page 30: Bad metric, bad!

Trends

Page 31: Bad metric, bad!

Trend

A trend is a change in a measure (or metric) over a time interval.

It has three components:

• Direction/Movement

• Speed/Size

• Cause (Implied)
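
A minimal sketch of extracting the first two components (direction and speed) from a series, using an invented weekly open-defect count and a least-squares slope; the third component, cause, is the part no formula supplies:

# Hypothetical open-defect counts over six weekly intervals
weeks = [0, 1, 2, 3, 4, 5]
open_defects = [40, 38, 35, 36, 30, 27]

n = len(weeks)
mean_x = sum(weeks) / n
mean_y = sum(open_defects) / n

# Least-squares slope = speed/size of the change per interval
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(weeks, open_defects))
         / sum((x - mean_x) ** 2 for x in weeks))

direction = "down" if slope < 0 else "up"
print(f"direction: {direction}, speed: {abs(slope):.1f} defects/week")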

Page 32: Bad metric, bad!
Page 33: Bad metric, bad!

Exercise #4

1. Need 3 volunteers

2. Assume 1 scoop equals 1 day’s worth of testing effort

3. Hershey Kisses and Tootsie Rolls are tests; Starbursts are bugs (red are severe)

4. Each volunteer grabs 1 scoop of candy

5. How many of EACH type of test did you execute?

6. How many of EACH type of defect did you find?

7. Log results

8. Repeat 2 more times

Page 34: Bad metric, bad!

Exercise #4 Questions

• Does the graph line represent any information of value?

• Is there assurance (control) that simply taking a scoop (e.g. executing tests in a given day) will result in defects being found?

• Is the shape of the cumulative defect line representative of anything?

• If we only look at scoops 1-3 or 7-9, does it tell us anything or mislead us?

• What if we took 2 scoops per day (added a tester, but still counted it as 1 day)? Would that affect how things look?

• Does M&M’s per scoop or M&M’s per Skittles/Starbursts mean anything?

Page 35: Bad metric, bad!

Challenges with Trends

Affected by challenges of counting

Affected by challenges of metrics

Time Based Series

Intervals and Activity Pause

Page 36: Bad metric, bad!
Page 37: Bad metric, bad!

Purpose of Metrics

Measure of Performance

Conformance to Best Practice

Deviation from Goal

Page 38: Bad metric, bad!

Issues affecting purpose

Misaligned with strategy

Using metrics as outputs only

Too many metrics

Ease of measure does not equal importance

Lack of context

Limited dimensions

Lack behavioral aspects

Page 39: Bad metric, bad!

Changing the World

Page 40: Bad metric, bad!

How to Leverage Metrics

Explicitly link metrics to goals

Use trends over absolute numbers

Use shorter tracking periods

Change metrics when they stop driving change

Account for error and confidence
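
Pulling several of these recommendations together, here is a closing sketch of a metric reported with a margin of error and compared against an explicit goal; the pass rates, the 90% target, and the rough 95% margin are all invented for illustration:

import math

# Hypothetical: pass rates observed across recent test cycles
pass_rates = [0.91, 0.88, 0.93, 0.90, 0.87]
GOAL = 0.90                      # explicit target the metric is tied to

n = len(pass_rates)
mean = sum(pass_rates) / n
stdev = math.sqrt(sum((r - mean) ** 2 for r in pass_rates) / (n - 1))
margin = 1.96 * stdev / math.sqrt(n)   # rough ~95% margin of error

# Report the metric with its uncertainty, then compare against the goal.
print(f"pass rate {mean:.2%} +/- {margin:.2%} (goal {GOAL:.0%})")
if mean - margin >= GOAL:
    print("confidently at or above goal")
elif mean + margin < GOAL:
    print("confidently below goal: act on it")
else:
    print("too noisy to call: tighten the measure before reacting")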

Page 41: Bad metric, bad!

Q&A

Joseph Ours

Email: [email protected]

Company Website: https://centricconsulting.com/technology-solutions/software-quality-assurance-and-testing/

Twitter: @justjoehere

LinkedIn: www.linkedin.com/josephours

Personal Blog: http://josephours.blogspot.com