Testing and Measurement:
Transparency, Guidance and Tension
2012
Without the knowledge of cause and
effect, there is no way to improve.
Cory Doctorow
Tom Cagley
(440) 668-5717 – Cell
Software Process and Measurement Podcast (www.spamcast.net)
@tcagley - Twitter
2
Dashboard
Why
Tale
Methods
Exercise
Agile Metrics
Food For Thought
Questions
3
A GOAL OR A TOOL
Why Measure
4
Why Measure
• Software test measurement provides
visibility into product and process quality.
– Test metrics are facts to help a team, coach
or project manager understand their current
position.
– Provides an objective measure of the
effectiveness and efficiency of testing.
– Identifies risk areas.
5
Awareness → Attention → Action
Measurement Provides Awareness
• Knowing something is only the beginning of an
equation that culminates in action.
• Awareness helps provide a spotlight of attention
that filters unwanted information.
• If you are not able to take action, or not
interested in taking it, what value is there in knowing?
6
A Dialog About Measurement
• Good measurement requires an internal
conversation about testing performance
and goals.
• Measurement helps make better
decisions.
• If you don't use performance
measurement data, then do not bother
measuring at all. Measurement shelfware
wastes money.
7
The Basics
• All numbers begin life as good and useful tools.
• Act as a steward of the numbers and a high priest of information.
• We live in an information-rich world with very little structure and few filters.
• Metrics are a tool to fight Continuous Partial Attention (mostly).
• Defining what is important to the organization and what to measure is critically important and is not a truly democratic event.
8
Effective Measurement Is A Balance
• On one side: Effort, Cost, Interference, Conflict
• On the other: Insights, Actions, Change, Transformation
9
A METRICS CAUTIONARY
TALE
Good Numbers Go Bad
10
A Cautionary Tale
• Message Messes
• Mistakes, Errors and the Like
• Lack of Understanding
• Lack of Use or Poor Usage
11
Message Messes: Communication
• Communication
– Field of Dreams:
Unvalidated vision
– Monologues:
Unidirectional
communication
– Beliefs: Powerful filter
“A metric program is ineffective
unless it is linked directly to a set
of goals, mission or vision.”
Michael Sanders,
Past CIO of Transamerica Life
12
The Solution is . . .
Solutions:
• Validate how goals have been
translated into metrics.
• Actively address misinformation
and interpretations by providing
neutral interpretations.
• Involve measurement users in
analysis and interpretation
(take a page out of Agile).
13
Message Mess: Misinterpretation
• Misinterpretation
– Lack of education and
knowledge: Missing
the know-how or frame
of reference to analyze
or interpret metrics
data
– Active dissemination:
Making up a story . . .
“It is of paramount importance for
an organization to ensure that the
proper decisions are made based
upon the best (most accurate)
data available.”
David Herron,
David Consulting Group
14
Metrics Analysis Should not be About Spin . . .
Solutions:
• Communicate and educate
early and often.
• Keep interpretations neutral.
• Deal with misinterpretations
as soon as they are identified.
15
Mistakes: Collection
• Collection
– Errors: Collecting the
wrong information or not
collecting it at all (including
all of their variants
‘garbage in . . .’).
– Erratic: Collecting data
when the urge (or boss)
hits you.
“In order to capture metrics the
procedures, guidelines,
templates, and databases need to
be in sync
with the standard practices.”
Donna Hook, Medco
16
Make Sure You Collect the Right Stuff . . .
Solutions:
• Do not sweep problems under
the rug.
• Make sure the data specification is
at a level that allows you to
actually collect the data correctly.
• Collect data as specified in the
measurement plan.
17
Mistakes: Math
• Math
– Errors: Mistakes happen
in logical definition of
the metrics, the data
collected and the
equations.
– Knowledge:
• “I never took statistics in
college but the graph looks
pretty” syndrome
• “I can prove anything by
numbers” syndrome
• “Equation exhaustion”
“We accidentally used $88
instead of $66. Now our
stakeholders ask for a second
source.”
Rob Hoerr, Fidelity Information Services
18
One Plus One Equals . . .
Solutions:
• Have a professional
statistician (or trained
amateur) review your
graphs, equations,
assumptions and logical
use of math.
19
Understanding: None
• None
– Assuming: Don’t make
the assumption that users
and providers understand
what is being measured,
know how to use
the measures, or know what
the data they are contributing
will be used for.
“What many people fail to realize is
that metrics need to be tracked over
time and ANALYZED.”
Iris Trout, Bloomberg
20
Educate Your Users . . .
Solutions:
• Communicate and EDUCATE
early and often. Remember:
awareness does not equate to
knowledge.
• Use case studies to train your
users and contributors.
21
Understanding: Complexity
• Complexity
– Overly Simple: Failure
to ensure the explanatory
power of the measures
and metrics
– Overly Complex:
“Baffle them with
bulls…syndrome”
“Keep it simple enough. Ensure
that the measurement is
meaningful to both process actors
and managers.”
S. J. Sanders, BOT International
22
Complexity Leads to Uncertainty . . .
Solutions:
• Leverage a statistician to
review your graphs and
equations. Are they
explanatory? Are they predictive?
• Simplify, simplify, then do it
again, but do not violate step one.
• Involve metrics users in the analysis
of the metrics and measures.
23
Usage: Poor
• Poor Relevance – Culture Mismatches:
Measures and metrics linked to unrelated items, combined with the logical backing of studious people, result in interesting ramifications. Types of mismatches include:
– Types of work
– People
– Goals
“Good numbers go bad when middle
management dictates what the metrics
program will report in order to improve
or make a less than stellar project look
better than it really is.”
RaeAnn Hamilton, TDS Telecom
24
Make the Measures Relevant . . .
Solutions:
• Review the measures you are
accumulating and reporting.
Ask the following questions:
– Do the measures work for all
the types of work they are measuring?
– Do the measures address all of the roles
that participate in the work?
– Are the measures and metrics aligned?
25
Usage: Poor
• Punishment: Leads to
risk aversion or worse.
– Report Cards: Comprehensive or jaded view?
– Politics: “We can’t challenge that, it is too political”.
– Pressure: Incent behavior outside of the norm?
“One characteristic of a bad
metrics program is to ‘Beat
People Up’ for reporting true
performance.”
Miranda Mason, Accenture
26
Your Report Card is in the Mail . . .
Solutions:
• Recognize the level of
granularity (person, team
or organization) at which
each measure can explain
performance.
• Create balanced scorecards
linked to business goals and
the behavior you want people
to exhibit.
27
Usage: Lack
• Lack of:
– Action: Data is collected,
then nothing. Someone
forgot the “some
action is required here”
block on the flow chart.
– Follow Through: Inaction
is a message about the
perceived importance of
the behavior being
measured.
“The key is that there is no point
to taking measurements and
deriving metrics if they aren’t part
of some (planned) decision
making process.”
Jack Hoffman, Wolters Kluwer
28
Now That You Have Data. . .
Solutions:
• USE THE DATA YOU
ARE COLLECTING.
• Report the measures
(publicly) and take actions
based on the data.
29
Usage: Detail
• Detail
– Too Much: ‘If a little
information is good,
then more is better’
– Information Overload:
Contributing to
organizational ADD
(Continuous Partial
Attention, CPA).
“I believe regular customer review
and involvement will significantly
increase the chance that we will
provide what our customer(s)
want.”
Mark Smith, Diebold
30
What Level of Detail . . .
Solutions:
• Link information needs to
your organization's business
goals as an anchor to collect
and report only what is
required.
• Discipline is required to make
business goals an anchor.
31
End Of The Cautionary Tale
• Good numbers do not go bad all by themselves. Problems can stem from many sources, including:
– Lack of planning,
– Lack of knowledge (on many fronts),
– Politics, and/or
– Mere mistakes.
In the short term, it might be easier to let your numbers go bad, even to run wild.
Do not wake up late one night to see your numbers featured in a good-numbers-go-bad infomercial.
32
WATERFALL AND V-MODEL
AND ALTERNATIVES
Methods
33
The Venerable V-Model
• Developed for managing!
• Salient Features
– A simplification of the
complexity of development
– Waterfall-ish
– Provides a uniform procedure
for development
– Shows the decomposition of
requirements paired with
verification and validation
steps
34
Scrum At A Glance
• Developed for managing!
• Salient Features
– Product and Sprint
backlogs (card wall)
– Iterative planning
– Time box
– Standup Meetings
– Definition of done
– Sprint Demos
– Retrospectives
– Self-directed and self-organized
teams
• Roles
– Scrum Master
– Product Owner
– Scrum Team
35
Test Driven Development At A Glance
Write Test → Code Function → Run Test → Refactor (repeat)
• Salient Features:
– Focus on meeting need
expressed in test through
code
– Reduces technical debt via
refactoring
– The test suite acts as an enforced
regression test
– Iterative
• Role(s):
– Developer
36
XP At A Glance
• XP is a full methodology:
project management and
technical components
• Salient Features
– Release plan
– Iteration plan
– Acceptance test
– Stand Up meeting
– Pair negotiation
• Salient Features, part 2
– Unit test
– Pair programming
– CODE
• Roles
– Customer
– Developer
– Tracker
– Coach
37
Kanban At A Glance
• Pull methodology
• Salient Features:
– Work in process limits
– Continuous Flow
– Iterative releases
– Self-directing
– Identifies when work sits
– Does not require that you
change your methodology
• Example
Kanban can be used
anywhere, so let's talk about it
more =>
38
What Is Kanban
• Kanban means “visual card”
• Originally part of the Toyota Production system,
Kanban cards limit the amount of inventory tied
up in “work in progress” on a manufacturing floor
• Excess inventory is waste, time spent producing
it is time that could be expended elsewhere
• Kanban cards represent how much WIP is allowed in a
system
39
But . . .
But we are doing
incremental development
and testing. Shouldn’t
everything be fine?
40
Common Time Box Development Issues
• Short time-boxes force development items to be
smaller
• Smaller development items are often too small to be
individually valuable and are difficult to identify
• Quality of requirements suffers as analysts rush to
prepare for upcoming cycles
• Quality of current development suffers when busy
analysts are unable to inspect software or answer
questions during development
• Quality often suffers as testers race to complete work
late in the development time-box
41
Inside an iteration, effort across roles is uneven.
Development work often continues throughout a cycle while
testing starts late and never seems to get enough time.
Specialization can exacerbate this issue.
42
Why Do Anything?
43
Why
• Using a Kanban approach in software shifts away from
time-boxed iterations in favor of
continuous flow.
44
Characteristics of Kanban
• Visualize the workflow
• Limit WIP (work in progress)
• Measure & optimize flow
• Explicit policies (definition of Done, WIP
limits, etc.)
45
Flow: More or Less Complex
(Photos: http://flic.kr/p/4yvFP2, http://flic.kr/p/7xD6wF)
46
A Demonstration
This simple process flow has four core
steps:
1. Analysis
2. Development
3. Test
4. Deployment
47
Work In Process Limits
• WIP is Work in Process: work that has been
started but not yet completed. In
Kanban, each column has a limit on allowed
work, called the WIP limit. How to create a WIP
limit:
– Ask how many people you have
– Start low and observe bottlenecks
– Use size (function points or story points)
Goal: Reduce WIP
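To make the WIP-limit mechanics concrete, here is a minimal Python sketch; the column names, limits and card names are illustrative assumptions, not part of the deck:

```python
# Illustrative sketch of a Kanban board that enforces per-column WIP limits.
# Columns and limits below are invented for demonstration, not prescriptions.

class KanbanBoard:
    def __init__(self, wip_limits):
        # wip_limits: dict mapping column name -> maximum cards allowed
        self.wip_limits = wip_limits
        self.columns = {name: [] for name in wip_limits}

    def add(self, column, card):
        """Pull a card into a column only if its WIP limit allows it."""
        if len(self.columns[column]) >= self.wip_limits[column]:
            raise RuntimeError(
                f"WIP limit reached in '{column}'; finish work before starting more")
        self.columns[column].append(card)

    def move(self, card, source, target):
        """Move a card downstream; a blocked move exposes a bottleneck."""
        self.add(target, card)          # fails fast if the target is full
        self.columns[source].remove(card)

# The four core steps from the demonstration, with low starting limits
board = KanbanBoard({"analysis": 2, "development": 3, "test": 2, "deployment": 1})
board.add("analysis", "story-1")
board.move("story-1", "analysis", "development")
```

Starting low and raising limits only where work visibly queues is one way to "observe bottlenecks" as the slide suggests.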
48
Making Explicit Policies
• Kanban Board Itself
• Work in Process Limits
• Coding Standards
• Definition of Done
• Exit Criteria
Making policies explicit is a
key enabler of evolutionary,
collaborative change in a
Kanban System.
Why does this work?
Your role is to:
Observe, Challenge and Change
49
EXERCISE
Using Kanban To Talk About Testing Flow
50
Instructions
http://www.paperairplanes.co.uk/planes.php
51
Instructions Iteration One. . .
• Split into groups of 13
– 3 Four person development teams
– 1 Independent tester
• Developers
– One person will make fold one
– One person will make fold two
– One person will make fold three
– One person will make fold four and add a
logo then hand the plane to the tester
• Independent Tester
– Inspect (no defects) and test. Test in the
order the plane is received.
52
Instructions Iteration Two. . .
• Split into groups of 15
– 3 Five-person development teams,
each including its own tester
• Each team . . .
– One person will make fold one
– One person will make fold two
– One person will make fold three
– One person will make fold four and add a
logo then hand the plane to the tester
– One person will inspect (no defects) and test.
Test in the order the plane is received.
53
Soooooooooo
• Which method delivered more fully
completed planes?
• Which method had more work in process
when the iteration was completed?
• Was the quality the same for both?
• Who should answer the “quality” question?
54
AGILE INFLUENCED METRICS
Quality
55
Requirements For Change
[Diagram: trigger, ability and motivation, spanning the individual and the
organization, operations and strategy]
To transform culture, each of the
components must be addressed.
56
Agile Influenced Metrics
• Reinforce desired behavior
• Focus on results
• Measure trends
• Easy to collect
• Includes context
• Creates real conversation
• ONLY WHAT IS ABSOLUTELY NEEDED
57
No Measure To Rule Them All
Productivity
Quality
Predictability
Value
Performance
58
A Palette
• ROI (value)
• Customer Satisfaction (value)
• Tests Passed (quality)
• Defect Count (quality)
• Technical Debt (quality)
• Work-In-Process (productivity)
A palette suggests only
using those that are
absolutely needed.
59
ROI
A performance measure used to evaluate
the efficiency of an investment or to
compare the efficiency of a number of
different investments.
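As a worked example, here is the textbook ROI formula this definition implies, in Python; the investment numbers are hypothetical:

```python
def roi(gain_from_investment, cost_of_investment):
    """Classic ROI: net benefit relative to cost (the 'efficiency' of an investment)."""
    return (gain_from_investment - cost_of_investment) / cost_of_investment

# Hypothetical numbers: a test-automation effort costing 50k that avoids 80k of rework
print(f"ROI: {roi(80_000, 50_000):.0%}")  # -> ROI: 60%
```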
61
Indexed Customer Satisfaction
• A customer satisfaction survey contains a number of
questions aimed at assessing the customers’ view of the
team and the work the team is delivering. While the
questions are mostly qualitative and individual answers
subjective, surveys taken regularly and across a variety
of participants will yield useful trends.
• Recommended frequency is with or after each release of
the product. If releases are infrequent then perhaps each
sprint, but too often will become annoying for the
participants. Around every 6 weeks to 3 months feels
about right.
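A minimal Python sketch of one way to index survey results, assuming 1-to-5 scores per question; the questions, scale and numbers are illustrative, not prescribed by the deck:

```python
# Average 1-5 Likert responses across questions and participants, then track
# the trend release over release. Individual values mean little; trends matter.

def satisfaction_index(responses):
    """responses: list of per-participant lists of 1-5 scores, one per question."""
    scores = [score for participant in responses for score in participant]
    return sum(scores) / len(scores)

release_1 = [[4, 3, 5], [3, 3, 4]]   # two participants, three questions each
release_2 = [[4, 4, 5], [4, 3, 5]]
trend = [satisfaction_index(r) for r in (release_1, release_2)]
print(trend)  # a rising index across releases suggests improvement
```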
63
Automated Tests
• Cumulative number of passing tests over time is a proxy
for quality based on the theory that running more (i.e.
passing) tests reflects a positive measure of quality.
• The higher the percentage of automated testing the
better.
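A small Python sketch of tracking this proxy, with invented counts; the point is the shape of the trend, not the absolute numbers:

```python
# Cumulative passing automated tests per sprint as a quality proxy.

passing_per_sprint = [120, 145, 150, 180, 210]   # passing automated tests at sprint end
automated_fraction = [0.40, 0.45, 0.50, 0.58, 0.65]  # share of all tests automated

for sprint, (passed, pct) in enumerate(zip(passing_per_sprint, automated_fraction), start=1):
    print(f"Sprint {sprint}: {passed} passing, {pct:.0%} automated")
# A flat or falling curve is the signal worth a conversation.
```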
65
Defect Counts
• Two metrics to track quality improvement:
– Post-Sprint Defect Arrival (leading indicator)
– Post-Release Defect Arrival (lagging indicator)
• Plotted against time (Sprints). The trending of
these curves independently and relative to one
another can tell us a great deal about the effect
of the team's attempts to improve quality ab
initio and about its ability to drive down the open
defect count.
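A Python sketch of the two curves, using hypothetical arrival counts:

```python
# Post-sprint arrivals lead; post-release arrivals lag. Numbers are invented.

sprints = [1, 2, 3, 4, 5]
post_sprint_arrivals  = [14, 11, 9, 7, 6]        # leading indicator
post_release_arrivals = [None, None, 8, 6, 5]    # lagging; early sprints not yet released

for s, lead, lag in zip(sprints, post_sprint_arrivals, post_release_arrivals):
    lag_txt = "n/a" if lag is None else str(lag)
    print(f"Sprint {s}: post-sprint {lead}, post-release {lag_txt}")
# Both curves trending down together suggests quality improvements are sticking.
```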
67
Technical Debt
• Technical debt is ‘undone’ work; in other words,
work that will have to be done in the future in
order to bring the code base or other required
deliverables to the required quality level.
• Technical debt is always added to the Product
Backlog and is prioritized by the Product Owner
and team in relation to all the other work.
• The units are story or function points (as for
other Backlog items) and these are tracked
against time (Sprints).
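A minimal Python sketch of tracking the debt balance in story points per Sprint; the point values are invented:

```python
# Running technical-debt balance on the backlog, in story points per sprint.

debt_added   = [8, 5, 3, 6, 2]   # points of 'undone' work identified each sprint
debt_retired = [2, 4, 6, 5, 7]   # points of debt the team paid down each sprint

balance = 0
for sprint, (added, retired) in enumerate(zip(debt_added, debt_retired), start=1):
    balance += added - retired
    print(f"Sprint {sprint}: debt balance {balance} points")
# A balance that only grows tells the Product Owner debt is never prioritized.
```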
69
WIP
• Work In Process is a lean metric that helps a
team track whether they are working
collaboratively or not. The idea in an Agile team
is for the whole team, as far as is reasonably
possible, to collaborate on a single work item
until it is ‘done’. This increases the rate of
output, quality and cross-learning. It decreases
the risk of unfinished items at the end of the
Sprint, which results in waste.
71
Your Palette
• Keep it simple!
• Focus on organizational goals.
• Measure only what you want to predict
and ONLY if you are going to do
something about what you will learn!
72
NUMBERS, PROCESS AND
FOOD FOR THOUGHT
Software Quality
73
Basic Definitions
SOFTWARE QUALITY: Software that combines the characteristics of low
defect rates and high user satisfaction
USER SATISFACTION: Clients who are pleased with a vendor’s products, quality
levels, ease of use, and support
DEFECT PREVENTION: Technologies that minimize the risk of making errors
in software deliverables
DEFECT REMOVAL: Activities that find and correct defects in software
deliverables
BAD FIXES: Secondary defects injected as a by-product of defect
repairs
74
Basic Software Quality Palette
• Defect Potentials – requirements errors, design errors, code
errors, document errors, bad-fix errors, test plan errors, and test case errors
• Defects Removed – by origin of defects:
– before testing
– during testing
– during deployment
• Defect Removal Efficiency
– ratio of development defects to customer defects
• Defect Severity Levels (valid defects) – fatal, serious, minor, cosmetic
75
Basic Software Quality Palette (cont.)
• Duplicate Defects
• Invalid Defects
• Defect Removal Effort and Costs:
– preparation
– execution
– repairs and rework
– effort on duplicates and invalids
• Supplemental Quality Metrics:
– complexity
– test case volumes
– test case coverage
– IBM’s orthogonal defect categories
76
Basic Software Quality Palette (cont.)
• Standard Cost of Quality:
– Prevention
– Appraisal
– Failures
• Revised Software Cost of Quality:
– Defect Prevention
– Non-Test Defect Removal
– Testing Defect Removal
– Post-Release Defect Removal
• Error-Prone Module Effort:
– Identification
– Removal or redevelopment
– Repairs and rework
77
Hazardous Quality Definitions (Jones)
“Quality means conformance to requirements”
• Requirements contain > 15% of software errors.
• Requirements grow at > 2% per month.
• Do you conform to requirements errors?
• Do you conform to totally new requirements?
• Whose requirements are you trying to satisfy?
78
Hazardous Quality Definitions (Cont)
Cost per Defect
• Approaches infinity as defects near zero
• Conceals real economic value of quality
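A quick Python illustration of the arithmetic behind this hazard, assuming a largely fixed appraisal cost (the dollar figures are invented):

```python
# Why cost per defect 'approaches infinity as defects near zero':
# appraisal cost is largely fixed, so dividing it by fewer and fewer found
# defects inflates the metric exactly when quality is best.

fixed_appraisal_cost = 10_000          # inspection/testing cost per release
for defects_found in (100, 10, 1):
    print(f"{defects_found:>3} defects -> "
          f"${fixed_appraisal_cost / defects_found:,.0f} per defect")
# 100 -> $100, 10 -> $1,000, 1 -> $10,000: the 'cost' explodes as quality improves.
```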
79
Graph Of Major Software Risks
[Graph: major software risks include schedule slippage, unplanned changes,
inadequate defect removal, high defect levels, lawsuits for breach of contract,
poor user satisfaction, deferred functions, and cost overruns]
80
Quality Method Effectiveness And Costs - Jones
METHOD EFFECTIVENESS COSTS
• Formal Inspections Very High High
• Defect Estimation Very High Low
• Defect Tracking High Low
• Formal Testing High High
• QA Organization High High
• Independent audits High High
• JAD and QFD High Low
• Prototyping High Low
• Test Case Tools High Medium
• Change Tracking High Medium
• Informal Walkthroughs Moderate Medium
• Informal Testing Moderate Medium
• TQM Moderate Medium
• ISO 9000-9004 Marginal High
81
Percentage Of Software Effort By Tasks
Size in Function Points   Mgt./Support   Defect Removal   Paperwork   Coding   Total
2,580 16% 31% 29% 24% 100%
1,280 15% 29% 26% 30% 100%
640 14% 27% 23% 36% 100%
320 13% 25% 20% 42% 100%
160 12% 23% 17% 48% 100%
80 11% 21% 14% 54% 100%
40 10% 19% 11% 60% 100%
How do we reduce the amount of effort defects require?
Create fewer?
Make them easier to find?
82
How Quality Affects Software Costs
[Graph: cost over time across requirements, design, coding, testing and
maintenance, for pathological versus healthy projects]
Poor quality is cheaper until
the end of the coding phase.
After that, high quality is
cheaper.
83
Defect Prevention Methods
Method                           Requirements   Design      Code            Document        Performance
                                 Defects        Defects     Defects         Defects         Defects
JADs (A)                         Excellent      Good        Not Applicable  Fair            Poor
Prototypes (A)                   Excellent      Excellent   Fair            Not Applicable  Excellent
Structured Methods               Fair           Good        Excellent       Fair            Fair
Design Tools                     Fair           Good        Fair            Fair            Fair
Blueprints & Reusable Code (A)   Excellent      Excellent   Excellent       Excellent       Good
QFD                              Good           Excellent   Fair            Poor            Good
84
Defect Prevention Methods
Method                             Requirements   Design      Code       Document        Performance
                                   Defects        Defects     Defects    Defects         Defects
Inspections/Pair Programming (A)   Fair           Excellent   Excellent  Good            Fair
Stories (A)                        Good           Fair        Fair       Not Applicable  Good
Testing (all forms) (A)            Poor           Poor        Good       Fair            Excellent
Correctness Proofs (A)             Poor           Poor        Good       Fair            Poor
85
Defect Removal Efficiency
• Removal efficiency is the most important quality measure
• Removal efficiency = Defects found / Defects present
• “Defects present” is the critical parameter
86
Defect Removal Efficiency
Start with 10 defects present:
First operation 6 defects from 10 or 60% efficiency
Second operation 2 defects from 4 or 50% efficiency
Cumulative efficiency 8 defects from 10 or 80% efficiency
Defect removal
efficiency = Percentage of defects removed by a single
level of review, inspection or test
Cumulative defect
removal efficiency = Percentage of defects removed by a series
of reviews, inspections or tests
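The same arithmetic as a small Python sketch, using the slide's numbers:

```python
# Sequential defect removal operations and their cumulative efficiency.

def cumulative_efficiency(defects_present, found_per_operation):
    remaining = defects_present
    for i, found in enumerate(found_per_operation, start=1):
        print(f"Operation {i}: {found} of {remaining} defects = {found / remaining:.0%}")
        remaining -= found
    return (defects_present - remaining) / defects_present

# 10 defects present; the first operation finds 6, the second finds 2
print(f"Cumulative: {cumulative_efficiency(10, [6, 2]):.0%}")
# -> Operation 1: 60%, Operation 2: 50%, Cumulative: 80%
```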
87
Defect Removal Efficiency Example
DEVELOPMENT DEFECTS
Inspections 500
Testing 400
Subtotal 900
USER-REPORTED DEFECTS IN FIRST 90 DAYS
Valid unique 100
TOTAL DEFECT VOLUME
Defect totals 1000
REMOVAL EFFICIENCY
Dev. (900) / Total (1000) = 90%
88
Number Of Testing Stages, Testing Effort, And Defect Removal Efficiency - Jones
Number of Percent of Effort Cumulative Defect
Testing Stages Devoted to Testing Removal Efficiency
1 testing stage 10% 50%
2 testing stages 15% 60%
3 testing stages 20% 70%
4 testing stages 25% 75%
5 testing stages 30% 80%
6 testing stages* 33%* 85%*
7 testing stages 36% 87%
8 testing stages 39% 90%
9 testing stages 42% 92%
*Note: Six test stages, 33% costs, and 85% removal efficiency are U.S. averages.
(Type and number of iterations)
89
Conclusions and Guidelines
• Measurements Should Directly Support Organizational
Goals
• Start With Results Measurements
• Use Real-Time Scorecards
• The Measurement Set Must Be Internally Consistent
• Measurements Should Be Focused on Project
Performance and Processes, Not on Individual
Performance
90
Other Considerations
• You Must Deal With The Issue of Size
• Foundation Systems Are Important
• Collect And Use Data Close to Its Source
• Measurements Should Be a Natural By-product of Work
• Actual Use Is the Fastest Way to Identify Inaccuracies
and Get Them Corrected
• Those Who Provide Data Should Use and Verify The
Data
91
Know Where You Are Beginning
• Baseline: A Point-In-Time Inventory of an Organization
Which Includes One or a Combination of:
Software Size
Processes
Capabilities
Hardware
• Benchmark: A Comparison of Performance against a
Standard. Typical Standards Include:
Industry Averages
Baselines
92
Questions . . .
Tom Cagley
(440) 668-5717 – Cell
Software Process and Measurement Podcast (www.spamcast.net)
“Call me, beep me if ya wanna reach me
When ya wanna page me it's okay
I just can't wait until I hear my cell phone ring
Doesn't matter if it's day or night
Everything's gonna be alright
Whenever you need me baby
Call me, beep me if ya wanna reach me”
- Kim Possible Theme Song