Testing and Measurement:
Transparency, Guidance and Tension
2012
Without the knowledge of cause and
effect, there is no way to improve.
Cory Doctorow
Tom Cagley
(440) 668-5717 – Cell
Software Process and Measurement Podcast (www.spamcast.net)
@tcagley - Twitter
2
Dashboard
Why
Tale
Methods
Exercise
Agile Metrics
Food For Thought
Questions
3
A GOAL OR A TOOL
Why Measure
4
Why Measure
• Software test measurement provides
visibility into product and process quality.
– Test metrics are facts to help a team, coach
or project manager understand their current
position.
– Provides an objective measure of the
effectiveness and efficiency of testing.
– Identifies risk areas.
5
Awareness → Attention → Action
Measurement Provides Awareness
• Knowing something is only the beginning of an
equation that culminates in action.
• Awareness helps provide a spotlight of attention
that filters unwanted information.
• If you are not able to take action, or not
interested in taking it, what value is there in knowing?
6
A Dialog About Measurement
• Good measurement requires an internal
conversation about testing performance
and goals.
• Measurement helps make better
decisions.
• If you don't use performance
measurement data, then do not bother
measuring at all. Measurement shelfware
wastes money.
7
The Basics
• All numbers begin life as good and useful tools.
• Act as a steward of the numbers and a high priest of information.
• We live in an information-rich world with very little structure and few filters.
• Metrics are a tool to fight Continuous Partial Attention (mostly).
• Defining what is important to the organization and what to measure is critically important and is not a truly democratic event.
8
Effective Measurement Is A Balance
• On one side: Effort, Cost, Interference, Conflict
• On the other: Insights, Actions, Change, Transformation
9
A METRICS CAUTIONARY
TALE
Good Numbers Go Bad
10
A Cautionary Tale
• Message Messes
• Mistakes, Errors and the Like
• Lack of Understanding
• Lack of Use or Poor Usage
11
Message Messes: Communication
• Communication
– Field of Dreams:
Unvalidated vision
– Monologues:
Unidirectional
communication
– Beliefs: Powerful filter
“A metric program is ineffective
unless it is linked directly to a set
of goals, mission or vision.”
Michael Sanders,
Past CIO of Transamerica Life
12
The Solution is . . .
Solutions:
• Validate how goals have been
translated into metrics.
• Actively address misinformation
and interpretations by providing
neutral interpretations.
• Involve measurement users in
analysis and interpretation
(take a page out of Agile).
13
Message Mess: Misinterpretation
• Misinterpretation
– Lack of education and
knowledge: Missing
the know-how or frame
of reference to analyze
or interpret metrics
data
– Active dissemination:
Making up a story . . .
“It is of paramount importance for
an organization to ensure that the
proper decisions are made based
upon the best (most accurate)
data available.”
David Herron,
David Consulting Group
14
Metrics Analysis Should not be About Spin . . .
Solutions:
• Communicate and educate
early and often.
• Keep interpretations neutral.
• Deal with misinterpretations
as soon as they are identified.
15
Mistakes: Collection
• Collection
– Errors: Collecting the
wrong information or not
collecting it at all (including
all of their variants
‘garbage in . . .’).
– Erratic: Collecting data
when the urge (or boss)
hits you.
“In order to capture metrics the
procedures, guidelines,
templates, and databases need to
be in sync
with the standard practices.”
Donna Hook, Medco
16
Make Sure You Collect the Right Stuff . . .
Solutions:
• Do not sweep problems under
the rug.
• Make sure the data specification is
at a level that allows you to
actually collect the data correctly.
• Collect data as specified in the
measurement plan.
17
Mistakes: Math
• Math
– Errors: Mistakes happen
in logical definition of
the metrics, the data
collected and the
equations.
– Knowledge:
• “I never took statistics in
college but the graph looks
pretty” syndrome
• “I can prove anything by
numbers” syndrome
• “Equation exhaustion”
“We accidentally used $88
instead of $66. Now our
stakeholders ask for a second
source.”
Rob Hoerr, Fidelity Information Services
18
One Plus One Equals . . .
Solutions:
• Have a professional
statistician (or trained
amateur) review your
graphs, equations,
assumptions and logical
use of math.
19
Understanding: None
• None
– Assuming: Don’t make
the assumption that users
and providers understand
what is being measured,
know how to use
the measures, or know what
the data they are contributing
will be used for.
“What many people fail to realize is
that metrics need to be tracked over
time and ANALYZED.”
Iris Trout, Bloomberg
20
Educate Your Users . . .
Solutions:
• Communicate and EDUCATE
early and often. Remember:
awareness does not equate to
knowledge.
• Use case studies to train your
users and contributors.
21
Understanding: Complexity
• Complexity
– Overly Simple: Failure
to ensure the explanatory
power of the measures
and metrics
– Overly Complex:
“Baffle them with
bulls…syndrome”
“Keep it simple enough. Ensure
that the measurement is
meaningful to both process actors
and managers.”
S. J. Sanders, BOT International
22
Complexity Leads to Uncertainty . . .
Solutions:
• Leverage a statistician to
review your graphs and
equations. Are they
explanatory? Are they predictive?
• Simplify, simplify, then do it
again, but do not violate step one.
• Involve metrics users in the analysis
of the metrics and measures.
23
Usage: Poor
• Poor Relevance – Culture Mismatches:
Measures and metrics linked to unrelated items, combined with the logical backing of studious people, result in interesting ramifications. Types of mismatches include:
– Types of work
– People
– Goals
“Good numbers go bad when middle
management dictates what the metrics
program will report in order to improve
or make a less than stellar project look
better than it really is.”
RaeAnn Hamilton, TDS Telecom
24
Make the Measures Relevant . . .
Solutions:
• Review the measures you are
accumulating and reporting.
Ask the following questions:
– Do the measures work for all
the types of work they are measuring?
– Do the measures address all of the roles
that participate in the work?
– Are the measures and metrics aligned?
25
Usage: Poor
• Punishment: Leads to
risk aversion or worse.
– Report Cards: Comprehensive or jaded view?
– Politics: “We can’t challenge that, it is too political”.
– Pressure: Incent behavior outside of the norm?
“One characteristic of a bad
metrics program is to ‘Beat
People Up’ for reporting true
performance.”
Miranda Mason, Accenture
26
Your Report Card is in the Mail . . .
Solutions:
• Recognize the level of
granularity (person, team
or organization) at which
each measure can explain
performance.
• Create balanced scorecards
linked to business goals and
the behavior you want people
to exhibit.
27
Usage: Lack
• Lack of:
– Action: Data is collected,
then nothing. Someone
forgot the “some
action is required here”
block on the flow chart.
– Follow Through: Inaction
is a message about the
perceived importance of
the behavior being
measured.
“The key is that there is no point
to taking measurements and
deriving metrics if they aren’t part
of some (planned) decision
making process.”
Jack Hoffman, Wolters Kluwer
28
Now That You Have Data. . .
Solutions:
• USE THE DATA YOU
ARE COLLECTING.
• Report the measures
(publicly) and take actions
based on the data.
29
Usage: Detail
• Detail
– Too Much: ‘If a little
information is good,
then more is better’
– Information Overload:
Contributing to
organizational ADD
(Continuous Partial
Attention, CPA).
“I believe regular customer review
and involvement will significantly
increase the chance that we will
provide what our customer(s)
want.”
Mark Smith, Diebold
30
What Level of Detail . . .
Solutions:
• Link information needs to
your organization's business
goals as an anchor to collect
and report only what is
required.
• Discipline is required to make
business goals an anchor.
31
End Of The Cautionary Tale
• Good numbers do not go bad all by themselves. Problems can stem from many sources, including:
– Lack of planning,
– Lack of knowledge (on many fronts),
– Politics, and/or
– Mere mistakes.
In the short term, it might be easier to let your numbers go bad, even to run wild.
Do not wake up late one night to see your numbers featured in a good-numbers-go-bad infomercial.
32
WATERFALL AND V-MODEL
AND ALTERNATIVES
Methods
33
The Venerable V-Model
• Developed for managing!
• Salient Features
– A simplification of the
complexity of development
– Waterfall-ish
– Provides a uniform procedure
for development
– Shows the decomposition of
requirements paired with
verification and validation
steps
34
Scrum At A Glance
• Developed for managing!
• Salient Features
– Product and Sprint
backlogs (card wall)
– Iterative planning
– Time box
– Standup Meetings
– Definition of done
– Sprint Demos
– Retrospectives
– Self-directed and self-organized
teams
• Roles
– Scrum Master
– Product Owner
– Scrum Team
35
Test Driven Development At A Glance
Write Test → Code Function → Run Test → Refactor (repeat)
• Salient Features:
– Focus on meeting need
expressed in test through
code
– Reduces technical debt via
refactoring
– The test suite acts as an enforced
regression test
– Iterative
• Role(s):
– Developer
36
XP At A Glance
• XP is a full methodology:
project management and
technical components
• Salient Features
– Release plan
– Iteration plan
– Acceptance test
– Stand Up meeting
– Pair negotiation
• Salient Features, part 2
– Unit test
– Pair programming
– CODE
• Roles
– Customer
– Developer
– Tracker
– Coach
37
Kanban At A Glance
• Pull methodology
• Salient Features:
– Work in process limits
– Continuous Flow
– Iterative releases
– Self-directing
– Identifies when work sits
– Does not require that you
change your methodology
• Example
Kanban can be used
anywhere, so let's talk about it
more =>
38
What Is Kanban
• Kanban means “visual card”
• Originally part of the Toyota Production system,
Kanban cards limit the amount of inventory tied
up in “work in progress” on a manufacturing floor
• Excess inventory is waste, time spent producing
it is time that could be expended elsewhere
• Kanban cards represent how much WIP is allowed in a
system
39
But . . .
But we are doing
incremental development
and testing. Shouldn’t
everything be fine?
40
Common Time Box Development Issues
• Short time-boxes force development items to be
smaller
• Smaller development items are often too small to be
individually valuable and are difficult to identify
• Quality of requirements suffers as analysts rush to
prepare for upcoming cycles
• Quality of current development suffers when busy
analysts are unable to inspect software or answer
questions during development
• Quality often suffers as testers race to complete work
late in the development time-box
41
Inside an iteration, effort across roles is uneven.
Development work often continues throughout a cycle while
testing starts late and never seems to get enough time.
Specialization can exacerbate this issue.
42
Why Do Anything?
43
Why
• Using a Kanban approach in software shifts away from
time-boxed iterations in favor of
continuous flow.
44
Characteristics of Kanban
• Visualize the workflow
• Limit WIP (work in progress)
• Measure & optimize flow
• Explicit policies (definition of Done, WIP
limits, etc.)
45
Flow: More or Less Complex
(Photos: http://flic.kr/p/4yvFP2, http://flic.kr/p/7xD6wF)
46
A Demonstration
This simple process flow has four core
steps:
1. Analysis
2. Development
3. Test
4. Deployment
47
Work In Process Limits
• WIP is Work in Process: work that has been
started but not yet completed. In
Kanban, each column has a limit on allowed
work, called the WIP limit. How to create a WIP
limit:
– Ask how many people you have
– Start low and observe bottlenecks
– Use size (function points or story points)
Goal: Reduce WIP
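To make the WIP-limit mechanics concrete, here is a minimal Python sketch; the column names, limits and card names are illustrative assumptions, not part of the deck:

```python
# Illustrative sketch of a Kanban board that enforces per-column WIP limits.
# Columns and limits below are invented for demonstration, not prescriptions.

class KanbanBoard:
    def __init__(self, wip_limits):
        # wip_limits: dict mapping column name -> maximum cards allowed
        self.wip_limits = wip_limits
        self.columns = {name: [] for name in wip_limits}

    def add(self, column, card):
        """Pull a card into a column only if its WIP limit allows it."""
        if len(self.columns[column]) >= self.wip_limits[column]:
            raise RuntimeError(
                f"WIP limit reached in '{column}'; finish work before starting more")
        self.columns[column].append(card)

    def move(self, card, source, target):
        """Move a card downstream; a blocked move exposes a bottleneck."""
        self.add(target, card)          # fails fast if the target is full
        self.columns[source].remove(card)

# The four core steps from the demonstration, with low starting limits
board = KanbanBoard({"analysis": 2, "development": 3, "test": 2, "deployment": 1})
board.add("analysis", "story-1")
board.move("story-1", "analysis", "development")
```

Starting low and raising limits only where work visibly queues is one way to "observe bottlenecks" as the slide suggests.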
48
Making Explicit Policies
• Kanban Board Itself
• Work in Process Limits
• Coding Standards
• Definition of Done
• Exit Criteria
Making policies explicit is a
key enabler of evolutionary,
collaborative change in a
Kanban System.
Why does this work?
Your role is to:
Observe, Challenge and Change
49
EXERCISE
Using Kanban To Talk About Testing Flow
50
Instructions
http://www.paperairplanes.co.uk/planes.php
51
Instructions Iteration One. . .
• Split into groups of 13
– 3 Four person development teams
– 1 Independent tester
• Developers
– One person will make fold one
– One person will make fold two
– One person will make fold three
– One person will make fold four and add a
logo then hand the plane to the tester
• Independent Tester
– Inspect (no defects) and test. Test in the
order the plane is received.
52
Instructions Iteration Two. . .
• Split into groups of 15
– 3 Five-person development teams,
each including its own tester
• Each team . . .
– One person will make fold one
– One person will make fold two
– One person will make fold three
– One person will make fold four and add a
logo then hand the plane to the tester
– One person will inspect (no defects) and test.
Test in the order the plane is received.
53
Soooooooooo
• Which method delivered more fully
completed planes?
• Which method had more work in process
when the iteration was completed?
• Was the quality the same for both?
• Who should answer the “quality” question?
54
AGILE INFLUENCED METRICS
Quality
55
Requirements For Change
[Diagram: trigger, ability and motivation, spanning the individual and the
organization, operations and strategy]
To transform culture, each of the
components must be addressed.
56
Agile Influenced Metrics
• Reinforce desired behavior
• Focus on results
• Measure trends
• Easy to collect
• Includes context
• Creates real conversation
• ONLY WHAT IS ABSOLUTELY NEEDED
57
No Measure To Rule Them All
Productivity
Quality
Predictability
Value
Performance
58
A Palette
• ROI (value)
• Customer Satisfaction (value)
• Tests Passed (quality)
• Defect Count (quality)
• Technical Debt (quality)
• Work-In-Process (productivity)
A palette suggests only
using those that are
absolutely needed.
59
ROI
A performance measure used to evaluate
the efficiency of an investment or to
compare the efficiency of a number of
different investments.
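As a worked example, here is the textbook ROI formula this definition implies, in Python; the investment numbers are hypothetical:

```python
def roi(gain_from_investment, cost_of_investment):
    """Classic ROI: net benefit relative to cost (the 'efficiency' of an investment)."""
    return (gain_from_investment - cost_of_investment) / cost_of_investment

# Hypothetical numbers: a test-automation effort costing 50k that avoids 80k of rework
print(f"ROI: {roi(80_000, 50_000):.0%}")  # -> ROI: 60%
```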
61
Indexed Customer Satisfaction
• A customer satisfaction survey contains a number of
questions aimed at assessing the customers’ view of the
team and the work the team is delivering. While the
questions are mostly qualitative and individual answers
subjective, surveys taken regularly and across a variety
of participants will yield useful trends.
• Recommended frequency is with or after each release of
the product. If releases are infrequent then perhaps each
sprint, but too often will become annoying for the
participants. Around every 6 weeks to 3 months feels
about right.
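A minimal Python sketch of one way to index survey results, assuming 1-to-5 scores per question; the questions, scale and numbers are illustrative, not prescribed by the deck:

```python
# Average 1-5 Likert responses across questions and participants, then track
# the trend release over release. Individual values mean little; trends matter.

def satisfaction_index(responses):
    """responses: list of per-participant lists of 1-5 scores, one per question."""
    scores = [score for participant in responses for score in participant]
    return sum(scores) / len(scores)

release_1 = [[4, 3, 5], [3, 3, 4]]   # two participants, three questions each
release_2 = [[4, 4, 5], [4, 3, 5]]
trend = [satisfaction_index(r) for r in (release_1, release_2)]
print(trend)  # a rising index across releases suggests improvement
```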
63
Automated Tests
• Cumulative number of passing tests over time is a proxy
for quality based on the theory that running more (i.e.
passing) tests reflects a positive measure of quality.
• The higher the percentage of automated testing the
better.
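A small Python sketch of tracking this proxy, with invented counts; the point is the shape of the trend, not the absolute numbers:

```python
# Cumulative passing automated tests per sprint as a quality proxy.

passing_per_sprint = [120, 145, 150, 180, 210]   # passing automated tests at sprint end
automated_fraction = [0.40, 0.45, 0.50, 0.58, 0.65]  # share of all tests automated

for sprint, (passed, pct) in enumerate(zip(passing_per_sprint, automated_fraction), start=1):
    print(f"Sprint {sprint}: {passed} passing, {pct:.0%} automated")
# A flat or falling curve is the signal worth a conversation.
```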
65
Defect Counts
• Two metrics to track quality improvement:
– Post-Sprint Defect Arrival (leading indicator)
– Post-Release Defect Arrival (lagging indicator)
• Plotted against time (Sprints). The trending of
these curves independently and relative to one
another can tell us a great deal about the effect
of the team's attempts to improve quality ab
initio and about its ability to drive down the open
defect count.
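A Python sketch of the two curves, using hypothetical arrival counts:

```python
# Post-sprint arrivals lead; post-release arrivals lag. Numbers are invented.

sprints = [1, 2, 3, 4, 5]
post_sprint_arrivals  = [14, 11, 9, 7, 6]        # leading indicator
post_release_arrivals = [None, None, 8, 6, 5]    # lagging; early sprints not yet released

for s, lead, lag in zip(sprints, post_sprint_arrivals, post_release_arrivals):
    lag_txt = "n/a" if lag is None else str(lag)
    print(f"Sprint {s}: post-sprint {lead}, post-release {lag_txt}")
# Both curves trending down together suggests quality improvements are sticking.
```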
67
Technical Debt
• Technical debt is ‘undone’ work; in other words,
work that will have to be done in the future in
order to bring the code base or other required
deliverables to the required quality level.
• Technical debt is always added to the Product
Backlog and is prioritized by the Product Owner
and team in relation to all the other work.
• The units are story or function points (as for
other Backlog items) and these are tracked
against time (Sprints).
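A minimal Python sketch of tracking the debt balance in story points per Sprint; the point values are invented:

```python
# Running technical-debt balance on the backlog, in story points per sprint.

debt_added   = [8, 5, 3, 6, 2]   # points of 'undone' work identified each sprint
debt_retired = [2, 4, 6, 5, 7]   # points of debt the team paid down each sprint

balance = 0
for sprint, (added, retired) in enumerate(zip(debt_added, debt_retired), start=1):
    balance += added - retired
    print(f"Sprint {sprint}: debt balance {balance} points")
# A balance that only grows tells the Product Owner debt is never prioritized.
```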
69
WIP
• Work In Process is a lean metric that helps a
team track whether they are working
collaboratively or not. The idea in an Agile team
is for the whole team, as far as is reasonably
possible, to collaborate on a single work item
until it is ‘done’. This increases the rate of
output, quality and cross-learning. It decreases
the risk of unfinished items at the end of the
Sprint, which results in waste.
71
Your Palette
• Keep it simple!
• Focus on organizational goals.
• Measure only what you want to predict
and ONLY if you are going to do
something about what you will learn!
72
NUMBERS, PROCESS AND
FOOD FOR THOUGHT
Software Quality
73
Basic Definitions
SOFTWARE QUALITY: Software that combines the characteristics of low
defect rates and high user satisfaction
USER SATISFACTION: Clients who are pleased with a vendor’s products, quality
levels, ease of use, and support
DEFECT PREVENTION: Technologies that minimize the risk of making errors
in software deliverables
DEFECT REMOVAL: Activities that find and correct defects in software
deliverables
BAD FIXES: Secondary defects injected as a by-product of defect
repairs
74
Basic Software Quality Palette
• Defect Potentials – requirements errors, design errors, code
errors, document errors, bad-fix errors, test plan errors, and test case errors
• Defects Removed – by origin of defects:
– before testing
– during testing
– during deployment
• Defect Removal Efficiency
– ratio of development defects to customer defects
• Defect Severity Levels (valid defects) – fatal, serious, minor, cosmetic
75
Basic Software Quality Palette (cont.)
• Duplicate Defects
• Invalid Defects
• Defect Removal Effort and Costs:
– preparation
– execution
– repairs and rework
– effort on duplicates and invalids
• Supplemental Quality Metrics:
– complexity
– test case volumes
– test case coverage
– IBM’s orthogonal defect categories
76
Basic Software Quality Palette (cont.)
• Standard Cost of Quality:
– Prevention
– Appraisal
– Failures
• Revised Software Cost of Quality:
– Defect Prevention
– Non-Test Defect Removal
– Testing Defect Removal
– Post-Release Defect Removal
• Error-Prone Module Effort:
– Identification
– Removal or redevelopment
– Repairs and rework
77
Hazardous Quality Definitions (Jones)
“Quality means conformance to requirements”
• Requirements contain > 15% of software errors.
• Requirements grow at > 2% per month.
• Do you conform to requirements errors?
• Do you conform to totally new requirements?
• Whose requirements are you trying to satisfy?
78
Hazardous Quality Definitions (Cont)
Cost per Defect
• Approaches infinity as defects near zero
• Conceals real economic value of quality
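A quick Python illustration of the arithmetic behind this hazard, assuming a largely fixed appraisal cost (the dollar figures are invented):

```python
# Why cost per defect 'approaches infinity as defects near zero':
# appraisal cost is largely fixed, so dividing it by fewer and fewer found
# defects inflates the metric exactly when quality is best.

fixed_appraisal_cost = 10_000          # inspection/testing cost per release
for defects_found in (100, 10, 1):
    print(f"{defects_found:>3} defects -> "
          f"${fixed_appraisal_cost / defects_found:,.0f} per defect")
# 100 -> $100, 10 -> $1,000, 1 -> $10,000: the 'cost' explodes as quality improves.
```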
79
Graph Of Major Software Risks
[Graph: major software risks include schedule slippage, unplanned changes,
inadequate defect removal, high defect levels, lawsuits for breach of contract,
poor user satisfaction, deferred functions, and cost overruns]
80
Quality Method Effectiveness And Costs - Jones
METHOD EFFECTIVENESS COSTS
• Formal Inspections Very High High
• Defect Estimation Very High Low
• Defect Tracking High Low
• Formal Testing High High
• QA Organization High High
• Independent audits High High
• JAD and QFD High Low
• Prototyping High Low
• Test Case Tools High Medium
• Change Tracking High Medium
• Informal Walkthroughs Moderate Medium
• Informal Testing Moderate Medium
• TQM Moderate Medium
• ISO 9000-9004 Marginal High
81
Percentage Of Software Effort By Tasks
Size in Function Points   Mgt./Support   Defect Removal   Paperwork   Coding   Total
2,580 16% 31% 29% 24% 100%
1,280 15% 29% 26% 30% 100%
640 14% 27% 23% 36% 100%
320 13% 25% 20% 42% 100%
160 12% 23% 17% 48% 100%
80 11% 21% 14% 54% 100%
40 10% 19% 11% 60% 100%
How do we reduce the amount of effort defects require?
Create fewer?
Make them easier to find?
82
How Quality Affects Software Costs
[Graph: cost over time across requirements, design, coding, testing and
maintenance, for pathological versus healthy projects]
Poor quality is cheaper until
the end of the coding phase.
After that, high quality is
cheaper.
83
Defect Prevention Methods
Method                           Requirements   Design      Code            Document        Performance
                                 Defects        Defects     Defects         Defects         Defects
JADs (A)                         Excellent      Good        Not Applicable  Fair            Poor
Prototypes (A)                   Excellent      Excellent   Fair            Not Applicable  Excellent
Structured Methods               Fair           Good        Excellent       Fair            Fair
Design Tools                     Fair           Good        Fair            Fair            Fair
Blueprints & Reusable Code (A)   Excellent      Excellent   Excellent       Excellent       Good
QFD                              Good           Excellent   Fair            Poor            Good
84
Defect Prevention Methods
Method                             Requirements   Design      Code       Document        Performance
                                   Defects        Defects     Defects    Defects         Defects
Inspections/Pair Programming (A)   Fair           Excellent   Excellent  Good            Fair
Stories (A)                        Good           Fair        Fair       Not Applicable  Good
Testing (all forms) (A)            Poor           Poor        Good       Fair            Excellent
Correctness Proofs (A)             Poor           Poor        Good       Fair            Poor
85
Defect Removal Efficiency
• Removal efficiency is the most important quality measure
• Removal efficiency = Defects found / Defects present
• “Defects present” is the critical parameter
86
Defect Removal Efficiency
Start with 10 defects present:
First operation 6 defects from 10 or 60% efficiency
Second operation 2 defects from 4 or 50% efficiency
Cumulative efficiency 8 defects from 10 or 80% efficiency
Defect removal
efficiency = Percentage of defects removed by a single
level of review, inspection or test
Cumulative defect
removal efficiency = Percentage of defects removed by a series
of reviews, inspections or tests
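The same arithmetic as a small Python sketch, using the slide's numbers:

```python
# Sequential defect removal operations and their cumulative efficiency.

def cumulative_efficiency(defects_present, found_per_operation):
    remaining = defects_present
    for i, found in enumerate(found_per_operation, start=1):
        print(f"Operation {i}: {found} of {remaining} defects = {found / remaining:.0%}")
        remaining -= found
    return (defects_present - remaining) / defects_present

# 10 defects present; the first operation finds 6, the second finds 2
print(f"Cumulative: {cumulative_efficiency(10, [6, 2]):.0%}")
# -> Operation 1: 60%, Operation 2: 50%, Cumulative: 80%
```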
87
Defect Removal Efficiency Example
DEVELOPMENT DEFECTS
Inspections 500
Testing 400
Subtotal 900
USER-REPORTED DEFECTS IN FIRST 90 DAYS
Valid unique 100
TOTAL DEFECT VOLUME
Defect totals 1000
REMOVAL EFFICIENCY
Dev. (900) / Total (1000) = 90%
88
Number Of Testing Stages, Testing Effort, And Defect Removal Efficiency - Jones
Number of Percent of Effort Cumulative Defect
Testing Stages Devoted to Testing Removal Efficiency
1 testing stage 10% 50%
2 testing stages 15% 60%
3 testing stages 20% 70%
4 testing stages 25% 75%
5 testing stages 30% 80%
6 testing stages* 33%* 85%*
7 testing stages 36% 87%
8 testing stages 39% 90%
9 testing stages 42% 92%
*Note: Six test stages, 33% costs, and 85% removal efficiency are U.S. averages.
(Type and number of iterations)
89
Conclusions and Guidelines
• Measurements Should Directly Support Organizational
Goals
• Start With Results Measurements
• Use Real-Time Scorecards
• The Measurement Set Must Be Internally Consistent
• Measurements Should Be Focused on Project
Performance and Processes, Not on Individual
Performance
90
Other Considerations
• You Must Deal With The Issue of Size
• Foundation Systems Are Important
• Collect And Use Data Close to Its Source
• Measurements Should Be a Natural By-product of Work
• Actual Use Is the Fastest Way to Identify Inaccuracies
and Get Them Corrected
• Those Who Provide Data Should Use and Verify The
Data
91
Know Where You Are Beginning
• Baseline: A Point-In-Time Inventory of an Organization
Which Includes One or a Combination of:
Software Size
Processes
Capabilities
Hardware
• Benchmark: A Comparison of Performance against a
Standard. Typical Standards Include:
Industry Averages
Baselines
92
Questions . . .
Tom Cagley
(440) 668-5717 – Cell
Software Process and Measurement Podcast (www.spamcast.net)
“Call me, beep me if ya wanna reach me
When ya wanna page me it's okay
I just can't wait until I hear my cell phone ring
Doesn't matter if it's day or night
Everything's gonna be alright
Whenever you need me baby
Call me, beep me if ya wanna reach me”
- Kim Possible Theme Song