2008 - building software: an artful science [ppt]
Michael Hogarth, MD
Building Software: An Artful Science
Software development is risky
IBM’s Consulting Group survey:
55% of the software developed cost more than projected
68% took longer to complete than predicted.
88% had to be substantially redesigned.
Standish Group Study of 8,380 software projects (1996):
31% of software projects were canceled before they were completed
53% of those that were completed cost an average of 189% of their original estimates.
Only 42% of completed projects had their original set of proposed features and functions.
Only 9% were completed on time and on budget.
“To err is human, to really foul things up requires a computer”
Standish Group report 2006
19% of projects were outright failures
35% could be categorized as successes (better than 1996, but not great)
46% of projects were “challenged” (either had cost overruns or delays, or both)
McDonald’s gets McFried
McDonald’s “Innovate” Project
$500 million spent for nothing....
Objective --
“McDonald's planned to spend $1 billion over five years to tie all its operations into a real-time digital network. Eventually, executives in company headquarters would have been able to see how soda dispensers and frying machines in every store were performing.”
Why was it scrubbed?
“information systems don't scrub toilets and they don't fry potatoes” Barrett, 2003.
http://www.baselinemag.com/c/a/Projects-Supply-Chain/McDonalds-McBusted/
FBI’s “Virtual Case File”
2003 - Virtual Case File - networked system for tracking criminal cases
SAIC spent months writing over 730,000 lines of computer code
Found to have hundreds of software problems during testing
$170 million project was cancelled -- SAIC reaped more than $100 million
Problems
delayed by over a year. In 2004, the system delivered only 1/10th of the intended functionality and was thus largely unusable after $170 million spent
SAIC delivered what the FBI requested, but the requirements were flawed, poorly planned, and not tied to scheduled deliverables
Now what?
Lockheed Martin given contract for $305 million tied to benchmarks
http://www.washingtonpost.com/wp-dyn/content/article/2006/08/17/AR2006081701485_pf.html
Causes of the VCF Failure
Changing requirements (conceived before 9/11, after 9/11 requirements were altered significantly)
14 different managers over the project lifetime (2 years)
Poor oversight by the primary ‘owner’ of the project (FBI) - did not oversee construction closely
Did not pay attention to new, better commercial products -- kept head in the sand because it “had to be built fast”
Hardware was purchased first, waiting on software (common problem) -- if software is delayed, hardware is “legacy” quickly
http://www.inf.ed.ac.uk/teaching/courses/seoc2/2004_2005/slides/failures.pdf
Washington State Licensing Dept
1990 - Washington State License Application Mitigation Project
$41.8 million over 5 years to automate the State’s vehicle registration and license renewal process
1993 - after $51 million, the original design and requirements were expected to be obsolete when finally built
1997 - Washington legislature pulled the plug -- $40 million wasted
Causes
ambitious
lack of early deliverables
development split between in-house and contractor
J Sainsbury IT failure
UK food retailer, J. Sainsbury, invested in an automated supply-chain management system
System did not perform the functions as needed
As a result, merchandise was stuck in company warehouses and not getting to the stores
Company added 3,000 additional clerks to stock the shelves manually
They killed the project after spending $526 million.....
“to err is human, to really foul up requires a root password.” -- anonymous
Other IT nightmares
1999 - $125 million NASA Mars Climate Orbiter lost in space due to a data conversion error...
Feb 2003 - U.S. Treasury Dept. mailed 50,000 Social Security checks without beneficiary names. Checks had to be ‘cancelled’ and reissued...
2004-2005 - UK Inland Revenue (IRS) software errors contribute to a $3.45 billion tax-credit overpayment
May 2005 - Toyota had to install a software fix on 20,000 hybrid Prius vehicles due to problems with invalid engine warning lights. It is estimated that the automobile industry spends $2-3 billion/year fixing software problems
Sept 2006 - A U.S. Government student loan service software error made public the personal data of 21,000 borrowers on its web site
2008 - new Terminal 5 at Heathrow Airport - New automated baggage routing system leads to over 20,000 bags being put in temporary storage...
does it really matter?
Software bugs can kill...
http://www.wired.com/software/coolapps/news/2005/11/69355
When users inadvertently cause disaster
http://www.wired.com/software/coolapps/news/2005/11/69355?currentPage=2
How does this happen?
Many of the runaway projects are ‘overly ambitious’ -- a major issue (senior management has unrealistic expectations of what can be done)
Most projects failed because of multiple problems/issues, not one.
Most problems/issues were management related.
In spite of obvious signs of the runaway software project (72% of project members are aware), only 19% of senior management is aware
Risk management, an important part of identifying trouble and managing it, was NOT done in any fashion in 55% of major runaway projects.
Causes of failure
Project objectives not fully specified -- 51%
Bad planning and estimating -- 48%
Technology is new to the organization -- 45%
Inadequate/no project management methods -- 42%
Insufficient senior staff on the team -- 42%
Poor performance by suppliers of software/hardware (contractors) -- 42%
http://members.cox.net/johnsuzuki/softfail.htm
The cost of IT failures
2006 - $1 Trillion dollars spent on IT hardware, software, and services worldwide...
18% of all IT projects will be abandoned before delivery (18% of $1 trillion = $180 billion?)
53% will be delivered late or have cost overruns
1995 - Standish estimated the U.S. spent $81 billion for cancelled software projects.....
Conclusions
IT projects are more likely to be unsuccessful than successful
Only 1 in 5 software projects bring full satisfaction (succeed)
The larger the project, the more likely the failure
http://www.it-cortex.com/Stat_Failure_Rate.htm#The%20Robbins-Gioia%20Survey%20(2001)
Software as engineering
Software has been viewed more as “art” than engineering
has led to a lack of structured methods and organization for building software systems
Why is a software development methodology important?
programmers are expensive
many software system failures can be traced to poor software development
requirements gathering is incomplete or not well organized
requirements are not communicated effectively to the software programmers
inadequate testing (because testers don’t understand the requirements)
Software Development Lifecycle
Domain Analysis
Software Analysis
Requirements Analysis
Specification Development
Programming (software coding)
Testing
Deployment
Documentation
Training and Support
Maintenance
Software Facts and Figures
Maintenance consumes 40-80% of software costs during the lifetime of a software system -- the most important part of the lifecycle
Error correction accounts for 17% of software maintenance costs
Enhancement is responsible for 60% of software maintenance costs -- most of the cost is adding new capability to old software, NOT ‘fixing’ it.
Relative time spent on phases of the lifecycle
Development -- defining requirements (15%), design (20%), programming (20%), testing and error removal (40%), documentation (5%)
Maintenance -- defining the change (15%), documentation review (5%), tracing logic (25%), implementing the change (20%), testing (30%), updating documentation (5%)
RL Glass. Facts and Fallacies of Software Engineering. 2003.
Software development models
Waterfall model
specification --> development --> testing --> deployment
Although many use this still, it is flawed and at the root of much of the waste in software development today
Evolutionary development -- interleaves activities of specification, development, and validation (testing)
Evolutionary development
Exploratory Development
work with customer/users to explore their requirement and deliver a final system. The development starts with the parts of the system that are understood. New features are added in an evolutionary fashion.
Throw-away prototyping
create a prototype (not formal system), which allows for understanding of the customer/users requirements. Then one builds “the real thing”
Sommerville, Software Engineering, 2004
Spiral Model
Spiral Model - a process that goes through all steps of the software development lifecycle repeatedly, with each cycle ending in a prototype for the user to see -- the cycles are for getting the requirements “right”, and the prototypes are discarded after each iteration
Challenges with Evolutionary Development
The process is not visible to management -- managers often need regular deliverables to measure progress.
causes a disconnect as managers want “evidence of progress”, yet the evolutionary process is fast and dynamic making ‘deliverables’ not cost-effective to produce (they change often)
System can have poor structure
Continual change can create poor system structure
Incorporating changes becomes more and more difficult
Sommerville, Software Engineering, 2004
Agile software development
Refers to a group of software development methods that promote iterative development, open collaboration, and adaptable processes
Key characteristics
minimize risk by developing software in multiple repetitions (timeboxes), iterations last 2-4 weeks
Each iteration passes through a full software development lifecycle - planning, requirements gathering, design, writing unit tests, then coding until the unit tests pass, acceptance testing by end-users
Emphasizes face-to-face communication over written communication
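The “write unit tests, then code until the unit tests pass” step above is the test-first style. A minimal sketch in Python, assuming a hypothetical login-lockout rule (the function name and rule are illustrative, not from the slides):

```python
# Test-first sketch: the unit test is written before the code it exercises,
# then the code is written until the test passes.

def test_remaining_attempts():
    assert remaining_attempts(3, 1) == 2   # counts down per failure
    assert remaining_attempts(3, 5) == 0   # never goes negative

def remaining_attempts(max_attempts, failed_attempts):
    """Return how many login attempts remain before lockout."""
    return max(0, max_attempts - failed_attempts)

test_remaining_attempts()   # the iteration is done only when this passes
```

In a real iteration the test would initially fail (the function does not yet exist), which is the point: the test pins down the requirement before coding starts.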
Agile software methods
Scrum
Crystal Clear
Extreme Programming
Adaptive Software Development
Feature Driven Development
Test Driven Development
Dynamic Systems Development
Scrum
A type of Agile methodology
Composed of “sprints” that run anywhere from 15-30 days during which the team creates an increment of potentially shippable software.
The features that go into that ‘sprint’ version come from a “product backlog”, a set of prioritized high level requirements of work to be done
During a ‘backlog meeting’, the product owner tells the team of the items in the backlog they want completed.
The team decides how much can be completed in the next sprint
“Requirements are frozen for a sprint” -- no wandering or scope shifting...
http://en.wikipedia.org/wiki/Scrum_(development)
Scrum and useable software...
A key feature of Scrum is the idea that one creates useable software with each iteration
It forces the team to architect “the real thing” from the start -- not a “prototype” that is only developed for demonstration purposes
For example, a system would start by using the planned architecture (web-based application using Java 2 Enterprise Edition, Oracle database, etc...)
It helps to uncover many potential problems with the architecture, particularly one that requires a number of integrated components (drivers that don’t work, connections between machines, software compatibility with the operating system, digital certificate compatibility or usability, etc...)
It allows users and management to actually use the software as it is being built.... invaluable!
Scrum team roles
Pigs and Chickens -- think scrambled eggs and bacon -- the chicken is supportive, but the pig is committed.
Scrum “pigs” are committed to building the software regularly and frequently
Scrum Master -- the one who acts as a project manager and removes impediments to the team delivering the sprint goal. Not the leader of the team, but a buffer between the team and any chickens or distracting influences.
Product owner -- the person who has commissioned the project/software. Also known as the “sponsor” of the project.
Scrum “chickens” are everyone else
Users, stakeholders (customers, vendors), and other managers
Adaptive project management
Scrum general practices
customers become part of the development team (you have to have interested users...)
Scrum is meant to deliver working software after each sprint, and the user should interact with this software and provide feedback
Transparency in planning and development -- everyone should know who is accountable for what and by when
Stakeholder meetings to monitor progress
No problems are swept under the carpet -- nobody is penalized for uncovering a problem
http://en.wikipedia.org/wiki/Scrum_(development)
Typical Scrum Artifacts
Sprint Burn Down Chart
a chart showing the features for that sprint and the daily progress in completing these
Product Backlog
a list of the high level requirements (in plain ‘user speak’)
Sprint Backlog
A list of tasks to be completed during the sprint
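The burn-down chart is just the sprint backlog’s remaining effort plotted per day. A minimal sketch of the data behind it, with hypothetical task names and hour estimates:

```python
# Sprint backlog: task -> estimated hours remaining. Values are illustrative.
sprint_backlog = {
    "login form": 8,
    "password reset": 5,
    "audit logging": 13,
}

def remaining_hours(backlog, completed):
    """Total estimated hours left, given the set of finished tasks.
    Plotting this value each day of the sprint yields the burn-down chart."""
    return sum(hours for task, hours in backlog.items() if task not in completed)

print(remaining_hours(sprint_backlog, set()))               # day 0: 26
print(remaining_hours(sprint_backlog, {"password reset"}))  # later: 21
```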
Agile methods and systems
Agile works well for small to medium sized projects (around 50,000 - 100,000 lines of source code)
Difficult to implement in large, complex system development with hundreds of developers in multiple teams
Requires each team be given “chunks of work” that they can develop
Integration is key -- need to use standard components and standards for coding, interconnecting, data modeling so each team does not create their own naming conventions and interfaces to their components.
Quality assurance
The MOST IMPORTANT ASPECT of software development
Quality Assurance does not start with “testing”
Quality Assurance starts at the requirements gathering stage
“software faults” -- when the software does not perform as the user intended
bugs
requirements are good/accurate, but the programming causes a crash or other abnormal state that is unexpected
requirements were wrong, programming was correct -- still a bug from the user’s perspective
Some facts about bugs
Bugs in the form of poor requirements gathering or poor communication with programmers are by far the most expensive in a software development effort
Bugs caught at the requirements or design stage are cheap
Bugs caught in the testing phase are expensive to fix
Bugs not caught are VERY EXPENSIVE in many ways
loss of customers/user trust
need to “fix” it quick -- lends itself to yet more problems because everyone is panicking to get it fixed asap.
Software testing
System Testing
“black box” testing
“white box” testing
Regression Testing
Black box testing
Treats software as a black-box without knowledge of its interior workings
It focuses simply on testing the functionality according to the requirements
Tester inputs data, and sees the output from the process
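A black-box test in code form might look like the sketch below: it checks only inputs against expected outputs, never inspecting how the function works internally. The sales-tax function is a hypothetical example, not from the slides:

```python
def price_with_tax(price, rate):
    """Implementation details are irrelevant to the black-box tester."""
    return round(price * (1 + rate), 2)

def test_price_with_tax():
    # Only inputs and expected outputs appear here -- no knowledge of
    # internal data structures or algorithms is used.
    assert price_with_tax(100.0, 0.08) == 108.0
    assert price_with_tax(0.0, 0.08) == 0.0

test_price_with_tax()
```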
White box testing
Tester has knowledge of the internal data structures and algorithms
Types of white box testing
Code Coverage - The tester creates tests to cause all statements in the program to be executed at least once
Mutation Testing - the software code is modified slightly to emulate typical programmer mistakes (using the wrong operator or variable name). If the test suite still passes on the mutated code, the tests are inadequate.
Fault injection - Introduce faults in the system on purpose to test error handling. Makes sure the error occurs as expected and the system handles the error rather than crashing or causing an incorrect state or response.
Static testing - primarily syntax checking and manual reading of the code to check errors (code inspections, walkthroughs, code reviews)
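Of the white-box techniques above, fault injection is easy to illustrate: deliberately make a dependency fail and verify the caller reaches a defined error state instead of crashing. The “database” below is a hypothetical stand-in:

```python
class FlakyDatabase:
    """Test double whose failure we control -- the injected fault."""
    def __init__(self, fail=False):
        self.fail = fail

    def fetch(self, key):
        if self.fail:
            raise ConnectionError("injected fault")
        return {"42": "record"}.get(key)

def safe_fetch(db, key):
    """Return the record, or None if the database is unavailable."""
    try:
        return db.fetch(key)
    except ConnectionError:
        return None  # handled: a defined fallback state, not a crash

# White-box check: we know the error path exists, so we force it to run.
assert safe_fetch(FlakyDatabase(fail=False), "42") == "record"
assert safe_fetch(FlakyDatabase(fail=True), "42") is None
```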
Test Plan
Outlines the ways in which tests will be developed, the naming and classification for the various failed tests (critical, show stopper, minor, etc..)
Outlines the features to be tested, the approach to be used, suspension criteria (the conditions under which testing is halted)
Describes the environment -- the test environment, including hardware, networking, databases, software, operating system, etc..
Schedule -- lays out a schedule for the testing
Acceptance criteria - an objective quality standard that the software must meet in order to be considered ready for release (minimum defect count and severity levels, minimum test coverage, etc...)
Roles and responsibilities -- who does what in the testing process
Test cases
A description of a specific ‘test’ or interaction to test a single behavior or function in the software
Similar to ‘use cases’ as they outline a scenario of interaction -- however, one can have many tests for a single use case
Example -- login is a use case; need a test for successful login, one for unsuccessful login, one to test the expiration, lockout, how many tries before lockout, etc..
Components of a test case
Name and number for the test case
The requirement(s) or feature(s) the test case is exercising
Preconditions -- what must be set in place for the test to take place
For example, to test whether one can register a death certificate, one must have a death certificate that is filled out, has passed validations, and has been submitted to the local registrar...
Steps -- list of steps describing how to perform the test (log in, select patient A, select medication list, pick Amoxicillin, click ‘submit to pharmacy’, etc..)
Expected results - describe the expected results up front so the tester knows whether it failed or passed.
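The components listed above can be captured as a simple record. The login/lockout scenario mirrors the slide’s example; the field values and ID scheme are illustrative:

```python
# One test case with the components named above: name/number, requirement,
# preconditions, steps, and expected results.
test_case = {
    "id": "TC-017",
    "requirement": "lock account after repeated failed logins",
    "preconditions": ["user 'alice' exists", "account is not locked"],
    "steps": [
        "enter a wrong password three times",
        "attempt a fourth login with the correct password",
    ],
    "expected": "fourth attempt is rejected; account shows as locked",
}

def describe(tc):
    """One-line summary a tester can scan in a test report."""
    return f"{tc['id']}: {tc['requirement']} ({len(tc['steps'])} steps)"

print(describe(test_case))
```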
Regression testing
designed to find ‘software regressions’ -- when previously working functionality is now not working because of changes made in other parts of the system
As software is versioned, this is the most common type of bug or “fault”
The list of ‘regression tests’ grows
a test for the functions in all previous versions
a test for any previously found bugs -- create a test to test that scenario
Manual vs. Automated
mostly done manually, but can be automated -- we have automated 500 tests
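An automated regression suite follows the growth rule above: one test per shipped function plus one per bug ever fixed, all re-run on every version. A sketch, with a hypothetical date-parsing bug:

```python
from datetime import date

def parse_iso(s):
    """Parse 'YYYY-MM-DD'. A (hypothetical) past version mishandled leap days."""
    y, m, d = map(int, s.split("-"))
    return date(y, m, d)

def test_basic_parse():        # covers functionality from previous versions
    assert parse_iso("2008-01-15") == date(2008, 1, 15)

def test_leap_day_bug():       # added when the leap-day bug was found and fixed
    assert parse_iso("2008-02-29") == date(2008, 2, 29)

# The list of regression tests only grows; every new version re-runs it all.
for test in (test_basic_parse, test_leap_day_bug):
    test()
```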
Risk is good.... huh?
There is no worthwhile project that has no risk -- risk is part of the game
Those that run away from risk and focus on what they know never advance the standard and leave the field open to their competitors
Example: Merrill Lynch ignored online trading at first, allowing other brokerage firms to create a new market - eTrade, Fidelity, Schwab. Merrill Lynch eventually entered 10 years later.
Staying still (avoiding risk) means you are moving backwards
Bob Charrette’s Risk Escalator -- everyone is on an escalator and it is moving against you, you have to walk to stay put, run to get ahead. If you stop, you start moving backwards
DeMarco and Lister. Waltzing with Bears: Managing Risk on Software Projects. 2003.
But don’t be blind to risk
Sometimes those who are big risk takers have a tendency to emphasize positive thinking by ignoring the consequences of the risk they are taking
If there are things that could go wrong, don’t be blind to them -- they exist and you need to recognize them.
If you don’t think of it, you could be blind-sided by it
DeMarco and Lister. Waltzing with Bears: Managing Risk on Software Projects. 2003.
Examples of risks
BCT.org -- a dependency on externally built and maintained software (caMATCH)
BCT.org -- a need to have a hard “launch” date
eCareNet -- a dependency on complex software only understood by a small group of “gurus” (Tolven system)
TRANSCEND -- integration of system components that have never been integrated before (this is common -- first time integration).
TRANSCEND -- clinical input to CRF process has never been done before.
TRANSCEND -- involves multiple sites not under our control, user input will be difficult to obtain because everyone is busy, training will be difficult because everyone is busy, and there are likely detractors already and we have no voice in their venue
“Risk management often gives you more reality than you want.” -- Mike Evans, Senior VP, ASC Corporation
Managing risks
What is a risk? -- “a possible future event that will lead to an undesirable outcome”
Not all risks are the same
they have different probabilities that they will happen
They have different consequences -- high impact, low impact
Some may or may not have alternative actions to avoid or mitigate the risk if it comes to pass -- “is there a feasible plan B”
“Problem” -- a risk is a problem that is yet to occur; a problem is a risk that has occurred
“Risk transition” -- when a risk becomes a problem, thus it is said the risk ‘materialized’
“Transition indicator” -- things that suggest the risk may transition to a problem. Example -- Russia masses troops on the Georgian border...
DeMarco and Lister. Waltzing with Bears: Managing Risk on Software Projects. 2003.
Managing risks
Mitigation - steps you take before the transition or after to make corrections (if possible) or to minimize the impact of the now “problem”.
Steps in risk management
risk discovery
exposure analysis (impact analysis)
contingency planning -- creating plan B, plan C, etc.. as options to engage if the risk materializes
mitigation -- steps taken before transition to make contingency actions possible
transition monitoring -- tracking of managed risks, looking for transitions and materializations (risk management meetings).
DeMarco and Lister. Waltzing with Bears: Managing Risk on Software Projects. 2003.
Common software project risks
Schedule flaw - almost always due to neglecting work or minimizing work that is necessary
Scope creep (requirements inflation) or scope shifting (because of market conditions or changes in business requirements) -- inevitable -- don’t believe you can keep scope ‘frozen’ for very long
recognize it, create a mitigation strategy, recognize transition, and create a contingency
for example, if requirements need to be added or changed, need to make sure ‘management’ is aware of the consequences and adjustments are made in capacity, expectation, timeline, budget.
It is not bad to change scope -- it is bad to change scope and believe nothing else needs to change
DeMarco and Lister. Waltzing with Bears: Managing Risk on Software Projects. 2003.
“Post mortem” evaluations
No project is “100% successful” -- they all have problems, some have less than others, some have fatal problems.
It is critical to evaluate projects after they are completed to characterize common risks/problems and establish methods of mitigation before the next project
Capability Maturity Model (CMM)
A measure of the ‘maturity’ of an organization in how they approach projects
Originally developed as a tool for assessing the ability of government contractors’ processes to perform a contracted software project (can they do it?)
Maturity Levels -- 1-5. Level 5 is where a process is optimized by continuous process improvement
CMM in detail
Level 1 - Ad hoc: -- processes are undocumented and in a state of dynamic change, everything is ‘ad hoc’
Level 2 - Repeatable: -- some processes are repeatable with possibly consistent results
Level 3 - Defined: -- set of defined and documented standard processes subject to improvement over time
Level 4 - Managed: -- using process metrics to control the process. Management can identify ways to adjust and adapt the process
Level 5 - Optimized: -- process improvement objectives are established (post mortem evaluation...), and process improvements are developed to address common causes of process variation.
Why medical software is hard...
Courtesy Dr. Andy Coren, Health Information Technology: A Clinician’s View. 2008
Healthcare IT failures
Hard to discover -- nobody airs dirty laundry
West Virginia -- system has to be removed a week after implementation
Mt Sinai -- 6 weeks after implementation, system is “rolled back” due to staff complaints