
VOLUME 6 • ISSUE 6 • JUNE 2009 • $8.95 • www.stpcollaborative.com

One Last Look At Data Patterns

Real-World Management Do’s and Don’ts
page 12

Real vs. Perceived In the Business Process Realm

Page 2: VOLUME 6 • ISSUE 6 • JUNE 2009 • $8.95 •  · Keep Your Dominos In a Row A veteran of software development and testing takes you on a journey though his experiences fixing
Page 3: VOLUME 6 • ISSUE 6 • JUNE 2009 • $8.95 •  · Keep Your Dominos In a Row A veteran of software development and testing takes you on a journey though his experiences fixing

Contents
VOLUME 6 • ISSUE 6 • JUNE 2009

Things are not always what they seem. As a manager you might think you know how the job is getting done, but what goes on behind the scenes? Here’s how to find out. By Robin Goldsmith

16 One Last Look At Data Patterns



Software Test & Performance (ISSN #1548-3460) is published monthly by Redwood Collaborative Media, 105 Maxess Avenue, Suite 207, Melville, NY, 11747. Periodicals postage paid at Huntington, NY and additional mailing offices. Software Test & Performance is a registered trademark of Redwood Collaborative Media. All contents copyrighted © 2009 Redwood Collaborative Media. All rights reserved. The price of a one year subscription is US $49.95, $69.95 in Canada, $99.95 elsewhere. POSTMASTER: Send changes of address to Software Test & Performance, 105 Maxess Road, Suite 207, Melville, NY 11747. Software Test & Performance Subscribers Services may be reached at [email protected] or by calling 1-847-763-1958.

10 COVER STORY
A Build Management Study: Keep Your Dominos In a Row
A veteran of software development and testing takes you on a journey through his experiences fixing broken build systems. Read how he did it.

By Chris McMahon

Departments

Interviewing for a job requires preparation regardless of which side of the table you’re on. Learn how to prepare your team to hire the best automation experts that come your way.

By Bernie Gauf and Elfriede Dustin

Understand the data in your organization and improve quality assurance. This final installment covers data patterns of interaction, human error and those that trigger measurable results.

By Ross Collard

24 Pick The Best Players And Your Team Wins

29 The Real And Perceived Business Processes

4 • Editorial
Revenue models have shifted. So who’s paying for your project?

6 • Contributors
Get to know this month’s experts and the best practices they preach.

7 • Out of the Box
New products for testers.

10 • ST&Pedia
Why it makes sense to Deliver. Running. Tested. Features.
By Matt Heusser and Chris McMahon

33 • Best Practices
Shining a bright light on Eclipse-based testing. By Joel Shore

34 • Future Test
The failure of software testing: Will it change the future? By Geoff Thompson


VOLUME 6 • ISSUE 6 • JUNE 2009

President Andrew Muns

Chairman Ron Muns

105 Maxess Road, Suite 207
Melville, NY 11747
+1-631-393-6051
fax +1-631-393-6057
www.stpcollaborative.com

Cover Photograph by Vika Valter

Editor

Edward J. Correia

[email protected]

Contributing Editors

Joel Shore

Matt Heusser

Chris McMahon

Art Director

LuAnn T. Palazzo

[email protected]

Publisher

Andrew Muns

[email protected]

Associate Publisher

David Karp

[email protected]

Director of Operations

Kristin Muns

[email protected]

Chief Marketing Officer

Jennifer McClure

[email protected]

Marketing Coordinator

Teresa Cantwell

[email protected]

Reprints

Lisa Abelson

[email protected]

(516) 379-7097

Subscriptions/Customer Service

[email protected]

847-763-1958

Circulation and List Services

Lisa Fiske

[email protected]

When I joined my first media company in 1990, I was one of four technical support people servicing about 800 employees. By the time I left the company 10 years later, the Information Technology department encompassed more than 100 people doing support, analysis, development, testing and network and data center operations.

Through all that growth and change, one thing stayed the same: IT remained a cost center, and the cost of development projects was shouldered mainly by IT. There were charge-backs, of course, but they were dwarfed by the overall IT budget. In companies across the U.S., that might be changing as IT-related projects are increasingly funded by the business units that receive the benefits.

According to software development and system integration consultant Thomas O’Mara, the revenue model has undergone a significant shift. “The last three companies I’ve dealt with, the funding for their projects has come from the business units, not from IT,” said O’Mara, whose company, TEO Innovations, has engaged in projects as diverse as a gyroscopic data acquisition project and a 900+ user Web-based CRM system.

How does this shift affect software testing? “I have come to coin the phrase ‘Your application is only as good as your testing,’” said O’Mara, a proponent of equal time and funding for development and testing.

He said the shift has not had a tremendous impact on his engagements. “At the outset of a project, I always stress the importance of equal time and resource allocation of developers-to-testers, and emphasize the importance of allowing adequate time for testing. Maybe I’ve been lucky, but my clients have always bought in.”

O’Mara also advocates early involvement in projects large and small by all stakeholders. In addition to the usual players—developers, testers, business analysts and management—he says it’s critical also to include end-users in projects, starting at requirements and going all the way through to acceptance testing. “Within the large enterprise business units, each [stakeholder] needs to be given an equitable stake in the software development. Even more important, the managing entities of the business units and IT staff need to provide the infrastructure and guidance, and to manage it through to the desired outcomes.”

To motivate stakeholders to participate this way, it’s important for each to feel a sense of ownership in a software development project.

While it may not be the entire solution, we’re hopeful that moving IT from a general corporate cost center to being partially funded by business units will help build that combined sense of ownership and ultimately improve software quality.

CORRECTION
Adam White, author of the May issue’s Future Test column, is director of test engineering for data center products at Novell. We had identified him in his previous role as manager of data-center automation at PlateSpin, a company that Novell acquired in March.

Who’s Paying For Your IT Projects?

Ed Notes

Edward J. Correia

• Moving IT from a cost center to partial funding by business units may help build a combined sense of ownership.


In more than a decade as a professional tester, CHRIS MCMAHON has amassed experience ranging from telecom networks to social networking, and from COBOL to Ruby. He’s a dedicated agile tester, telecommuter and a former professional musician. He also wrote our lead feature, in which he chronicles some of his experiences working with—and fixing—broken build systems. His accounts begin on page 12.

Contributors

TO CONTACT AN AUTHOR, please send e-mail to [email protected].


In the final installment of his live data series, software quality consultant ROSS COLLARD describes data patterns as they pertain to interaction, human error and those capable of triggering measurable results. Understanding the patterns in your organization is a critical part of quality assurance, and Ross, founder of Collard & Company, explains how data can be enhanced to best fit specific testing goals. Page 16.

ROBIN F. GOLDSMITH is president of Go-Pro Management, where he works directly with and trains professionals in business engineering, requirements analysis, software acquisition, project management, quality and testing. Author of many books on software quality and requirements, Robin shares, beginning on page 29, some of his observations of how a company’s processes might not be what they appear. What people perceive is not always what is real.

Interviewing might be a science, but hiring is an art. Knowing how to select the right people for your team can be every bit as important as knowing how to implement and manage it. This is particularly true of automation teams. BERNIE GAUF, president of Innovative Defense Technologies, and automation expert ELFRIEDE DUSTIN detail what it takes to pick and retain a winning team. Turn to page 23.

Index to Advertisers

Advertiser  URL  Page
Electric Cloud  www.electric-cloud.com  11
Hewlett-Packard  www.hp.com/go/alm  36
Instantiations  www.instantiations.com/freetrials  9
Openmake  www.openmakesoftware.com  15
Software Test & Performance Collaborative  www.stpcollaborative.com  6
Software Test & Performance Magazine  www.stpcollaborative.com  28
STPCon Fall 2009  www.stpcon.com  5
TechExcel  www.techexcel.com  2
Test & QA Report  www.stpmag.com/tqa  35


It’s putting team capabilities where its name is. Collaboration tools maker CollabNet in mid-April began shipping TeamForge 5.2, adding lab management and cloud provisioning capabilities and a plug-in for the Hudson continuous integration engine. Formerly known as SourceForge Enterprise, the repository and development environment was acquired by CollabNet from VA Software in 2007.

“Rebranding our core product under the TeamForge name reflects CollabNet’s ability to support software development by agile, collaborative and distributed project teams,” said CollabNet president and CEO Bill Portelli, who asserted that the release is the most innovative in the company’s history. “TeamForge will transform the way distributed organizations develop software.”

The company also reportedly beefed up the system’s role-based access control and increased visibility into the governance, management and control of repositories containing mission-critical software. A redesigned permissions structure adds path-based access control.

For teams using cloud-based resources (and who isn’t these days?), TeamForge 5.2 can now directly access Amazon’s Elastic Compute Cloud servers as well as CollabNet’s OnDemand Cloud or your own private system. New lab management and development services permit test teams “to define and modify profiles and software stacks and provision these profiles on both physical and virtual build-and-test servers from any public or private cloud.”

The new system also now includes continuous integration capabilities a la Hudson, “enabling Hudson users to provision and access build-and-test servers from any of these clouds.” The new Hudson plug-in is open-source and permits services for authentication, permissions and automatic uploading of build results.

CollabNet’s TeamForge: More Than a Name Change

Out of the Box


GUI Dancer 3.0 Adds Model-Based Testing

Automated GUI tool maker Bredex has added model-based testing to GUI Dancer, its flagship automation tool for testing graphical user interfaces. New in version 3.0, the capability gives testers a choice of generating test cases from UML models or while running the application under test.

Testers familiar with GUI Dancer will appreciate the introduction of a new modeling perspective, which permits test case creation using the same UML diagrams in use by developers. This helps ensure good testing structure and coverage of important use cases. An observation mode creates test cases from within the AUT while producing no code. Available now, GUI Dancer 3.0 also ships with unlimited execution licenses with every specification license.

GUI Dancer 3.0 adds modeling support while staying true to its keyword-driven roots.

A peek inside a tracker screen of TeamForge 5.2, which is apparently used in Hollywood.


Addressing what it claims is an unfulfilled need for end-to-end testing, Parasoft in late April updated SOA Quality Solution, its security, reliability and business-process compliance platform. The tool now supports tests from the Web interface through to “backend services, enterprise service buses, databases and everything in between,” according to the company.

“In most cases, organizations do not have the capability to create and execute end-to-end tests,” says Wayne Ariola, vice president of strategy at Parasoft. He adds that companies that do manage to extend test scenarios find themselves with tests that are brittle and have a limited life span. The release, he says, allows organizations to easily create, execute and maintain tests that can be “initiated at a rich user interface, through the logic in the message layer, through the implementation component to the database or mainframe and back, validating the entire business process.”

SOA Quality Solution also provides visibility into the enterprise service bus, and can trace events triggered by tests on running applications. This allows teams to monitor execution and generate tests automatically from events taking place in real time. Out of the box, the platform is aware of ESBs from IBM, Oracle, Progress, Software AG, TIBCO and others.

SOA Quality Center displays results of end-to-end performance tests graphically during execution.

A Virtual Testbed Springs from iTest 3.4

Everything’s going virtual these days. The latest company to jump on the virtual bandwagon is embedded test-tool maker Fanfare. The company in late May released iTest 3.4, which it claims includes the industry’s first virtualized test environment, a feature that enables teams to build device test cases before the actual device exists.

The aptly named Virtual Testbed enables the creation of real-device simulations, permitting developers and testers to build and test their applications before they have access to target hardware, according to the company. Test assets can be saved and transferred among testers or teams.

Also new in the company’s flagship test tool is the ability to group tests into suites and schedule them for automatic execution and recurrence. Version 3.4 improves layer 2-3 and 4-7 testing, the company said, and now supports Ixia’s IxNetwork and Spirent Avalanche network simulation and test fixtures.

If you’ve purchased the optional US$1,000-per-year integration with Hewlett-Packard Quality Center, you can now also batch-publish tests and reports from iTest to Quality Center.

Free for current maintenance customers and subscription licensees, iTest 3.4 began shipping on May 27. Pricing starts at $1,700 for an annual subscription.

Out of the Box


SOA Quality Solution Now Goes End-to-End

During test execution, results from iTest’s Virtual Testbed can be displayed in real time.


Keane on Keyword-Driven Testing
Keane’s Open2Test is an open source framework that is intended to allow testers to “create test scripts using keywords in Excel spreadsheets” and run them with any major test automation tool. Integration with Hewlett-Packard’s QuickTest Pro is included.

The free tool, which can be downloaded at www.open2test.org, “allows testers to develop test cases using Microsoft Excel and a list of keywords,” according to a document on the Web site. “When the test is executed, the framework processes the Excel workbook and calls functions associated with the keywords entered in the Excel spreadsheet. These keyword functions in turn perform specific actions against the application under test (AUT). The framework interprets the keyword and executes a set of statements in the program.” Using the framework, testing of applications can be automated without starting from scratch. “Testers simply use the application independent keywords and create extra application-specific keywords.”

The framework also allows use of variables, conditional checking, data-driven testing, function calling and reusable actions, keyboard input, date and time functions, as well as object and exception handling. Open2Test 1.0 is available now.

Fortify 360 Expands Governance
Security tools maker Fortify has released Fortify 360 2.0, an update to its security lifecycle tool that it says now includes the ability to apply governance rules throughout an organization.

The Fortify 360 suite includes static and dynamic analysis of running applications, visualization of defects with priority setting capabilities, real-time patch delivery for rapid response to urgent vulnerabilities, shared workspaces for security and development teams, and a governance center with central security dashboard and control panel for monitoring, reporting and tracking trends.

Expanded governance features include the ability to create and manage an inventory of enterprise software and to assign risk to each based on factors such as whether they are Web-facing or outsourced. Security policies can be automatically generated and applied based on an application’s risk profile. Processes are communicated and tracked centrally, and can be accessed by developers and security testers.

Groups, Mono Move to TeamCity
With the release of TeamCity 4.5, JetBrains’ distributed build management and continuous integration tool now has the ability to create user groups with roles and notifications. Also new in the tool is extended LDAP support with automatic synchronization of user profiles and auto-detection and support for the Mono open source .NET platform for continuous builds.

TeamCity 4.5 is also now able to parse raw XML reports from Ant’s JUnit tasks, as well as FindBugs, NUnit, PMD and Surefire, and supports grouping of tests, problematic tests, assignment of broken builds, a change log and other UI improvements. CVS is now supported for Eclipse remote run and pre-tested commit. TeamCity 4.5 is available now in free and commercial versions.

Performance Makes a House Call
Short of staff for allocation to load testing? Now Keynote Systems, which offers on-demand load and experience testing tools and services, has unveiled LoadPro 2.0, a version of its Web site load testing software “delivered as a turnkey managed consulting service.” The new software also includes an end-user portal with expanded dashboard, real-time visibility into tests as they execute and account management.

Coverity Preempts Security Flaws
If you’re using Coverity’s Integrity Center to analyze the security of your software code and architecture, you must surely know that you also can use it to check your builds. The company added Build Analysis to its system the same day the system itself was announced. The feature gives teams the ability to “identify the source of defects that occur due to improper or accidental inclusion of the wrong object file,” according to the company. The module also can “halt the introduction of malicious or unintentional vulnerabilities through software components or open source packages that may contain known security problems.” Coverity claims that it can stop compliance violations due to a lack of visibility into the build process.

Send product announcements to [email protected]


Our title “Deliver. Running. Tested. Features.” is also a motto of the Agile community. But being able to deliver running tested features makes sense regardless of your software development process.

Since the theme for this month's ST&P is “Build Management,” we look at how build-and-deploy systems enable software development teams to deliver running tested features.

Deliver. Running. Tested. Features.

TARGET ENVIRONMENT
An excellent build/deploy system should be capable of populating environments for developers; for testing; and of course for production.

DEVELOPMENT ENVIRONMENT
Development environments are the software equivalent of the Wild Wild West. Developers need the freedom to experiment and to try solutions, so dev environments tend to be populated as a 'pull' by the developer on the build/deploy system. It should be an easy thing to add finished code to the build/deploy system while keeping unfinished code from polluting other environments.

TEST ENVIRONMENT
Test environments vary quite a lot, depending on project requirements. Chris once worked as a tester for a mainframe system where the test environment was maintained very much like the production environment. Today we both work testing a web app, and our test environments and dev environments are essentially identical.

Matt adds: Can you testers spool up test environments quickly? If not, being able to do so would be a quick-win process improvement.

PRODUCTION
The most important environment. When you improve your build/deploy system, always make deploying to Production your top priority - ideally, it's a button push (or a very few button pushes) by someone that matters.

Deliver. Running. Tested. Features.

When you are done deploying to your target environment, the code should be ready to use. Here are some common steps necessary to make that happen. A good build/deploy system will have most if not all of this process automated.

SOURCE CODE
Build/deploy systems start by checking out a particular set of source code.

COMPILE
Unless your code base is written in an interpreted language like Perl or Ruby, your build/deploy system will invoke a compiler. Make sure to automate the checking of compiler logs for errors. Deploying a broken system is no fun for anyone.
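Here is one hedged sketch, in Python, of what automating that check might look like, assuming the build can be driven from the command line; the build command, log path and error markers are invented placeholders rather than anything the column prescribes.

```python
# Sketch: fail fast when the compiler log contains errors, instead of
# deploying a broken build. All names below are placeholder assumptions.
import subprocess
import sys

BUILD_CMD = ["make", "all"]          # placeholder build command
LOG_PATH = "build.log"               # placeholder log location
ERROR_MARKERS = ("error:", "fatal error", "undefined reference")

def run_build() -> int:
    # Capture both stdout and stderr in one log file for later scanning.
    with open(LOG_PATH, "w") as log:
        result = subprocess.run(BUILD_CMD, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode

def log_has_errors() -> bool:
    with open(LOG_PATH) as log:
        return any(marker in line.lower() for line in log for marker in ERROR_MARKERS)

if __name__ == "__main__":
    if run_build() != 0 or log_has_errors():
        print("Build failed or compiler log contains errors; stopping before deploy.")
        sys.exit(1)
    print("Build clean; safe to hand off to the deploy step.")
```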

CONFIGURE
Almost every software system has files and settings to control aspects of system behavior. Each of your target environments will likely have its own particular configuration. Make sure your build/deploy system can configure the target environment correctly.

Deliver. Running. Tested. Features.

Before we talk about deploying to a test environment, we should emphasize that we think running automated tests frequently makes a lot of sense.

CONTINUOUS INTEGRATION
The practice of running automated unit tests upon each code check-in, or otherwise very often. Many modern build/deploy systems will not deploy unless the build succeeds and the unit tests pass.

Some system- and integration-level tests take long enough to run that they should not prevent deploying the system. Still, running such tests as often as possible and making the results as public as possible is a good practice. For example, as of this writing, the system we work in has about 17,000 individual assertions at the unit and integration levels and almost 10,000 assertions at the UI level. The UI-level tests take about two hours to run, but we run them 24/7 and publish the results of failing tests in a public place.
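As a rough illustration of the gating idea, the sketch below runs a fast unit suite on each check-in and only hands off to the deploy step when it passes; the commands and the report file are hypothetical.

```python
# Minimal sketch of a continuous integration gate, assuming the project's unit
# tests and deploy step can be driven from the command line. Commands and the
# report path are assumptions, not part of the column.
import subprocess
import sys
from datetime import datetime

UNIT_TESTS = ["python", "-m", "pytest", "tests/unit", "-q"]   # fast suite that gates deploy
DEPLOY = ["./deploy.sh", "test"]                              # placeholder deploy script
REPORT = "ci-results.log"                                     # published for the whole team

def run(step_name, cmd) -> bool:
    ok = subprocess.run(cmd).returncode == 0
    with open(REPORT, "a") as report:
        report.write(f"{datetime.now().isoformat()} {step_name}: {'PASS' if ok else 'FAIL'}\n")
    return ok

if __name__ == "__main__":
    # Only deploy a build whose unit tests pass; slower system/UI suites run on
    # their own schedule and publish to the same report without blocking.
    if not run("unit tests", UNIT_TESTS):
        sys.exit("Unit tests failed; build will not be deployed.")
    if not run("deploy to test", DEPLOY):
        sys.exit("Deploy step failed.")
```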

TESTWARE
An old word in the software development arena, testware refers to those parts of the system necessary for testing, but which are not actual source code. System data, special configuration files and settings, test cases both automated and manual, code to run tests, tools to run tests, are all testware. We recommend keeping testware under version control along with the system being tested. Having the two in sync prevents a lot of hassle when dealing with multiple system versions.

Deliver. Running. Tested. Features.

Build tools, test tools, CI and deployment tools only exist to help accomplish the real goal, which is to deliver working features on a regular basis. Continuously Integrate or don't, automate the build or don't - have an intern run a boring manual script written in 1985, we don't care. Ultimately, the key questions are:

1. Are you delivering working software regularly?
2. What can you do, right now, to ensure that that continues?
3. What can you do, right now, to make it better than it is today?

Tuning up your build/deploy systems might just be one of the best investments you can make to deliver. running. tested. features.

Deliver. Running. Tested. Features.

ST&Pedia
Translating the jargon of testing into plain English

Matt Heusser and Chris McMahon are career software developers, testers and bloggers. Matt also works at Socialtext, where he performs testing and quality assurance for the company’s Web-based collaboration software.

Matt Heusser and Chris McMahon


Photograph by Cosma / Fotolia.com

With Automated Build Management, Any of a Thousand Little Pieces Can Bring the System Crashing Down

By Chris McMahon

Can you press one button and deploy your latest build to the test environment? If you can, then congratulations to you and your team.

It is far more common that even with great source control, a great continuous integration system and great test environments in place, there are still some manual steps involved in deploying your builds to all their required destinations.

But we all aspire to be able to click a single button and have our builds deployed where we want them. This article discusses some common issues that prevent fully automated deployment, and some ways to approach automating those problems.

Unique Hardware
I once worked testing a system that required a unique hardware setup, so the test lab had some dedicated machines devoted to smoke testing and system testing. But in order to run a smoke test, the testers had to spend a lot of time cleaning up the test systems after every smoke test, because the


application would pollute the system by not uninstalling cleanly. Especially early in the development cycle, the testers could only run about three smoke tests per week, because the maintenance cost of the test platform was so high.

I solved the smoke test problem using a disk-imaging tool available for FreeBSD and Linux called “Frisbee,” which allowed me to create an image of the clean system. Then, whenever a smoke test failed, I could boot to a Frisbee CD and lay down the clean image from a network server onto the test machine in a matter of minutes, instead of spending hours manually cleaning up the test hardware and removing the broken bits from the previous test failure.

At the same time I was doing this work with the disk imaging, I was also working with an outstanding build and configuration-management person to create robust test installers on demand. If I had built into the system the ability to boot the test servers from the network instead of from a CD (which is quite possible using Frisbee), we could have actually had the whole process completely automated. While we did not quite achieve total end-to-end automation, we went from being able to run three smoke tests per week to being able to run six smoke tests per day.
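The sketch below shows roughly what that restore-then-smoke-test cycle could look like once scripted; every command is a placeholder, since the actual Frisbee invocation, installer and smoke-test runner are not spelled out here.

```python
# Sketch of a restore-and-retest cycle. The shell scripts named below are
# hypothetical wrappers, not commands described in the article.
import subprocess
import sys

STEPS = [
    ["./restore_clean_image.sh", "test-host-01"],     # e.g., wraps the disk-imaging tool
    ["./install_build.sh", "test-host-01", "latest"], # lay down the build under test
    ["./run_smoke_test.sh", "test-host-01"],          # exercise the freshly installed system
]

def rebuild_and_smoke_test() -> bool:
    for cmd in STEPS:
        if subprocess.run(cmd).returncode != 0:
            print(f"Step failed: {' '.join(cmd)}")
            return False
    return True

if __name__ == "__main__":
    sys.exit(0 if rebuild_and_smoke_test() else 1)
```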

Of course, this exposed a new problem: instead of hearing three times a week that the build was broken, management was now hearing thirty times per week that the build was broken. It forced the team to really address some issues that had been hidden until we automated the deployment of our test environments.

Note: I worked with Frisbee in an all-Microsoft shop. If you work in a Microsoft shop, consider adding some Linux or FreeBSD machines to your test lab. Besides Frisbee for disk imaging, there are great tools for network testing, security testing, and all sorts of other functions available for those operating systems that are either not available on a Microsoft platform, or are prohibitively expensive.

Test Data
Rarely does an application manage data simple enough that deploying test data is an easy task. In the late nineties I worked testing a system that did data validation for 911 calls. (As a result of my work, when you dial 911 from your land line because you’re choking on a chicken bone, the dispatcher knows where you are without you having to speak.)

We operated on a three-month release cycle. (Actually, we invented a development and release process that would today look very much like an Agile process; the roots of the Agile movement are in mainframe development.) After three months of hard data-validation testing, the test data in the test environments was a mess: invalid records, broken records, mishandled records, all kinds of errors in the system. Also, our requirements focus from release to release would vary, and we would want different kinds of test data depending on what sort of requirements we had for each development cycle. So every three months, after the release to production, we flushed all the data on the test mainframes and started over anew.

The production systems contained millions of records. We needed a good-sized set of data for test purposes, but the production systems contained an unmanageable amount of data, so cloning them was not an option. And yet having a good variety of test data available was critical to the work.

We developed a tool such that, at the beginning of each development cycle, we would extract a consistent set of 40,000 or so records from a production environment and inject them into the test environment. This represented about one quarter of one percent of the number of production records. This was enough to give us a good set of relevant test data, and also to give us a good idea of any performance problems we might encounter, but was small enough to be manageable.

The hard part was extracting a set of test data that was both consistent and representative. We spent a significant amount of time on our extract/load tool to guarantee both consistency and representation. The whole extract/load process took about a day, but at the beginning of each three-month cycle, we had good clean test data to work with. The time spent developing the tool was well worth the effort, and the time spent extracting the records and loading them


Chris McMahon is a dedicated agile tester with more than a decade of professional testing experience. He writes the ST&Pedia column with Matt Heusser.


into the test systems was a small enough fraction of the total development time that it was also worth doing.
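A sketch of such an extract/load step might look like the following; the table and column names are invented for illustration, since the real 911 schema is not described here.

```python
# Sketch of an extract/load tool that pulls a small but consistent sample of
# production data into a test database. Schema names are assumptions.
import sqlite3

SAMPLE_EVERY = 400   # roughly one quarter of one percent of production rows

def extract_sample(prod: sqlite3.Connection, test: sqlite3.Connection) -> None:
    # Deterministic sampling (every Nth id) keeps the subset repeatable, and
    # pulling child rows by foreign key keeps it internally consistent.
    subscribers = prod.execute(
        "SELECT id, name, street_address FROM subscribers WHERE id % ? = 0",
        (SAMPLE_EVERY,),
    ).fetchall()
    test.executemany(
        "INSERT INTO subscribers (id, name, street_address) VALUES (?, ?, ?)",
        subscribers,
    )
    for (sub_id, _, _) in subscribers:
        lines = prod.execute(
            "SELECT id, subscriber_id, phone_number FROM service_lines WHERE subscriber_id = ?",
            (sub_id,),
        ).fetchall()
        test.executemany(
            "INSERT INTO service_lines (id, subscriber_id, phone_number) VALUES (?, ?, ?)",
            lines,
        )
    test.commit()
```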

Unit Tests and Integration Tests
Some testers I know are making less and less of a distinction between unit tests and integration tests. I distinguish between them by whether or not they interact with system data. In my view, unit tests do not interact with a database or file system, but integration tests do.

Instead of test data, unit tests interact with mock objects or stubs, bits of code that emulate the database or the file system, so that the unit tests can run as quickly as possible. Unit tests that take more than a couple of minutes to run do not have much value.

But mocks and stubs can become out of sync with the live system, so it’s a good practice to also have integration tests that interact with a real database or a real file system, to guard against the risk that system data is not properly emulated by the mocks and stubs the unit tests use.
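The sketch below illustrates that split with a toy lookup tested both ways: the unit test runs against an in-memory stand-in, while the integration test runs against a real (SQLite) database. The names are illustrative, not from the project described here.

```python
# Sketch: the same lookup exercised once through a stub and once through a
# real database, so drift between the two can be caught.
import sqlite3
import unittest

def find_name(store, record_id):
    return store.get_name(record_id)

class FakeStore:                      # stand-in (stub) used by the unit test
    def __init__(self, rows):
        self.rows = rows
    def get_name(self, record_id):
        return self.rows.get(record_id)

class SqliteStore:                    # real data access used by the integration test
    def __init__(self, conn):
        self.conn = conn
    def get_name(self, record_id):
        row = self.conn.execute("SELECT name FROM people WHERE id = ?", (record_id,)).fetchone()
        return row[0] if row else None

class UnitTest(unittest.TestCase):
    def test_lookup_with_fake(self):
        self.assertEqual(find_name(FakeStore({1: "JOHN SMITH"}), 1), "JOHN SMITH")

class IntegrationTest(unittest.TestCase):
    def test_lookup_with_real_database(self):
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
        conn.execute("INSERT INTO people VALUES (1, 'JOHN SMITH')")
        self.assertEqual(find_name(SqliteStore(conn), 1), "JOHN SMITH")

if __name__ == "__main__":
    unittest.main()
```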

On one project, I had to test a new API, but the underlying data for the API was in a very old, highly normalized database that was difficult to read. Our developers eventually came up with a scheme to manage completely artificial test data for their unit tests. I took a different tack for integration tests against the API.

Because the API-level integration tests could run more slowly than the unit tests and still provide good value, I used a full copy of the production database and designed tests to look for more human or social problems:

• Tests for records valid only in the future.
• Tests for queries that confuse multiple record identifiers.
• Tests that generate random queries on the source data and validate that what comes over the API is the same as what is in the source system. (These tests found some bugs with unexpected characters in the source data, such as “<”, “>”, “ ' ” and “&”.)

Generating pseudo-random queries against actual system data for test purposes was an inspired approach. Because the underlying data was so poorly understood, not only by the developers but also by the business itself, and because the API itself was adding features so quickly, we needed a test approach that would not only validate our code as we developed it, but also give us information about the underlying system data at the same time.

I had an ongoing series of tests that would issue SQL queries against the database in a somewhat random fashion, and then check that the same request through the API would return the same information. For instance, requesting information with a first name/last name combination of “J%” and “SM%” with a SQL query would return appropriate data for JOHN SMITH and JANE SMART, and I could check that the API data associated with these people was correct. Some of the system data uncovered by this test approach was quite surprising. While we should have been prepared to handle “O'Malley” (we weren't), no one on the team or at the business would have suspected that the database would contain “Smith & family.”
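A rough sketch of that style of comparison test appears below; the schema and the api_lookup callable are assumptions standing in for the real database and the API under test.

```python
# Sketch: issue pseudo-random name-prefix queries against the database and
# check that the API returns the same records. Schema and API are assumed.
import random
import sqlite3
import string

def random_prefix(max_len=2):
    return "".join(random.choices(string.ascii_uppercase, k=random.randint(1, max_len))) + "%"

def db_names(conn, first_prefix, last_prefix):
    rows = conn.execute(
        "SELECT first_name, last_name FROM people WHERE first_name LIKE ? AND last_name LIKE ?",
        (first_prefix, last_prefix),
    ).fetchall()
    return set(rows)

def compare_once(conn, api_lookup):
    # api_lookup is a hypothetical client that returns (first_name, last_name) pairs.
    first, last = random_prefix(), random_prefix()
    expected = db_names(conn, first, last)
    actual = set(api_lookup(first, last))
    assert actual == expected, f"Mismatch for ({first}, {last}): {actual ^ expected}"

def run_random_queries(conn, api_lookup, iterations=100):
    for _ in range(iterations):
        compare_once(conn, api_lookup)
```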

Unexpected Data
On one hand, unexpected data can come about because of mistaken internal processes. Keeping your application code, test code and test data all in sync can be challenging. Running version 5 tests against version 6 code with version 4 test data often produces errors that take significant time to unravel.

On the other hand, unexpected data can come about because of unforeseen external conditions. One system I tested would often crash when installed on the customer's system because of badly-managed local LDAP records at the customer's site. Since we had never anticipated such external data conditions, those external records would leave these production systems in an unusual state that would require hours of manual repair work.

I worked on another system where we always allowed a direct upgrade path to the latest version of the software from any previous version. Although we did excellent testing for all known upgrade paths, on occasion some unusual condition in a previous version tucked away on a customer's site would cause the upgrade to fail and render the upgraded system unusable. And again, the remedy involved expensive manual repair work.

There are two ways to approach the unexpected data problem. One is to build into your deployment the ability to roll back the system to a previous valid state. If something goes wrong in the deployment, a rollback feature gives the system the ability to automatically reverse the steps it has already taken or to restore the pre-deployment state of the system.

The other approach is to stop the deployment before the system becomes corrupted and requires repair, so the data problem can be fixed quickly.

Unfortunately, both solutions require that the deployment recognize the nature of the bad data, and we can usually only build in such recognition after we've seen enough failures to know what kinds of bad data to expect, and what to do about it when bad data is detected.
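The sketch below combines the two approaches under the assumption that the checks, snapshot and deploy steps can be scripted; all of the commands are placeholders.

```python
# Sketch: validate for the bad-data shapes we have learned to recognize before
# touching the system, and keep a snapshot so the deployment can be rolled
# back if something slips through. Every command below is hypothetical.
import subprocess
import sys

KNOWN_BAD_DATA_CHECKS = [
    ["./check_ldap_records.sh"],            # e.g., reject badly formed local LDAP entries
    ["./check_upgrade_preconditions.sh"],   # e.g., refuse unknown upgrade starting points
]

def deploy_with_rollback() -> int:
    for check in KNOWN_BAD_DATA_CHECKS:
        if subprocess.run(check).returncode != 0:
            print("Pre-deployment data check failed; stopping before the system is touched.")
            return 1
    subprocess.run(["./snapshot_system.sh"], check=True)   # preserve pre-deployment state
    if subprocess.run(["./deploy.sh"]).returncode != 0:
        print("Deployment failed; restoring the pre-deployment snapshot.")
        subprocess.run(["./restore_snapshot.sh"], check=True)
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(deploy_with_rollback())
```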

Constrained Test Environments
Many systems have different configuration environments for test, staging and production. For example, there might be many instances of test environments on a single host, but production will have only one instance for the whole system. Staging environments might have extra diagnostic tools not available in production.

So your build management and deployment system has to know automatically which environment it is operating in, and how to set up the systems appropriately.

I worked with one deployment system that deployed various configurations according to UID value. If the



deployment was running as root, it knew to do a production deployment. Otherwise, it would do a test deployment and open different ports, access different configuration files and create different databases.

Testing some aspects of this particular system was challenging because of the differences between test and production environments. Ultimately we solved that challenge by having fairly sophisticated scripts that would create instances of virtual machines and load a particular version of the system onto those machines, complete with all the test data. This allowed us to test certain critical aspects of the software in a production-like system instead of the differently-configured test environments.
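A minimal sketch of that UID-based switch might look like this in Python; the ports, paths and database names are invented for illustration.

```python
# Sketch of UID-based environment detection: root means a production
# deployment, anything else means a test deployment with its own ports,
# config and database. All specific values are assumptions.
import os

def deployment_profile():
    if os.geteuid() == 0:                     # running as root: production deployment
        return {
            "environment": "production",
            "port": 443,
            "config": "/etc/myapp/production.conf",
            "database": "myapp",
        }
    # Any other UID: a test deployment that cannot collide with production.
    return {
        "environment": "test",
        "port": 8443,
        "config": os.path.expanduser("~/myapp/test.conf"),
        "database": f"myapp_test_{os.geteuid()}",
    }

if __name__ == "__main__":
    print(deployment_profile())
```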

Your Tools Should Have Tools
For a one-button automated build and deploy system, the idea is to have all of the following abilities working well and running as quickly as possible.

This article has shown only a few of the many, many little things that can go wrong in the course of making all these bits work. It is quite likely that your team has some bits working well and other bits not so well. It is important to note that it is far better to have each of these functions working well separately before trying to incorporate several of them into one large end-to-end process.

Automating your build management and deployment can't really be done in a big-bang, all-at-once way. You need a tool to extract your source code correctly; another to build or compile it; more tools to deploy it correctly; another tool to run unit tests; another to run system tests; another to create virtual machines with a particular version of the software.

But once you have such things, a lot of stuff becomes possible that was not possible before. I wrote a controller script not long ago that would marshal all of the tools we had in place in order to: check out the latest system code with one tool; build a test system with several other tools; load up test data for a set of 10,000 individual assertions about the application using a few more tools; check each test case for failure, and upon failure re-run the test (sometimes these particular tests reported false failures because of system timeouts); report each failure to a central system using an API tool; then do the whole thing over again. This script reported system-level test results on the latest build of the code about every 2 hours, 24 hours a day, 7 days a week.
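In the same spirit, a controller loop could be sketched roughly as follows; every command it marshals is a placeholder for a real tool, and the retry-once behavior mirrors the timeout filtering described above.

```python
# Sketch of a controller script: marshal existing tools, retry failed cases
# once to filter out timeout noise, report results, then start over.
# All commands are hypothetical placeholders.
import subprocess
import time

def run(cmd) -> bool:
    return subprocess.run(cmd).returncode == 0

def run_test_cycle(test_cases, report):
    if not (run(["./checkout_latest.sh"]) and run(["./build_test_system.sh"])
            and run(["./load_test_data.sh"])):
        report("setup", "FAIL")
        return
    for case in test_cases:
        passed = run(["./run_test_case.sh", case]) or run(["./run_test_case.sh", case])
        report(case, "PASS" if passed else "FAIL")    # retry once before reporting a failure

def forever(test_cases, report, pause_seconds=60):
    while True:                                       # report on the latest build around the clock
        run_test_cycle(test_cases, report)
        time.sleep(pause_seconds)
```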

But that was only running in a single test environment on a single host. The next step was to distribute all of this work across multiple hosts, whether those hosts were real hardware or virtual machines. There are already new companies springing up that use Amazon's Elastic Compute Cloud (EC2) platform to run massive numbers of tests in parallel, to run real-time performance tests using hundreds or thousands of browsers, and any number of other exciting things that become possible when you have a whole lot of very inexpensive hardware available.

So to take the example of my controller script just a little farther: what I really wanted to do was to marshal the set of tools available to me to take the set of test cases, assign each test case to one of several queues, then to run each queue simultaneously on a number of virtual machines. I wish I'd gotten a chance to build that system.

Maybe you could build it using some of the ideas in this article.


By Ross Collard

This is the concluding installment of a series of articles on using live data in testing, the objective of which is to improve your performance testing through the smarter use of live data. This is the second in the series specifically about test data patterns, and it categorizes and outlines patterns used by performance testers to enhance their data to fit specific testing goals.

Performance testers with intermediate or advanced skills are the intended audience for this article, but no specific technical knowledge is assumed or required. The content should also be useful for functional testers and for non-testers who manage performance test projects.

An Introduction to Patterns
The word “pattern” is defined here as a practice, model or blueprint. While repetition is central to the concept of patterns, so are learning and improving on them. In this article I describe more than 80 test data patterns, most of which are massaged or enhanced for a particular test purpose.

Here I describe them as many individual patterns so that the purpose and mechanics of each is clearer. Managing and using more than a few data patterns is unwieldy, so testers generally consolidate them into a single one or a small collection.

PART I: Interaction Patterns

Rendezvous Pattern
This is a type of spike testing where at least two—and more likely many—events rendezvous. For example, they happen simultaneously, or within a small enough time interval that they are not independent, and they could impact or influence each other, for example, by competing for the same resources.

A rendezvous pattern might be when we use a software tool to simulate a hundred users hitting the enter keys on a hundred keyboards concurrently, at almost exactly the same time. Rendezvous tests are often unrealistic, because a hundred users are unlikely to hit their enter keys simultaneously. If we allow a small spread of events across time, instead of requiring an instantaneous happening, then the rendezvous test becomes much more realistic. For example, we might create a scenario to simulate what would happen if a hundred users all hit their enter keys across the span of two seconds.
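As a rough sketch, spreading the simulated users across a two-second window might look like this, assuming the action under test can be wrapped in a Python callable.

```python
# Minimal sketch of a rendezvous-style test: a hundred simulated users fire
# within a two-second window rather than at exactly the same instant.
import random
import threading
import time

def rendezvous(action, users=100, window_seconds=2.0):
    barrier = threading.Barrier(users)

    def simulated_user():
        barrier.wait()                                  # line everyone up at the rendezvous point
        time.sleep(random.uniform(0, window_seconds))   # then fire somewhere inside the window
        action()

    threads = [threading.Thread(target=simulated_user) for _ in range(users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

# Example (hypothetical action): rendezvous(press_enter, users=100, window_seconds=2.0)
```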

Interference Pattern
This type of testing attempts to stress a system by having features, processes or threads interfere with each other.

Suppose that a system contains two features, A and B. In the feature testing, we run a set of test cases to exercise feature A, and another set for feature B. These features can interact and possibly interfere with each other, for example, by both being able to simultaneously access and update the same records in a common database.

In this situation, the feature testing usually includes feature interaction testing, but only to a limited degree and for the simplest cases. For example, in a manual test two different testers, working from two workstations, attempt to update the same



Ross Collard is founder of Collard & Company, a Manhattan-based consulting firm that specializes in software quality.

Patterns Exist Everywhere In Nature. Seeing How They Affect Our Data Can Be The Key To Unlocking Untold Efficiencies In Your Testing.


database record at the same time. Usually the feature test team cannot easily try more complex combinations of many concurrent activities. By contrast, a load or stress test by its nature usually incorporates multiple concurrent demands on the system we are testing. In the feature interaction variation of a load test, we deliberately engineer the test workload to include complicated and interacting mixes of demands.

Interoperability, Interface Pattern
Errors can occur because of mismatches at interfaces. The U.S. lost a spacecraft at a cost approaching $200 million because of miscommunication. One software subsystem on the spacecraft assumed that numbers passed between subsystems were in meters, while another assumed feet.

Interoperability testing addresses the joint behavior of multiple different components and systems which interact, usually in complex ways. These may use different technologies and were probably built at different times by different people. In addition, the details of the internal workings of the individual systems may not be available to the testers. Both interoperability and interface testing also focus on conformance with standard protocols.

Deadlock Pattern
This type of testing attempts to stress a system by locking a database, either directly or through transactions which interfere with each other. The testers identify situations where deadlock might occur and design the test workload to try to trigger a deadlock.

Deadlocking is a situation in which a set of interdependent tasks is blocked, with each one waiting for an action by another of the interdependent tasks.

For example, let’s say program 1 requests resource A and receives it. Program 2 requests resource B and receives it. Program 1 requests resource B and is queued up, pending the release of B. Program 2 requests resource A and is queued up, pending the release of A. Now neither program can proceed until the other releases a resource. Deadlock often arises from adding synchronization mechanisms to avoid race conditions.

Most databases are built to support several concurrent users. This means that there is a risk of one user updating a piece of data while another user is trying to read or update the same data. To avoid this problem, stored procedures can be built to include locks. A lock temporarily denies access to other users, for a short duration, while one user is reading or updating the data. Careless use of locks also can lead to deadlock.
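The classic two-resource deadlock can be reproduced at the code level with a sketch like the one below, which mirrors the two-program example above and uses a join timeout to report the hang instead of letting the test run block forever.

```python
# Sketch: two workers acquire the same pair of locks in opposite order, which
# deadlocks; a join timeout flags the hang so the test harness can report it.
import threading
import time

lock_a, lock_b = threading.Lock(), threading.Lock()

def worker(first, second):
    with first:
        time.sleep(0.1)           # give the other worker time to grab its first lock
        with second:
            pass

def test_for_deadlock(timeout_seconds=2.0) -> bool:
    t1 = threading.Thread(target=worker, args=(lock_a, lock_b), daemon=True)
    t2 = threading.Thread(target=worker, args=(lock_b, lock_a), daemon=True)
    t1.start(); t2.start()
    t1.join(timeout_seconds); t2.join(timeout_seconds)
    return t1.is_alive() or t2.is_alive()   # True means the opposite lock ordering deadlocked

if __name__ == "__main__":
    print("Deadlock detected:", test_for_deadlock())
```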

Synchronization Pattern
This type of testing attempts to stress a system by causing timing problems and out-of-synch processes. These are also called race conditions.

Systems often have inadvertent and unrecognized assumptions built into them about the expected sequence of events or the expected timing of events. Let’s say that the system assumes that event A always precedes event B. What happens if this assumption is not met, if an event happens later or earlier than anticipated? Does the system time out (when it is not supposed to) because of the late event? Does the early event go unnoticed? The aim of synchronization test cases is to answer these questions.


PART I: Interaction Patterns
This is test data in which two or more events are likely to intersect, such as when a number of users log on to a system at the same time. Such events are important to understand and to plan and test for because they are part of everyday operation and application usage.

Page 16

PART II: Human Error Patterns
Humans are fragile creatures; they are imperfect and often wrong. Therefore the tester must take human errors into account when developing and executing test cases. After all, everyone's entitled to a bad day. Except the tester.

Page 18

PART III: Patterns That Trigger Measurable Behavior
System response time, total time to complete a transaction, length of time from login to first activity. These events are all measurable, but first they must be defined.

Page 19


PART II: Human Error Patterns

"Bad Day" Pattern
A user scenario test case is one that employs a real-world set of activities based on how the users actually use the system. Feature testers develop conventional test cases using standard test case design techniques such as boundary value analysis, and derive the test cases for one feature at a time.

User scenario test cases generally stress the system better than simple one-feature test cases, but tend to under-represent complex interactions among transactions. The user scenarios are often more messy, unstructured and demanding than feature test cases developed according to conventional test case design techniques.

Bad day testing is based on the premise that we all can hit the wrong button or have a bad day. If someone does push the wrong button, we want to ensure that we don’t suffer any horrendous consequences. Bad day testing is similar to usability testing and operator error testing.

Soap Opera Pattern
A soap opera test is a type of user scenario test that exaggerates the day-by-day actions of users. (Just think of how many crises and storms-in-teacups are packed into a half-hour “soap.”)

There are two objectives in a soap opera test:

• Increase the rate at which bugs are found by focusing on larger-than-life situations.
• Stress the system with tougher-than-average user scenarios.

The intention is to exaggerate deliberately, but not to go to unrealistic extremes. For example, it might be appropriate to bang away frantically on the keyboard for a few minutes but not to draw a gun and blast the screen, unless your users are in unusually high stress situations.

Disaster Recovery Pattern
This type of testing uses the disaster scenarios which were identified in the organization’s disaster recovery plans as a source of test cases.

I will illustrate this point with an example of a system failure. In what was considered a major crisis, the Nasdaq stock market halted trading on a busy Friday in 2001. An employee of WorldCom, which provides communications services to the stock exchange, had inadvertently forced Nasdaq's communications network to shut down. (WorldCom later said that testing a new system being developed for Nasdaq had caused the service interruption. With hindsight, a busy Friday was not a very smart time to run the system test.)

Nasdaq was able to restore service fairly quickly, but a secondary problem blocked its stockbroker clients from using the system for several more hours. The outage had disconnected all of Nasdaq's users from their network. When these users attempted to log back in after the network administrators had resolved the problem–and all at approximately the same time–the system’s log-in process was unable to cope with the huge surge of demand.

Environmental Pattern
The term "environmental testing" originally came from hardware engineering, where it is defined as testing for physical factors such as the loss of power, vibration, G (gravity) forces, air pollutants in factories, electric shock and other hazards, electromagnetic radiation, extremes of temperature, humidity and so on.

In software, an environmental test is one that concentrates on the impact of physical hazards or physical failures on the operation of the software. For example, at altitudes above 50,000 feet, cosmic radiation can arbitrarily change the values of data (the radiation “writes” to the magnetic media which stores the data). The U.S. Air Force tests high-altitude software in labs with high radiation exposure.

Live Change Pattern
Many systems must keep running no matter what. One example is that of an aircraft flying over the ocean. What happens when an emergency fix or routine maintenance must be done on such systems?

This type of testing assesses the ability to make live modifications to the system without interrupting service.

Always-on 24x7 and 24x365 (24x366 in leap years) systems need to be maintained literally on the fly. If there is no place to land, by way of analogy, we cannot land an airplane, make the change on the ground with the system inactive, and then return to flight.

Examples of live maintenance include adding new devices to a network, changing the way systems are partitioned and resource capacity is dedicated to applications, and backing up a database.

Since the live modifications create stresses which otherwise we may not encounter, and since preserving business continuity is critical in always-on operations, we need to try these adjustments as part of the test project.

System Change Impact Assessment Pattern
Assesses the impact of a change or a



group of changes to an existing system.

Infrastructure Impact Assessment Pattern
Assesses the impact of a change on an existing infrastructure which is supporting a mix of workload demands. This testing focuses not so much on the immediate application being changed, or the new application being introduced into the environment, or a change in the demand patterns within one application, but on its side effects on the other uses of the infrastructure. The issues are the capacity and the utilization of resources in that infrastructure. This is also called an environmental assessment.

Error Detection & Recovery Pattern
Many software developers are blessed with eternal optimism. But have you ever seen a software product that you couldn't make fail if you wanted to?

Error handling requires prevention and detection controls, dependable back-up systems and dependable recovery systems. Error recovery testing is intended to ensure that the system's controls and manual and automated back-up and recovery mechanisms work as expected.

The ANSI/IEEE standards define recovery as the return of a system to a reliable operating state after failure. Systems have written and unwritten recovery objectives, stating how we expect them to recover from software errors, power failures, hardware and network outages or degradations and data errors.

The heart of this method is to reverse-engineer test cases from the set of error messages we expect the system to generate. We trace each output error message back to its various causes (as there may be more than one cause for a particular error message). Then we create test cases to generate each error message, with one test case for each significant cause.

Degraded Mode Of Operation Pattern
Systems are designed to use a given set of resources, such as hardware, networks and databases. Their users expect many systems to provide ongoing service, even at reduced rates of performance and capacity, when not all the resources are working (e.g., a database is unavailable). The purpose of degraded mode testing is to determine whether the system can still provide the reduced level of service as expected.

An example of a degraded mode test is to deliberately power down an application server in a server cluster with redundant application servers and attempt to continue normal operation.

Fault Injection Pattern
Software fault injection is a specialized type of design for testability, to provide the testers with the capability to easily and safely trigger or simulate system errors which otherwise might be difficult to observe in the test lab but which nevertheless may happen in the real world.

Just because these circumstances are unlikely to occur in live operation does not mean they should not be tested. If the consequences of these weird, once-in-a-blue-moon circumstances could be catastrophic, they deserve attention from the testers.

Despite the similarity of names, software fault injection is different from software fault insertion, which is a way of assessing test effectiveness by deliberately inserting errors into systems in an experimental mode.
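One common way to build this kind of trigger into a system is a fault hook that the code consults before a risky operation, switched on only in the test environment. The following is a minimal Python sketch of the idea, assuming an environment-variable switch; the fault name and the file-reading function are invented for illustration.

import os

def fault_enabled(name: str) -> bool:
    """Return True if the named fault has been switched on for this test run."""
    return name in os.environ.get("INJECT_FAULTS", "").split(",")

def read_config(path: str) -> str:
    # A hook like this lets testers trigger a rare failure path on demand.
    if fault_enabled("config_read_error"):
        raise IOError(f"injected fault: cannot read {path}")
    with open(path) as handle:
        return handle.read()

if __name__ == "__main__":
    os.environ["INJECT_FAULTS"] = "config_read_error"
    try:
        read_config("app.cfg")
    except IOError as error:
        print("recovered from:", error)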

PART III: Patterns That Trigger Measurable Behavior

Response Time Pattern
This testing measures how long the system takes to complete a task or group of tasks. It usually represents the user viewpoint, i.e., we measure the likely delay as perceived by an external user. We also can measure the efficiency of an internal software activity or hardware component that is not directly accessible by the user. Response time is the total end-to-end elapsed time, which includes wait time in a queue prior to processing and service time (the actual time to process the request for service). Wait time and processing time both can vary, and may be affected by different factors.
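In practice we usually take many samples and look at the distribution, not just an average. A minimal Python sketch, in which the timed operation is only a stand-in for a real end-to-end request:

import statistics
import time

def sample_request():
    time.sleep(0.01)   # placeholder for the real end-to-end operation

def measure_response_times(request, samples=50):
    """Collect end-to-end elapsed times (wait plus service) for repeated requests."""
    elapsed = []
    for _ in range(samples):
        start = time.perf_counter()
        request()
        elapsed.append(time.perf_counter() - start)
    return sorted(elapsed)

if __name__ == "__main__":
    times = measure_response_times(sample_request)
    print(f"median: {statistics.median(times):.4f}s")
    print(f"95th percentile: {times[int(0.95 * len(times)) - 1]:.4f}s")
    print(f"worst: {times[-1]:.4f}s")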

Throughput Pattern
Throughput testing measures how much traffic passes through a system within a specified period of time and under a specified load. The test load may be light, average, heavy or vary over time. We can measure throughput in megabits per second, events (database queries, requests or transactions) per second, or another metric.

The selection of the units of measure for throughput can influence the test results. For example, let’s say that the test objective is to rank a group of competing servers from best to worst in terms of throughput. Their rankings may be different if we measure the throughput in megabits per second rather than in events per second.

The recorded throughput also depends on where in a system we count the bits or the events – the volumes of events usually are not the same at each internal point. We can count the throughput in a network as the amount of traffic which originates from, is received at, or passes an internal point within a given period of time. In a simple situation where one specific input triggers each output, the count of the output traffic received at the destination is exactly the same as the count of the input traffic. But this ratio can be less than 1 to 1, within a given period of time, if there are bottlenecks and inefficiencies within the system, or can be higher than 1 to 1 if a single stimulus triggers the broadcast of multiple messages.

There also may be questions about what to count and how to count it. For example, let’s say that a Web server has an ongoing, low-level flow of administrative management messages and error messages, as well as the “real” traffic, namely the requests from visitors to the Web site which this server supports. The measurers will need to decide which of these traffic categories to include in their throughput counts.
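The unit-of-measure effect is easy to demonstrate with a small calculation. In the Python sketch below the traffic figures are invented; the point is only that the same two servers can rank differently depending on the unit chosen.

# Sketch: the same measurement window expressed in two throughput units.
servers = {
    "server_a": {"events": 120_000, "bytes": 40_000_000, "seconds": 60},
    "server_b": {"events": 90_000,  "bytes": 75_000_000, "seconds": 60},
}

for name, t in servers.items():
    events_per_sec = t["events"] / t["seconds"]
    megabits_per_sec = (t["bytes"] * 8) / 1_000_000 / t["seconds"]
    print(f"{name}: {events_per_sec:.0f} events/s, {megabits_per_sec:.2f} Mbit/s")

# server_a ranks first in events per second, server_b in megabits per second,
# which is why the unit of measure should be agreed before ranking anything.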

Availability Pattern
Availability is the percentage of uptime for a system or component, so testing availability is essentially a process of recording when the system is up or down, under both typical and stress working conditions.

Availability measures can either include or exclude planned downtime, leading to apples-and-oranges comparisons. Another complication in measuring availability is that many systems can operate in a degraded mode if the need arises, e.g., if part of a network is down the other parts will still function. During this degraded mode, some users may experience limited availability.
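The include-or-exclude decision changes the reported number, so it should be stated alongside the figure. A small Python illustration with invented outage figures:

# Sketch: availability over a 30-day window, computed both ways.
total_hours = 30 * 24
planned_downtime_hours = 4.0     # maintenance window
unplanned_downtime_hours = 1.5   # incidents

uptime_hours = total_hours - planned_downtime_hours - unplanned_downtime_hours

availability_incl_planned = uptime_hours / total_hours
availability_excl_planned = uptime_hours / (total_hours - planned_downtime_hours)

print(f"counting planned downtime:  {availability_incl_planned:.3%}")
print(f"excluding planned downtime: {availability_excl_planned:.3%}")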

Resource Utilization Pattern
Monitoring the levels of utilization of system resources provides insights into how the system works (which may not be the same as how its designers think it works). It helps to identify bottlenecks, assess spare capacity and the potential for scalability, and determine how to improve the efficiency of the system.

The resources and events which are monitored can include processor activity, use of cache memory and hard disk accesses, I/O traffic, page swaps, lengths of queues, overflows, number of ports which are busy, network bandwidth utilization, and number of concurrent software threads or processes which are running.

Monitoring the resource utilization means we need access to the system logs which are recorded by the operating system, network management system, and database management system. Plus – and this is an important plus – we need to know how to read these logs. Often the numbers of entries in these logs are so voluminous that it’s a good idea to use software tools to edit, extract and summarize the meaningful information.

Although they are voluminous, these logs generally do not provide everything we need. In addition, we may need home-built or third-party plug-in tools to place probes into the system under test and gather the data, hopefully without materially changing the system’s performance and robustness characteristics.
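For a quick, low-overhead probe on a single test host, a small sampling script is often enough. The Python sketch below assumes the third-party psutil library is available in the test environment; it simply samples a few counters at a fixed interval so they can be correlated with the test load afterwards.

import psutil  # third-party; assumed available in the test environment

def sample_utilization(duration_seconds=10, interval=1.0):
    """Sample a few resource-utilization counters at a fixed interval."""
    samples = []
    for _ in range(int(duration_seconds / interval)):
        samples.append({
            "cpu_percent": psutil.cpu_percent(interval=interval),
            "memory_percent": psutil.virtual_memory().percent,
            "disk_read_bytes": psutil.disk_io_counters().read_bytes,
            "net_sent_bytes": psutil.net_io_counters().bytes_sent,
        })
    return samples

if __name__ == "__main__":
    for sample in sample_utilization(duration_seconds=5):
        print(sample)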

Testability Pattern
As systems become more complex, it becomes more difficult and eventually impossible to test them adequately unless they have been specifically designed to be testable. Much of a system’s behavior may be hidden and not directly observable from the outside, which severely limits the effectiveness of non-invasive black-box testing. For example, an internal buffer overflow may be extremely difficult to observe in testing or in live operation, unless a capability has deliberately been designed into the system to provide this information.

To be testable, a system has to be (a) observable and (b) controllable. A system is relatively easy to observe if the outputs from that system are dependent only on the inputs, regardless of the internal state of the system or the state of its supporting infrastructure. But it’s not easy to test without monitoring the internal behavior of the system, if the outputs are dependent not just on the inputs but also on hidden, transient internal states of the system.
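A deliberately designed observation point can be as simple as a probe method that exposes otherwise hidden state to the test harness. A minimal Python sketch; the buffer and its counters are invented for illustration:

# Sketch: making hidden internal state observable to tests.
class MessageBuffer:
    def __init__(self, capacity=100):
        self._capacity = capacity
        self._items = []
        self._overflow_count = 0   # invisible from the normal interface

    def add(self, item):
        if len(self._items) >= self._capacity:
            self._overflow_count += 1   # silently dropped
            return
        self._items.append(item)

    def probe(self):
        """Test-only observation point exposing otherwise hidden state."""
        return {"depth": len(self._items), "overflows": self._overflow_count}

if __name__ == "__main__":
    buffer = MessageBuffer(capacity=2)
    for item in range(5):
        buffer.add(item)
    assert buffer.probe()["overflows"] == 3   # observable only via the probe
    print(buffer.probe())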

Designing systems for testability is often not done very well. In small, simple systems, the system architecture is fairly obvious to the test professionals, and there is a ready availability of access points to observe the internal states of the system. It is in large, complex systems that designing for testability becomes more important and also, unfortunately, much more difficult.

Usually the main problem is one of communications. Designing for testability requires a solid gray-box understanding of the system. With many large systems, the test & QA professionals do not understand the intricacies of the system architecture. This happens because the systems architects have not adequately tutored the test & QA people, so that they do not know how to exploit the gray-box perspective in testing.

The system architects may know the system but are not focused on its testability because they do not have the perspective of the test professionals. Testability is often a side issue or an afterthought, if it is considered by the architects at all. The test & QA professionals need to train and show the system architects what features need to be built into the system to make it testable.

A worse situation occurs when nobody understands the system architecture. Sometimes the system we are testing is an acquired product, where the testers do not have access to the designers, or the system is integrated from many disparate sources with no overall architect. Or the system is not new – the original architects may no longer be available, the system may have been patched heavily and the architecture “muddied” over the years.

Capacity Forecasting Pattern
Capacity is the ability of a system to grow or to support an additional work load without degrading performance to an unacceptable degree.

This type of testing aims to measure whether the allocated resources are sufficient for the job, how much spare capacity still remains in the system for further growth of demand, and at what point in the growth the resources supporting the system will need to be upgraded. This is the point at which the response time or throughput become unacceptable as the demand grows.

Performance is mostly non-linear; as the load increases, the response time may not increase at all because the system has ample spare capacity, or may instead rapidly approach infinity when the system nears the point of saturation. So we usually need to test with normal and peak loads, and with overloads.

Non-linearity complicates the forecasting: a 10-gallon bucket, when partly filled with, let’s say, seven gallons of liquid, still has the spare capacity to hold another three gallons. Since system performance can degrade as utilization increases, effectively a half-full bucket may be considered to have no spare capacity.

For example, let’s say a system provides acceptable response times under a given work load, but the CPU and the computer’s semiconductor memory happen to be fully utilized at this load. In this situation, there is no remaining capacity for any minor increase in load.

We typically perform capacity testing, and testing for related interests such as scalability, by steadily increasing the work load on the system and measuring the performance at each level of load, until we reach the point where the performance becomes unacceptable.
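The step-load loop itself is simple to express. The Python sketch below uses a stand-in transaction whose response time grows non-linearly with load, and a response-time limit that would in practice come from the service level agreement:

import time

RESPONSE_TIME_LIMIT = 0.05   # seconds; would come from the SLA

def transaction(load):
    # Stand-in whose response time grows non-linearly with the load level.
    time.sleep(0.001 * (load / 40) ** 3)

def find_saturation_point(max_load=200, step=20):
    """Increase the load in steps and report where response time becomes unacceptable."""
    for load in range(step, max_load + 1, step):
        start = time.perf_counter()
        transaction(load)
        elapsed = time.perf_counter() - start
        print(f"load {load:4d}: {elapsed:.4f}s")
        if elapsed > RESPONSE_TIME_LIMIT:
            return load
    return None

if __name__ == "__main__":
    print("unacceptable performance first seen at load:", find_saturation_point())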

We typically measure the capacity of processors in MHz or GHz, which we also refer to as the processor speed, or as the number of calculations performed per second. The practical or real processor capacity is usually somewhat less than the rated or theoretical capacity because of the overhead of the operating system. In systems with an appropriate level of resources and which are well-tuned, the system designers typically target the processor utilization to be within the range of 40 and 65 percent of the rated capacity.

We usually measure the capacity of databases in MB or GB. The practical or real database capacity is usually somewhat less than the rated or theoretical capacity, because of overheads and design limitations such as index files, links between records, and overflow buffers. Often databases, especially if they have not been de-fragmented, begin to provide unacceptable performance when they are only about two-thirds full according to their rated capacities.

The capacity of a network, which is also called bandwidth, we can measure in bits per second, packets per second, Erlangs or other units. (The unit is named after A. K. Erlang, who is considered to be the father of modern queuing theory.) The Erlang is a calculated, dimensionless measure of traffic intensity.

The practical or real capacity is usually somewhat less than the rated or theoretical capacity, because of overheads and design limitations. For example, in local area networks which use CSMA/CD technology (carrier sense multiple access / collision detect), the practical capacity is usually only one-third to one-half of the rated or official capacity.

Measurement of Delays Pattern
To be able to measure response times for particular events, we need to assume a straightforward cause-and-effect relationship: this stimulus triggers that outcome. In this situation, we can easily identify the stimulus for each outcome, and we can measure the delay from this particular stimulus to its particular outcome.

If we cannot easily link the system outcomes to the stimuli, though, we need a more elaborate measurement (and model) of system behavior than the end-to-end response time. Consider a situation, for example, where the system logs and stores a stream of events in a file, but takes no action until the accumulated number of events reaches a certain threshold.

We could reach this threshold within seconds or not for several weeks, depending on the work conditions. In this situation, we may be interested in three elapsed-time numbers: (1) from the first event to the observable outcome, (2) from the very last event to the outcome, and (3) the average response time (from the median event to the outcome).
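Given the event timestamps and the time at which the outcome became observable, the three figures fall out directly. A small Python sketch with invented timestamps:

import statistics

# Sketch: a batch of logged events triggers one observable outcome once a
# threshold is reached. The timestamps below are invented for illustration.
event_times = [0.0, 1.2, 2.5, 7.8, 9.1, 15.4, 20.0]   # seconds, in arrival order
outcome_time = 21.3                                   # when the action became visible

print(f"first event  -> outcome: {outcome_time - event_times[0]:.1f}s")
print(f"last event   -> outcome: {outcome_time - event_times[-1]:.1f}s")
print(f"median event -> outcome: {outcome_time - statistics.median(event_times):.1f}s")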

Loss Measurement Pattern
In networks especially, losses are a way of life. In analog networks, signals can attenuate (weaken) and their wave shapes become corrupted. In a congested switch, blocking may cause a loss – all ports or connections into the switch are already busy, and the system simply drops an incoming message when the input hopper (buffer) is already full. In packet-switched digital networks, a data packet can be lost in transit. In the Internet, for example, a data packet is deliberately killed when the number of hops which the packet takes from node to node (i.e., from router to router) exceeds a threshold (usually 16 hops), in order to prevent packets endlessly circulating within the Internet and royally gumming up the works. Elaborate facilities have been designed into the Internet to take care of these packet losses.

Despite the fact that losses are considered routine, a high rate of loss is unattractive (usually anything more than a few percent). Every lost packet in the Internet generates at least two more messages: a request back to the point of origin to re-send the lost packet, and the re-sending of that packet.

The rate of loss tends to have a major impact on performance. A NASA study found that a 3 percent loss of data packets in the Internet leads to a 50 percent degradation in throughput.

Error Rate Measurement Pattern
Let’s say that the response to a database query happens in 0.1 second, but this response says: “Database not available”. Or that a Web service can handle 10,000 users simultaneously, but 500 of these users receive error messages. Fast response time and high throughput are irrelevant if the user can’t do his job.

Since this type of testing counts the incidence of errors or failures, we need a catalog of errors. Some lists contain items relevant to system and network administrators, such as “race conditions: timing out of sync,” “memory leaks,” “page locking,” and “processor saturation,” but which are meaningless to the end users.


We also need a user-centric list of errors. We need to distinguish between symptoms of failure and causes of failure. Here we will be focusing almost exclusively on the symptoms, not the underlying causes, and only those symptoms which are meaningful to the end users. Most of the items on the user error list are not catastrophic errors (“dark screen,” “dead keyboard”), but annoyances to which the system administrators may be oblivious.

The next question for the test designer is how to observe and capture these user errors. One way is for knowledgeable users to manually test and evaluate the results. Another is to use automated feature-level testing tools. Both have strengths and limitations, so it’s generally a good idea to employ a mix of manual and automated error detection.
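When the detection is automated, the essential measurement is simply the proportion of user-visible failures under a given load. A minimal Python sketch; the request function is a stand-in that fails randomly instead of calling a real service.

import concurrent.futures
import random

def make_request(_):
    # Stand-in for a real request; fails at a simulated 5 percent rate.
    if random.random() < 0.05:
        return "Database not available"
    return "OK"

def measure_error_rate(request, total=1000, workers=20):
    """Issue a batch of concurrent requests and count user-visible failures."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(request, range(total)))
    errors = sum(1 for result in results if result != "OK")
    return errors / total

if __name__ == "__main__":
    print(f"user-visible error rate: {measure_error_rate(make_request):.1%}")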

Component-Specific Test Pattern
This type of testing examines the robustness of one system component (or sub-assembly).

It can be done as soon as the component is ready, before other components are built and well before the fully integrated system is ready for testing. By examining the behavior of one component in isolation, this testing makes it easier to isolate and pinpoint problems. And component bugs which are found earlier can also be eliminated earlier, improving the initial quality of the fully integrated system when it is delivered for testing.

Component-specific testing may require component test drivers, which can be expensive to build.

Calibration or Settings Measurement Pattern
This type of testing is interleaved with tuning, and its purpose is to provide feedback on the consequences of each iteration of tuning.

The test work load is kept the same, and typically the testers strive for exact repeatability of the test run from iteration to iteration of tuning.

Scalability Pattern
This type of testing investigates a system’s ability to grow. Growth can occur in several ways, which we may need to test separately: increase in the total load; increase in the number of concurrent users; increase in the size of a database; increase in the number of devices connected to a network, and so on.

We can test systems for their ability to scale down as well as up. For example, we may be interested in this question: can the software run adequately on a cheaper, slower processor or with less memory?

Compatibility And Configuration Pattern
This method considers the various configurations in which a system can be used, and how to check for compatibility or consistent behavior across these configurations.

Risk Prioritization Pattern
This method uses a risk assessment to identify and prioritize the likely risks which the system faces in live operation. We use this risk assessment to allocate test resources to the various aspects of the system, i.e., to focus the test effort on the areas which need the depth and intensity.
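The arithmetic behind the prioritization is usually just exposure equals likelihood times consequence, ranked from highest to lowest. A small Python sketch with invented risk entries:

# Sketch: rank candidate test areas by exposure (likelihood x consequence).
risks = [
    {"area": "payment posting",   "likelihood": 0.2, "consequence": 9},
    {"area": "report formatting", "likelihood": 0.6, "consequence": 2},
    {"area": "nightly batch",     "likelihood": 0.4, "consequence": 7},
]

for risk in risks:
    risk["exposure"] = risk["likelihood"] * risk["consequence"]

# The areas with the greatest exposure get the deepest and most intense testing.
for risk in sorted(risks, key=lambda r: r["exposure"], reverse=True):
    print(f"{risk['area']:18s} exposure = {risk['exposure']:.1f}")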

Failure Modes and Effects Analysis (FMEA) Pattern
One of the main objectives of a stress or robustness test is to see if we can make the system fail within the relatively safe and controlled confines of the test lab, in order to observe the conditions under which the system fails, how it fails (what happens), and whether it recovers in an acceptable manner.

Systems can fail in many ways, ranging from relatively minor ways such as dropping transactions or failing to give appropriate and timely warnings, to catastrophic interruptions of service. We care about how a system fails for two reasons: (1) its behavior in failure may vary based on how it failed, and (2) it is useful to know how the system might fail, in order to know how to cause it to fail in testing. A list of the ways in which a system can fail is a useful source of test cases.

The number of ways in which a system can fail also becomes higher if we broaden the interpretation of the word “failure” to mean not only catastrophic termination of services but also unacceptable service, such as failure to meet a service level agreement (SLA). In developing the robustness test strategy, we have to take a broad view of the system’s possible modes of failure, and not simply test the obvious.

For example, consider a non-essential server which occasionally crashes, but never with any data loss or corruption, and we can always re-start the system with minimal fuss and delay. We’d view the crash more as an inconvenience than as a catastrophe.

By contrast, many users and system administrators have not seen the presence of incorrect data values in a database as a failure. Creeping database corruption without a crash has more impact on system quality than losing the non-essential server, though, in part because the consequences of data corruption are less immediately obvious.

In any type of risk-based testing, the testers also can use a list of potential problems (i.e., modes of failure) as an important method to focus the testing efforts. If we can’t test everything, we should concentrate on the most significant opportunities: those risks which have the greatest exposure, or combination of likelihood of occurring multiplied by the consequences if the failure does happen.


How to Pick A Winning Test Team

Automation Demands Special Skills; Field The Best Players Possible

By Elfriede Dustin and Bernie Gauf

Bernie Gauf is president of Innovative Defense Technologies (IDT), a Virginia-based software testing consulting company specializing in automated testing. Elfriede Dustin works there as an automation expert.

People are at the core of any successful automated software testing program implementation, and hiring competent and qualified people with the matching skill set is critical to the success of AST.

Despite the vendor hype, automation requires more than simply running a capture/playback tool, and not just anyone can effectively use one. It’s also a bad idea to set your hiring goals around the lowest labor costs, or to fill most positions with interns or recent college graduates. For automated testing programs to be implemented successfully, effective hiring practices need to be in place to attract, select and retain the best-qualified people.

The Interviewer
Start by looking at the interviewer’s skills. Are nontechnical managers interviewing for technical positions? Nontechnical managers can evaluate a candidate’s ability to work well with others, their attention to detail and other administrative and organizational skills, but can hardly be expected to recognize a candidate’s ability to perform technical work. While nontechnical managers should be part of the interviewing process, a highly technical resource from the project should participate in interviews to evaluate the candidates’ knowledge of design, software development and other relevant technical areas.

Even if your automation process involves a vendor-provided testing tool (i.e. capture/playback), such tools require a degree of software development skill—some more than others. We recommend that proven software development techniques be used to produce efficient, maintainable, and reusable automated test scripts and software. Those software techniques include code modularity; loose coupling; iterative and incremental development (repeatable build and test); and development of libraries, to allow for reuse in various modules.

Granted, there are times when a software tester wants simply to develop automated testing scripts, for example to set up a test environment or to populate a database with test data. The quickest way is to do just that. But for any larger AST effort, where automated testing is implemented throughout the testing program, other lifecycle considerations are required, such as project schedule, requirements, design and implementation of the AST software, AST configuration management of the product baseline, and quality assurance considerations throughout the AST implementation. In fact, one of the primary reasons many significant automation efforts get in trouble is that they do not treat AST as a software development project.

The Requirements
A schedule for the AST project should be developed up front and tracked throughout the project. The test requirements to be automated need to be developed and verified as with any other software development project. AST software needs to be designed and developed, and the team is expected to use good techniques and practices as well.

The primary differences between the skill sets needed for AST and those needed for development of the software product are that the AST team also needs an in-depth knowledge of automated test tools, knowledge of the available automated software testing frameworks and approaches to implementing them, and an understanding of the applicability of test techniques and how to implement an AST program.

The roles and responsibilities and associated tasks are organized into five basic areas, which our experience shows are key to implementing a successful AST program. Depending on your environment, areas might not always be separated; for example, the CM and/or QA role may be the responsibility of all involved.

• Program management
• Systems engineering
• Software development
• Configuration management
• Quality assurance

One person may have several skills, or one task may be covered by several people; mapping skills to people is not a one-to-one exercise. There are also varying degrees of experience. What is important is that the AST implementation team collectively has all the skills covered. But experience also shows that the least experienced person should never be put on the most complex task. This might appear to be an overly simplistic rule, but it might surprise you how often we’ve seen it ignored.

‘Other’ Skills
While specific technical skills are required to implement an automated testing program successfully, people skills and other intangibles are also required to enhance the chances for success. We call them “other” skills because few tests or interview techniques can predict whether a person who demonstrates those skills is genuine, whereas “technical” skills can generally be determined using tests.

Automated testing team members need to be adaptable. Experience has shown that automated testing needs to be performed in a variety of different technical environments—on different platforms, using different technologies, and with expertise in a variety of subject matter areas. Therefore the team members should be able to learn quickly, and are ideally already familiar with a broad array of new and mature technologies, processes, tools, and methodologies. It is advantageous if the automated test team possesses the following “other” skills:

• Be quick learners who enjoy performing a variety of tasks, learning new things, and touching many different products
• Have an aptitude for conceptualizing complex activities
• Understand complex test requirements, be able to formulate test planning and design approaches to support requirements, and be able to perform multiple tasks concurrently
• Be able to develop work-around solutions to problems encountered during test development and execution
• Be creative, with the ability to think of multiple ways to exercise a system or application to ensure that it works in all circumstances, and be able to identify all conditions under which the software or system could fail

• Be diplomatic and discreet, especially when having to report major defects, and be able to work closely and effectively with a variety of stakeholders

• Possess strong verbal, written, and grammatical skills, with the ability to communicate any issues and articulate ideas to the customer/stakeholders
• Be detail-oriented and able to identify difficult-to-find glitches, and have a strong interest in improving the quality of a software product, including being self-motivated and trying to find all defects possible
• Be process-oriented and skilled in the ability to understand inputs, logical sequences of steps, and expected outputs

Sometimes a project team is inherited. In this case, evaluate the current skill sets of the team and then map the skills back to the AST projects in the pipeline to ensure the skills needed are available—before the project starts. Then create a gap analysis of skills needed by the group. You can then create a training plan to acquire these skills or hire for them. It is important to note that a successful automation team consists of a mix of skills and levels of experience and expertise.

Preventing Road Blocks
We hear it so often: a good manager doesn’t solve problems, he prevents them. The main role of the AST program manager (PM) is to prevent any roadblocks, and if they do arise, to remove them in a quick fashion. As with the traditional program management role, he needs to lead the overall AST project, acting as the point of contact to the customer (whether internal or external), establishing schedule and cost budgets, tracking the actual schedule and cost against budgets, and resolving issues or assessing and mitigating program risks. His responsibility includes gathering input from all roles described here to the various schedules and processes being tracked to allow for better decision making. Key skills include good leadership and communication skills, planning, the ability to identify and track milestones, foresee and prevent issues, as well as a well-rounded technical understanding of the project.

The AST PM needs to understand effective AST processes, including defect tracking processes. He verifies they are implemented and adhered to, and should understand tool training requirements to make the team effective in their use, whether requirements, defect tracking or automated testing tool use. For example, all roles listed here should be proficient in the use of any requirements management tools used on the project.

Table 1 lists some of the responsibilities of the program manager for the first four phases of this process, while the 5th phase, “program review,” requires the tracking of any defects that actually slipped into production, and related root cause analysis, to be sure this information is tracked for historical purposes, to avoid repeating the same mistake.

Providing Subject Matter Expertise

AST systems engineering (SE) roles typically are associated with subject matter expertise (SME) regarding the project or system; development and management of requirements from the highest to the lowest priority; providing the technical lead with information for how the system works and why it was designed to work that way; development of the test plan and tests to confirm that the design meets the requirements; and providing the customer with a technical point of contact.

AST system engineers are also often referred to as the software testers. An understanding of the various software test data design techniques, such as boundary values, equivalence partitioning, pairwise or orthogonal array testing, is required. Strong communication skills, subject matter expertise, knowledge of how the product or system is used, the ability to develop and document automated test requirements, and the ability to design tests to validate requirements are all important skills for a systems engineer.

Table 2 lists some of the responsibilities of the AST SEs for the first four phases of this process, while the 5th phase, “program review,” requires, for example, AST PM support and input to any tracking of defects that actually slipped into production, and related root cause analysis, to be sure this information is tracked for historical purposes, and again, to avoid repeating the same mistake.

Additionally, it is important to note that AST SEs support all AST PM activities already described in Table 1, and that no tasks can be done in isolation without input from all team members.

TABLE 1: PROGRAM MANAGER DUTIES

AST Phase 1: Requirements Gathering—Analyze Automated Testing Needs
Project kickoff and organization; procurement of test environment components, tools required, and more; coordinating the collection and review of information needed (i.e., requirements, test cases, test procedures, expected results, and interface specifications) and knowing the tools used to do so.

AST Phase 2: Test Case Design and Development
Tracking action items and issues to be resolved to closure; understanding the types of testing to be performed (i.e., functional testing, performance, security, concurrency, etc.) and associated test design techniques; leading the development of the automated testing strategy and obtaining buy-in from stakeholders; procuring automation software test tools and frameworks as required; understanding the licensing requirements related to open-source use and the customer’s expectation.

AST Phase 3: Automated Software Testing Framework (ASTF) and Test Script Development
Responsible for defect prevention, inspections, etc.; verifying that the ASTF has been tested; ensuring that the isolated test environment is maintained and other AST requirements are implemented and ready for test execution.

AST Phase 4: Automated Test Execution and Results Reporting
Deriving the various effective testing metrics; coordinating development of the test summary presentation; ensuring root cause analysis is conducted as issues arise, including future prevention techniques; communicating and tracking the exit criteria.

Implementing the AST Code
Software development responsibilities include the design and development of the ASTF and associated test software, as well as subject matter expertise regarding the processes, languages, and tools used during AST development. Critical skills include strong communication skills, the ability to understand requirements (and ask the right questions if the requirements are ambiguous) and then design software based on the requirements. Practitioners must also possess software programming skills, and knowledge of the software development lifecycle processes.

The AST implementation needs at a minimum to be able to interface with the languages and operating systems used for the product to be tested, so it is very helpful for the software developers to be experienced with those same programming languages and operating systems. The software developers also need to be proficient in the scripting languages and development languages used on the project. Finally, they need to be experts in the use and development of the automation tools and automation frameworks, to help train those who support the types of testing to be implemented.

Table 3 lists some of the responsibilities of the AST software developers for the first four phases of this process. The 5th phase, “program review,” requires, for example, AST PM support and input to any tracking of defects that actually slipped into production, and related root cause analysis, to be sure this information is tracked for historical purposes, and again, to avoid repeating the same mistake.

Other Roles
The configuration management (CM) role and quality assurance (QA) role are sometimes separate, but often are also the responsibilities of everyone on the team, i.e. that of program management, systems engineering, and software development.

CM includes management of all aspects of the product baseline, including version control, product release and problem reports. Necessary skills include strong communication, subject matter expertise in the technical skills to support version control and product release, including programming, and proficiency with CM tools, problem reporting tools, and the software development lifecycle process.

In addition to a familiarity with the CM tools used on the project, the configuration manager should be familiar with the scripting languages and development languages used on the project. Some expertise in the use of the available automated tools is also necessary for the configuration manager to be able to develop CM procedures and build scripts.

Quality assurance provides oversight that the planned processes are being followed and the ongoing evaluation of the product quality. QA engineers should be familiar with the software programming languages being used on the project. In addition, they should be highly proficient in QA tools and problem reporting tools. Familiarity with the automated test tools being used is also needed in order to support the analysis for which they are responsible.

Communicating AST Roles
Once the roles and responsibilities and related tasks have been defined, it is important that they be communicated so that everybody involved understands who is responsible for what. AST program implementation is a team effort, and at times a person can have multiple roles and be responsible for a variety of tasks. It is important that team members work toward a common goal, and that it is clear that all are held accountable for the success of the AST program, without any single person being held responsible for its success or failure.

Task overview meetings and discussions of roles and responsibilities need to be held regularly. Training sessions should be scheduled to allow for cross-training to make people most effective. A process isn’t valuable unless it has been documented, communicated and provided with training. The same is true of a tool or automation framework. Tools can end up on a shelf if the tester is not trained on how to use or maintain it.

TABLE 2: AST SYSTEM ENGINEER DUTIES

AST Phase 1: Requirements Gathering—Analyze Automated Testing Needs
Leading the development of the automated testing strategy using selected best practices as they apply to the effort, and obtaining buy-in from stakeholders.

AST Phase 2: Test Case Design and Development
Reviewing specific test cases associated with the test requirements to be automated, if test cases exist; providing analysis of the test requirements to be automated for which there are currently no test cases; prioritizing the test cases to be automated; designing and documenting test cases as required; validating that test cases produce the expected results; completing the RTM.

AST Phase 3: Automated Software Testing Framework (ASTF) and Test Script Development
Participating in reviews of the ASTF software developed and related automated test cases; participating in reviews of any software developed to support the automation framework; developing an approach and plan for testing the automated test cases; leading the testing of automated test cases; updating the RTM as required.

AST Phase 4: Automated Test Execution and Results Reporting
Leading the automated test execution; collecting, organizing, and reporting test results (such as pass/fail status); collecting, organizing, and reporting test execution times in support of the ROI comparison of manual to automated testing; leading the development of training materials needed to implement AST; leading the development of the test summary presentation; supporting any root cause analysis as issues arise.

AST training efforts can include test tool or automated script training. For example, the AST implementation personnel and other stakeholders need to view test automation activities within the scope of a separate development effort complete with strategy and goal planning, test requirement definition, analysis, design and coding. This way they can learn to appreciate the fact that effective automated test development requires careful design and planning just like any other software application development effort.

The outcome of this training is that all stakeholders understand the AST phases and milestones, processes, roles, responsibilities and tasks associated with them. All stakeholders will know how to use the AST-recommended tool suite and will be familiar with the AST program and its implementation. For example, they would know how to run the ASTF and automated testing scripts, so anyone on the team could maintain the automated tool suite, if necessary.

The AST team needs to comprise diverse skill sets, including software development and knowledge of automated testing methodologies and tools. The AST team members cannot be expected to neatly and precisely fit into a specific discipline area; sometimes people have to wear many hats to get the job done, including each other’s hats. But it’s important that all the needed skills be present on the team.

Figure 1 can be used to define a set of roles and responsibilities for each team member and serve as a top-level assessment of the skill sets of the AST team to assess and identify technical strengths and weaknesses. We encourage you to consider additional tasks that will need to be completed for your specific AST project. If there are tasks listed for which no one on the AST team currently has the requisite skills, either someone with those skills should be added to the team or additional training should be provided for someone on the team to develop the skills listed.

Only when roles and responsibilities have been communicated and tasks are understood can personnel’s performance be evaluated.

TABLE 3: AST DEVELOPER DUTIES

AST Phase 1: Requirements Gathering—Analyze Automated Testing Needs
Reviewing and understanding the SUT software architecture, including OS and languages to be tested; identifying any software test tools already in use during manual testing and evaluating their reuse for AST efforts; reviewing the types of expected results to be evaluated and the magnitude of the expected results for the test cases and test procedures provided; developing an initial recommendation for ASTF design and development and related automated testing tools to be used, and identifying any additional software that may have to be developed.

AST Phase 2: Test Case Design and Development
Reviewing the completed RTM and developing a strategy for automating maintenance of it; supporting the development of the automated testing strategy.

AST Phase 3: Automated Software Testing Framework (ASTF) and Test Script Development
Designing and developing the software or software modifications to implement the ASTF (software skills and experience need to be consistent with the languages and operating system used by the project); leading reviews of any software developed to support the automation framework and test scripts; participating in the testing of ASTF and test case automation; incorporating fixes as needed; if updated, reviewing the automated RTM.

AST Phase 4: Automated Test Execution and Results Reporting
Supporting automated test execution as required; supporting the development of training materials needed to implement AST; supporting the development of the test summary presentation; supporting any root cause analysis as issues arise.

FIGURE 1: TEAM JOB REVIEW SHEET

• Analyzing system requirements for testability
• Deriving test requirements and test strategy
• Determining the SUT automation index
• Evaluating ASTFs and automated test tools
• Verifying automated test tool compatibility with the SUT
• Finding work-around solutions to test tool incompatibility problems
• Planning for test tool introduction on a project
• Planning AST activities
• Identifying the kinds of tests to be performed using effective testing techniques
• Planning, tracking, and managing test environment setup activities
• Developing ASTF architecture and design and related automated test designs
• Knowing best development practices to follow
• Developing test data and refreshing the test database during test execution
• Performing test readiness reviews
• Documenting, tracking, and obtaining closure on trouble reports
• Performing test activities within the same technical environment planned for the project
• Maintaining the CM tool


Perception vs. Reality: Do You Know the Difference?

By Robin Goldsmith

Robin Goldsmith is president of Go-Pro Management, a business consultancy specializing in software development and testing efficiency.

Low-overhead software process improvement techniques can be faster, cheaper, and in some ways better than the well-known high-overhead formal branded initiatives. Organizations can gain considerable advantage from simpler alternatives which actually are implemented and provide improvements, especially from low-overhead methods that overcome often-unrecognized weaknesses in their larger and more complex cousins.

High-overhead approaches characteristically involve expensive, extensive, extra top-down administrative organizational structures to guide the initiative and significant up-front broad-based training of most employees before benefits begin. Ordinarily, the approaches also mandate a number of formal procedures and paperwork be added to regular workflow. Executive support is essential; and conformance often is enforced through command and coercion, frequently resembling evangelical religious fanaticism. For instance, one prominent organization was widely known for summarily firing not just those managers who privately questioned their branded program, but also those who merely failed to demonstrate adequate active public advocacy for it.

Because of the high start-up costs and relatively long delay until benefits begin accruing, or perhaps because of concerns about the programs’ effectiveness once underway, many organizations choose not to undertake these often massive efforts. Moreover, as lead Capability Maturity Model (CMM) author Mark Paulk once said, “Many—perhaps most—software process improvement efforts fail.”

Generic Process Improvement vs. Process Imposition
Many of the big branded approaches include often sizeable and elaborate models of presumably ideal total software processes. Merely becoming familiar with such massive amounts of material can be a formidable task, which can pale in comparison to the difficult effort of imposing the model processes in one’s own organization. Moreover, such overwhelming shotgun-like blanket changes can amount to overkill, possibly replacing existing processes which work fine, perhaps with frankly less appropriate ones, and implementing superfluous processes for things the organization doesn’t really need.

On the other hand, generic process improvement analyzes the current processes and focuses rifle-like improvement efforts only on those with opportunities for significant benefit. Ironically, generic process improvement is somewhat of a lost art, largely because the branded approaches have virtually usurped the terms “process” and “software process improvement,” though not necessarily with conscious intent. These days, many people think “process” means massive formal procedures and mandated busywork, typically associated with a particular branded process model, and “software process improvement” means imposing those branded model processes on an organization.

Real vs. Presumed Process
Although often not explicitly recognized, implicit in generic process improvement is the need to identify and improve the REAL process. Many (and probably most) generic and branded process improvement efforts fail because they don’t properly understand what a process is and therefore don’t identify the real process that needs improvement.

Most attendees in my seminars/speeches use expressions such as “How much process do I need?” They define a “process” as “a set of steps one is supposed to follow to turn inputs into a particular result” or words to that effect. That’s the definition of a defined process, and specifically a defined process mandating formal procedures and paperwork.

Rather, a process is a set of actions, beliefs and customs that taken together produce a result.

A result is the inevitable outcome of the process followed, regardless of whether that process was intended, desired, defined or even recognized.

Repeating the process produces essentially the same result.

As can be seen in Figure 1, there are usually two processes. The real or “as-is” process is the process that is actually producing the results. Often people are not aware of the real process at all. The presumed process is what people think is producing the results. Often the presumed process is defined, which means that different people who are privy to the definition would describe the process in basically the same way. A process can be defined without being documented, although often the process definition is in the form of written documentation; and many defined processes have no formal procedures, or at least none involving paperwork. Defined processes which may not be written down are sometimes known as customs. Again, the presumed process is often different from the real process, which almost always happens when the real process is not recognized.

When we want to change our results, we must change our process from the as-is process producing our current results to a “should-be” process that will produce the desired changed results (Figure 2).

But which process will people change—the real process they probably don’t recognize or the presumed process they think is producing the results? Obviously, they’ll change the presumed process; and if the presumed process is different from the real process and is not producing the current results, changing the presumed process will not have the desired effect on the results. Therefore, to change one’s results, one must change the real process producing those results; and before one can change the real process, one must recognize it. That may not be easy.

For example, consider the scenario described in Figure 3. I’m sure you’re familiar with projects in which coding and testing are planned to take particular amounts of time and they end at the stop sign deadline. Then the coding takes longer than expected. If the amount of necessary testing is proportional to the amount of coding, and the amount of coding increases, then the necessary amount of testing also should increase.

But in reality, the deadline usually stays the same. Testing gets delayed, squeezed or cut short and the project reliably and predictably delivers trouble.

Most of us have seen this happen, perhaps even more than once, but if it’s happening with practically every project, I’d say your testing department is delivering more than its share of trouble.

FIG. 1: KNOW WHAT’S REAL... (diagram contrasting the presumed, defined and documented process with the real, often unrecognized as-is process of actions, beliefs and customs that actually produces the result)

FIG. 2: ...AND WHAT SHOULD BE (diagram showing that changing the result requires changing the real as-is process to a should-be process, not just the presumed, often partial silo view)

Yet clearly nobody in your organization thinks or would say that it’s the fault of your software process. Instead, they’d blame the organization’s presumed process: the scheduled time for coding, followed by the time for testing, followed by delivery on the deadline. When they try to improve, they’ll make changes to the presumed process, which doesn’t really happen, and won’t be likely to change the real process’s predictable production of trouble.

Now that you possess this knowledge, take a look around your organization and you’ll see countless examples of unrecognized real processes that differ markedly from presumed processes. You should also start understanding why your real process probably includes repeatedly not being able to cost-effectively make appreciable improvements despite making large efforts and possibly enlisting well-known high-overhead branded software process improvement approaches.

Silos
There’s a good chance that part of your real process difficulties is attributable to silos. Frequently, the presence of silos makes it especially hard to see the real process. Silos occur when parts of an organization, such as a particular department or function, view only a narrow slice of the real process without being aware of the full process, such as other departments and functions. Viewing a slice of the real process out of context effectively creates a presumed process view which is different from the full real process. Thus, to get meaningful context, the real process by definition must be viewed in its entirety, from beginning to full end result.

Figure 4 describes the way most IT organizations manage development projects. The project is assumed to end at the point of delivery, and most organizations tend to have a single measurement data point—the deadline.

If yours is like most organizations, this is pretty much the way things are done, project after project, which makes it your real process. Dressing it up with lots of paperwork and ceremony doesn’t change the underlying real process, or its predictable problems.

When is the project really done? It’s not when the deadline is reached, because anyone can deliver by a deadline if it doesn’t matter what they deliver. Rather, the project is really done when the project deliverable works right; and everybody in IT knows that almost always additional post-delivery effort is needed to make what was delivered at the deadline work right. You may have a more prosaic euphemism for these finishing touches, but basically it’s maintenance.

Your organization also probably accounts separately for the “project” through delivery and post-delivery activities, but they don’t tie the fix-its back to development practices that necessitated the extra work. That’s a defined procedure which institutionalizes a process that erroneously presumes pre- and post-delivery activities are unrelated and thereby virtually assures the real process will stay unrecognized and perpetuated.

With practically every endeavor except in IT, and certainly any that is recognized for its effectiveness, the people involved know that (1) meaningful measurement requires more than a single data point, and (2) the consequences of behavior need to be included in the costs of that behavior. For instance, warranty expenses are charged to the manufacturing process. Making the IT delivered product work right is like warranty repairs. The real software development process needs to include post-delivery maintenance repairs needed to make the software work right.

If You Don’t Know What You’re Doing, You Don’t Know What You’re Doing
You need to know what your full real process activities are, how much you’re expending on them, and what you’re getting for your time and effort. If you don’t have appropriate quantified measures of your results and what’s really causing them, you literally and figuratively don’t know what you’re doing.

A football coach would be fired instantly if he didn’t know the score of the game and number of yards gained and lost for each play. Yet, I’d contend ignorance of comparable essential measures is the real process in most IT development organizations. No wonder football coaches get paid in the millions and IT jobs get sent overseas to the lowest bidder.

Be careful of counting on high-overhead imposed processes to cover your bases. Outside of IT, quality and effectiveness are judged by delivered results. For instance, it’s widely accepted that a Lexus is a higher-quality car than a Corolla, which is also manufactured by Toyota. It’s the cars themselves that warrant the distinctions. The Lexus’ superior engineering, materials, and workmanship enable producing the higher quality but are not the higher quality itself.

Some other automaker conceivably could be using comparable engineering, materials, and workmanship but still might not be producing a high-quality car. Some of the differences might be in the realm of beliefs and customs, especially as promulgated through management practices. Companies that repeatedly produce excellence know their real process includes all these factors.

FIG. 3: STOP SIGNS CAN MOVE (diagram: in the scheduled, presumed process, coding and testing end at the stop-sign deadline; in the actual, real process, coding runs long, testing gets squeezed, and the result is trouble)

On the other hand, the qual-ity of one’s software is not a cri-terion for being assessed high-ly by some of the high-overheadbranded approaches. Rather,assessments presume quality willresult from following their mod-el process procedures and activ-ities. Moreover, the models andassessments generally do notaddress beliefs, or customs,including management prac-tices.

Furthermore, a seldom-recognized, dirty, not-so-little secret of most branded, big-ticket imposed processes is a giant loophole: management practices generally aren't addressed. So, after all the worker bees have jumped through all the formal procedure busywork hoops, which may in fact have been helpful in producing better software, management continues to make its decisions in the "same-old, same-old" ways.

The real process of these branded approaches is the presumption that position automatically produces wise decisions commensurate with the preceding process' disciplines. Perhaps you too have heard expressions such as, "Ship it, we'll fix it later. Deadline. Deadline. Deadline."

Two Essentials Your Real Process Probably Is Missing

You've heard it said that a project isn't done until the deliverable works right. But the real process in many IT development organizations doesn't adequately define the meaning of "works right," nor can it sufficiently determine whether the deliverable does work right.

In other words, most IT organizations don't adequately know their real business requirements and don't have appropriate quality assurance and testing to give confidence that their deliverables satisfy those real business requirements.

The big branded models don't get into the "how" engineering specifics. That's not their thrust, which is fine, except that they presume that dictating a process should address requirements and quality thereby assures it will be done well. Darn those presumptions.

An alternative low-overhead improvement approach that many organizations use is simply to bypass "software process improvement" initiatives and proceed directly to implementing presumed "good" practices. That means even less delay and overhead than with generic software process improvement. Of course, as with any imposition, there's always the possibility that the imposed "good" practice is not really an improvement and could even be a step backward, which is especially likely if the organization implements and executes the practices poorly.

On the other hand, implementing some reasonable "good" practices probably will pay off for most organizations if they are implemented proficiently. The highest payoff undoubtedly can be obtained by learning to discover requirements, since everything else ultimately depends upon the adequacy of those requirements.

Real Business Requirements

Unfortunately, much of the IT industry's conventional wisdom is likely to lead well-intentioned requirements improvement efforts astray. The term "requirements" ordinarily is used to refer to the requirements of the product, system, or software that is intended to be created. That's really high-level design of a presumed "how" for accomplishing the presumed real business requirements, the "whats" that will provide value when satisfied, which seldom have been defined adequately.

Proactive Testing

Most organizations also can achieve improvement by implementing more effective quality assurance and testing. While traditional approaches do generally pay off, few people recognize that traditional testing practices tend to be reactive and less beneficial than they could be. Such techniques often come too late and are less effective at identifying test conditions than ordinarily is presumed.

In contrast, the proactive testing methodology applies a variety of special techniques and approaches that anticipate more of the potential problems so they can be addressed earlier, when they are easier and cheaper to fix.

Rather than being resisted as an obstacle to delivery, proactive testing wins advocates among managers, developers, and users who find it actually helps them deliver better systems quicker, cheaper, and with less aggravation.

Low-overhead generic software process improvement offers attractive benefits compared to high-overhead branded formal process improvement initiatives. The key is identifying and then directly addressing only high-payback issues within the real software development process. Short-cut approaches offer likely benefits by bypassing analysis and proficiently implementing presumed "good" practices such as discovering real business requirements and proactive testing.


[FIG. 4: THE WAY IT DOES IT. Diagram labels: Requirements; Design; Code, Test; DELIVER (where people tend to measure); Maintenance silo; Done = Works Right!; Anyone can deliver by a deadline if it doesn't matter what they deliver; We need to measure to the end result.]



Best Practices

Eclipse-Based Testing Comes Out of the Shadows

Joel Shore

Phil Quitslund, senior software architect at tools vendor Instantiations, is happy to talk for hours about testing applications built on top of Eclipse. His presentation at EclipseCon 2009 in March was all about testing; our timing here at ST&P couldn't have been better.

When testing applications built on top of Eclipse, two kinds of tests are feasible, he says. Plug-in Development Environment (PDE) tests are run in the context of a fully bootstrapped and initialized Eclipse application, and have full access to Eclipse platform services. Alternatively, tests can be run outside the PDE, where they can access platform classes. But there's a limitation: since the platform is not fully initialized, many services are not available.

"Eclipse testing best practice is to test as much of your application outside the context of the PDE as possible," he says. "The main reason for this is that PDE tests take a long time to execute (since they require platform bootstrap) and as a result developers are reluctant to run them continuously." That, he adds, is not a good thing.

On the surface, this seems challenging because functionality that leverages Eclipse platform classes often is not practical to test outside the PDE. That happens for two key reasons, he says. First, there are many singletons (a class that allows only a single instance of itself to be created) in the platform, which, he says, are notoriously difficult to test around. Second, platform classes are hard to fake or mock. They are class-based rather than interface-based implementations; cumbersome, with wide interfaces that have lots of methods; forbidden, in the sense that subclassing is neither provisioned for nor permitted; and complex – think classloader issues, and then check out www.jmock.org/eclipse.html.

But all is not lost. To make an Eclipse-based application test-friendly, one must strive for a strict isolation of the core domain – and the code you want to unit test – from platform details. This is essentially the classic model-presentation split, taken to a logical extreme. The good news, according to Quitslund, is that this insulation can lead to better factored, more hygienic core model code.
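
A minimal sketch of what that isolation can look like in practice (the interface and class names here are hypothetical, not from Quitslund's talk): the core model depends only on a narrow, hand-rolled interface, so a plain JUnit test can exercise it without bootstrapping the platform.

    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    // Hypothetical example: the core domain sees this narrow interface instead
    // of reaching into an Eclipse platform singleton such as a preference store.
    interface SettingsSource {
        String get(String key, String defaultValue);
    }

    // Core model code: no Eclipse imports, so it is testable outside the PDE.
    class GreetingBuilder {
        private final SettingsSource settings;

        GreetingBuilder(SettingsSource settings) {
            this.settings = settings;
        }

        String greetingFor(String user) {
            return settings.get("greeting.prefix", "Hello") + ", " + user + "!";
        }
    }

    // A plain unit test that never starts the workbench.
    public class GreetingBuilderTest {
        @Test
        public void usesConfiguredPrefix() {
            SettingsSource fake = new SettingsSource() {
                public String get(String key, String defaultValue) {
                    return "Welcome";
                }
            };
            assertEquals("Welcome, Ada!", new GreetingBuilder(fake).greetingFor("Ada"));
        }
    }

In the running application, a thin adapter class would implement the same interface on top of the platform's actual preference mechanism; only that adapter needs a PDE-based integration test.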

Of course, model-testing is not the end of the story. "If you are delivering Eclipse-based apps you need to ensure that your model correctly interacts with the framework," says Quitslund. For this you need a suite of integration tests that interact with platform services and which need to run in the context of the PDE. In practice, these tests can be run less regularly on developer machines – before committing new code to the source repository, for example – and remotely on a test machine as part of the build process.

So, just what are the qualities of good Eclipse tests, whether they are run in the PDE or not? Quitslund identifies three.

Robustness. Simply put, tests that you run today should run tomorrow and the day after. In other words, mysterious intermittent failures are unacceptable.

Maintainability. As the application you are testing evolves, your tests should be built in such a way as to be reasonably easy to evolve correspondingly. Quitslund's key to success regarding maintainability is to ensure that no duplication exists within the test code base. That is easier said than done, and zero duplication, though noble, is more an ideal than a reality. Test code should be aggressively refactored so that it contains as little duplication as possible. Common test procedures should be moved into utility "helper" classes that are used as the units of reuse in test building.
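
As a rough illustration (all names are hypothetical), a helper might own the one recipe for building a throwaway fixture, so individual tests call it instead of copying the same setup steps:

    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;

    // Hypothetical helper: the single place that knows how to build a throwaway
    // sample project on disk, so test classes don't duplicate these steps.
    public final class TestProjects {

        private TestProjects() { } // utility class, never instantiated

        // Creates a temporary project directory containing one source file.
        public static File createSampleProject() throws IOException {
            File dir = File.createTempFile("sample-project", null);
            dir.delete();    // swap the temp file for a directory of the same name
            dir.mkdirs();
            FileWriter out = new FileWriter(new File(dir, "Main.java"));
            try {
                out.write("public class Main { }\n");
            } finally {
                out.close();
            }
            return dir;
        }
    }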

Isolation and Reproducibility. Test failures and successes should be quick and easy to reproduce again and again. The key here is that tests should be able to run in isolation and not be dependent on being run in the context of any other tests that put the application in some expected state. "All tests should setup their own state and then should tear down as much of that state as is reasonable," he says. Put another way, those tests should strive to be as side-effect free as possible.
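
A minimal JUnit sketch of that discipline (again with hypothetical names, reusing the helper above): each test builds its own fixture in a setup method and removes it afterward, so it can run alone or in any order.

    import static org.junit.Assert.assertTrue;

    import java.io.File;

    import org.junit.After;
    import org.junit.Before;
    import org.junit.Test;

    public class SampleProjectTest {

        private File project;

        @Before
        public void createOwnFixture() throws Exception {
            // Nothing is inherited from other tests; each test builds its own state.
            project = TestProjects.createSampleProject();
        }

        @After
        public void removeFixture() {
            // Tear down what we created so no state leaks into the next test.
            for (File file : project.listFiles()) {
                file.delete();
            }
            project.delete();
        }

        @Test
        public void sampleProjectContainsASourceFile() {
            assertTrue(new File(project, "Main.java").isFile());
        }
    }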

In one final observation, Quitslund says something you can read in this column nearly every month, regardless of the development platform under discussion. But it's still excellent advice. "It is of great benefit and it is very important to test early and test often. The earlier in the development cycle you can run tests, the less expensive it will be to resolve issues." If only IT budgets and development schedules were built with this in mind.

Joel Shore is a 20-year industry veteran and has authored numerous books on personal computing. He owns and operates Reference Guide, a technical product reviewing and documentation consultancy in Southboro, Mass.




Future Test

Can Failure Alter Testing's Future?

Geoff Thompson

Can we learn from software testing failures to improve the future of software testing? Heathrow's T5 software testing issues [as reported in "BA: software failures key in T5 fiasco," www.computing.co.uk, May 8, 2008] have highlighted the importance of software testing.

I believe that in the future we will see changes in six key areas of software testing:

1. Accurate information provision (in the language of the recipient)

Software testing's role is the provision of data to enable go-live decisions to be made. If we get this information wrong, the wrong decisions will be made. If testing can't provide the right information, it actually adds little value.

I believe the way forward for information provision is actually to learn something from the Agile world and start to work with and talk to each other. Everyone's requirement for information is different, so the closer people mix, the more we understand.

2. Building quality in rather than testing it out

Software testing's traditional objective is seen as finding defects, whereas in reality it is a great way of preventing defects.

If you knew your back wall was going to fall down, would you wait until it did or do something to stop it from falling? The same should apply to software testing. If early enough in the lifecycle it is identified that elements of the software may fail, would it not be better to stop them from failing by some corrective action than to leave them to fail and then verify that they have?

3. Ensuring that quality is considered and not seen as a burden

Many have said that you can't balance the three sides of the eternal project triangle of time, cost and quality; that you may select any two. But by adopting a collaborative approach and being able to get involved at the earliest point in a project, in my experience it is possible to achieve all three.

While working for a large insurance company, I liaised closely with the development manager, lending him my testing staff so that he could be sure that when his developers finished developing, the code would work. They asked for a week's extra time, which I agreed to. When the code was delivered to test, it just worked. We reduced the lifecycle timescales by six weeks and saved more than £400k. The testing cost roughly the same as the back-ended budget, but was spread out from a far earlier point, and all of the savings were realized by development not having to support an extended testing window.

The key to this success was the joint ownership of the quality between all parts of the project team. So by focusing on quality early in the lifecycle, you can meet the joint objectives of cost, time and quality.

4. Software Testing Qualifications

A good software tester or test manager needs a mixture of skills and knowledge to be successful. In my view, qualifications account for about 10 to 15 percent of the knowledge required for practical application in the workplace.

ISEB has recently revised its single practitioner exam into three parts: intermediate, test management and test analyst. But it only prescribes a three-day training course followed by an exam. There is a minimum entry requirement, but it does not include practical test experience, only the passage of time after taking the foundation exam.

ISTQB, on the other hand, has just launched its advanced level qualifications, also with three levels: Test Manager, Test Analyst and Technical Test Analyst. There's a minimum training period of five days each, followed by an exam. And while I believe that software testing qualifications play an important part in the education and verification of software testers and test managers, the reality is that a qualification together with real practical experience is what makes a good tester.

5. Standards

It amazes me that, even knowing that testing standards exist, testers prefer to start from a blank sheet of paper when constructing their processes and templates. The few standards currently out there are developed by a wide international working group (which can have as many as 1,000 contributors) who build and review the content. However, by necessity they are generic, and some concessions have to be made to gain global agreement.

They may not be perfect, but to start from a point above zero has to be better than no help at all. I believe we need to embrace the standards more and take from them the information that is right for the project.

6. Test Process Measurement Models

There are many test process measurement models available against which testers can review what they do. In my opinion, the most effective model is the Test Maturity Model Integrated (TMMi). It is a model that has been developed out of research (rather than commercial interest) and therefore provides an independent and structured view.

It works along the same lines as CMMi, providing five levels of maturity to measure processes against. The model has been developed by the TMMi Foundation (www.tmmifoundation.org), an international organization whose aim is to establish a non-commercial test process assessment model for the industry.


Geoff Thompson is consultancy director at Experimentus, an IT solutions and services company that specializes in software quality management, and a founding member of the TMMi Foundation; Thompson serves on its board.


