Systematic Unit Testing in a Read-eval-print Loop

Kurt Nørmark
(Department of Computer Science, Aalborg University, Denmark
[email protected])

Abstract: Lisp programmers constantly carry out experiments in a read-eval-print loop. The experimental activities convince the Lisp programmers that new or modified pieces of programs work as expected. But the experiments typically do not represent systematic and comprehensive unit testing efforts. Rather, the experiments are quick-and-dirty one-shot validations which do not add lasting value to the software being developed. In this paper we propose a tool that is able to collect, organize, and re-validate test cases, which are entered as expressions in a read-eval-print loop. The process of collecting the expressions and their results imposes only little extra work on the programmer. The use of the tool provides for creation of test repositories, and it is intended to catalyze a much more systematic approach to unit testing in a read-eval-print loop. In the paper we also discuss how to use a test repository for purposes other than testing. As a concrete contribution we show how to use test cases as examples in library interface documentation. It is hypothesized, but not yet validated, that the tool will motivate the Lisp programmer to make the transition from casual testing to systematic testing.

Key Words: Interactive unit testing, program examples, Emacs, Scheme programming.

Category: D.2.5, D.2.6, D.1.1.

1 Introduction

This paper is about systematic program testing done in a read-eval-print loop. A read-eval-print loop is also known as an interactive shell or a command interpreter. The paper deals with unit testing of Scheme functions [Kelsey 98], but the results of the paper are valid for any Lisp language, and beyond. Using a read-eval-print loop it is natural to try out a function immediately after it has been programmed or modified. It is easy to do so via a few interactions. This trying-out process can be characterized as casual testing, and it stands in contrast to systematic unit testing [Beck 94, Beck 98]. In this paper we describe how to manage and organize the testing activities which are done in a read-eval-print loop. It is hypothesized that better means for management and utilization of test cases will encourage programmers to shift from a casual testing mode to a more systematic testing mode.

In order to understand the context and the overall problem, which is independent of Lisp programming, we start with a brief discussion of different ways

Journal of Universal Computer Science, vol. 16, no. 2 (2010), 296-314
submitted: 19/10/09, accepted: 10/12/09, appeared: 28/2/10 © J.UCS


to execute programs. There are, in general and in essence, two different ways to execute a part of a program:

1. Single entry point execution. Program execution always starts at a designated place called the main program, typically a function or a method called main. The main program is the only entry point to the program. The only way to execute a given part P of a program is to arrange that P is called directly or indirectly from the main program.

2. Multiple entry points execution. Any top-level abstraction in the program serves as an entry point. Thus, if a program part is implemented as a top-level abstraction, the abstraction may be executed (called/instantiated) from a read-eval-print loop.

The work described in this paper relies on environments that support multiple entry points execution. Almost all Lisp environments belong to this category, and many environments for languages such as ML, Haskell, F#, Smalltalk, Ruby, and Python also support multiple entry points execution. The initial observation behind this work is the following:

Programmers who work in multiple entry points IDEs are privileged because once a top-level abstraction has been programmed it can be executed right away in a read-eval-print loop. It is hypothesized that programmers in such IDEs test non-trivial program parts incrementally and interactively in this manner. A problem is, however, that these test executions are sporadic and casual, not systematic. Another problem is that the test executions are not collected and preserved. Hereby it becomes difficult to repeat the test executions if/when an underlying program part has been modified (regression testing). The latter problem probably amplifies the first problem: What should motivate a programmer to perform systematic and more comprehensive “one shot testing”, tests which cannot easily be repeated when a need arises in the future?

One way to approach these problems would be to collect and systematize the tests in the direction of single entry point execution. Thus, instead of doing interactive and incremental program testing, the programmer aggregates the individual tests in test abstractions. Such test abstractions are eventually initiated from a single entry point, possibly via a “smart test driver” in the realm of JUnit [JUnit 09] or NUnit [NUnit 09]. In our opinion this is the wrong way to go, because it will ultimately ruin the interactive, experimental, and exploratory programming process which is so valuable for creative program development in Lisp and similar languages.


As an alternative approach, we will outline support for systematic and interactive testing using a read-eval-print loop. The idea is that the programmer continues to test-execute (try out and play with) abstractions as soon as it makes sense. Following each such test execution the programmer should decide if the test case is worth keeping for future use. If a test case should be preserved, the tested fragment and its result(s) are enrolled in a test bookkeeping system, a test case repository. Test cases enrolled in the system can easily be re-executed, and they can be conveniently reused for other purposes.

In the rest of the paper we will describe a concrete tool, through which we have gained experience with systematic unit testing of Scheme programs in a read-eval-print loop. This will show how it is possible to manage interactive test execution in a systematic way.

The contributions of our work are twofold. The first contribution is the idea of systematic testing via a read-eval-print loop, the necessary support behind such a facility, and additional derived benefits from the approach. The second is the actual testing tools for Scheme, as implemented in Emacs Lisp and hosted in the Emacs text editor.

The rest of the paper is structured as follows. First, in Section 2, we discuss the idea of collecting test cases in a read-eval-print loop. Next, in Section 3 we introduce support for test case management. In Section 4 we describe how to use accumulated test cases as examples in library interface documentation. Section 5 describes the implemented tools behind our work. Related work is discussed in Section 6, and the conclusions are drawn in Section 7.

2 Collecting Tests in a Read-eval-print Loop

The slogan of test-driven development [Beck 98], which is an important part of extreme programming [Beck 04], is “test a little, code a little”. Following this approach the test cases are written before the actual programming takes place. In a less radical variant, “code a little, test a little”, test cases are written after a piece of code has been completed.

For Lisp programmers, who work incrementally in a read-eval-print loop, the slogan has always been “code a little, experiment a little”. We do not want to change that, because this way of working represents the spirit of dynamic, exploratory, interactive programming. It is also consistent with Paul Graham's notion of bottom-up programming in Lisp [Graham 93]. We will, however, propose that a subset of the experiments is turned into a more systematic and consolidated testing effort. In addition we will propose that the result of this effort is organized and preserved such that regression testing becomes possible. With appropriate support from the read-eval-print loop this only requires a minimum of extra work from the programmer.


Figure 1: A Scheme read-eval-print loop (bottom) and part of a Scheme source program (top). The menu entry Unit Testing in the top-bar pertains to systematic unit testing.

As a typical and simple setup, a programmer who uses a read-eval-print loop works in a two-paned window, see Figure 1. In one pane a program source file is edited, and in the other pane the read-eval-print loop is running. (Variations with multiple editor panes, multiple read-eval-print loops, or a combined editor pane and read-eval-print loop pane are supported in more advanced environments). Whenever an abstraction, typically a function, is completed in the upper pane of Figure 1, it is loaded and tried out in the lower pane.

In order to be concrete, we will assume that we have just programmed the Scheme function string-of-char-list?, as shown in the upper pane of Figure 1. The function string-of-char-list? examines if the string str consists exclusively of characters from the list char-list. We load this function into the read-eval-print loop, and we assure ourselves that it works correctly. This can, for instance, be done via the following interactions.

> (string-of-char-list? "abba" (list #\a #\b))
#t
> OK

> (string-of-char-list? "abbac" (list #\a #\b))
#f


> (string-of-char-list? "aaaa" (list #\a))
#t
> OK

> (string-of-char-list? "1 2 3" (list #\1 #\2))
#f
> OK

> (string-of-char-list? "1 2 3" (list #\1 #\2 #\3))
#f
> OK

> (string-of-char-list? "1 2 3" (list #\1 #\2 #\3 #\space))
#t
> OK

> (string-of-char-list? "cde" (list #\a #\b))
#f
> OK

The OK commands, issued after most of the interactions, are test case acceptance commands which signify that the latest evaluation is as expected, and that the expression and its value should be preserved. The command can be issued in a number of different ways depending on the style of the tool and the preferences of the programmer. In our concrete, Emacs-based tool we support test case acceptance commands both via a menu entry and via a terse control sequence (such as C-t C-t). Issuing the test acceptance command records the displayed value as the expected value of the evaluated expression, to be verified with the default assertion (the equal? Scheme function in our setup).
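For reference, the definition of string-of-char-list? appears only in the Figure 1 screen shot. A plausible reconstruction, consistent with the interactions above (a hypothetical sketch, not the paper's actual code), is:

```scheme
;; Hypothetical reconstruction of string-of-char-list?.
;; Returns #t if str consists exclusively of characters
;; from char-list, and #f otherwise.
(define (string-of-char-list? str char-list)
  (let loop ((i 0))
    (cond ((= i (string-length str)) #t)
          ((memv (string-ref str i) char-list) (loop (+ i 1)))
          (else #f))))
```

Note that a definition of this shape would also reproduce the string-length error shown later in this section when a character is passed instead of a string.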

Working in a testing context, we will turn experimentation into a more systematic testing effort. Continuing the example, it will make good sense to consolidate the test with test cases that care about extreme values of the text string and the character list:

> (string-of-char-list? "" '())
#t
> OK

> (string-of-char-list? "" (list #\a #\b))
#t
> OK

> (string-of-char-list? "ab" '())
#f
> OK

As can be seen, the programmer accepts all these interactions as test cases. It is also possible to accept an error test case. As an example, the read-eval-print loop interaction

> (string-of-char-list? #\a (list #\a))
string-length: expects argument of type <string>; given #\a
> ERROR

is categorized as an error, because string-of-char-list? only accepts a string as its first parameter. The ERROR command signals that an error must occur when the expression above is evaluated.

In the spirit of test-driven development it is also possible, in a read-eval-print loop, to write test cases before the function under test is implemented. Using this variant, it is necessary to provide the expected value of an expression which cannot yet be successfully evaluated. Here follows a possible interaction in case the function string-of-char-list? has not yet been implemented:

> (string-of-char-list? "abba" (list #\a #\b))
reference to undefined identifier: string-of-char-list?
> VALUE: #t

> (string-of-char-list? "abbac" (list #\a #\b))
reference to undefined identifier: string-of-char-list?
> VALUE: #f

The VALUE: commands are issued with the purpose of providing the intended value of the expressions which fail.

The approach described above will be called interactive unit testing. Interactive unit testing works well for testing of pure functions. The reason is that a test of a function f can be fully described by (1) an expression e that invokes f and (2) the value of e.

The interactive unit testing approach is less useful for testing of procedures or functions with side effects. A test of a procedure p involves, besides step 1 from above, (0) establishment of the initial state, and (3) possible cleaning up after p has been called. Non-interactive unit testing frameworks typically prepare for steps 0 and 3 via setup and teardown actions for each individual test case. It is not easy, nor natural, to collect these four pieces in a read-eval-print loop. Therefore we recommend that test cases for imperative abstractions are captured at the level of more conventional, non-interactive unit testing. This may harmonize better than expected with interactive unit testing, because a traditional unit testing tool is in fact part of the tool-chain of the proposed implementation, see Section 5.
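A conventional, non-interactive test of a side-effecting procedure could be sketched as follows. The procedure and the test shape are hypothetical; in the actual tool such tests are delegated to the underlying unit testing framework (see Section 5):

```scheme
;; Sketch of a non-interactive test of a procedure with side effects,
;; following steps (0)-(3) above. All names are hypothetical.
(define counter 0)
(define (increment-counter!) (set! counter (+ counter 1)))

(define (test-increment-counter!)
  (set! counter 0)                  ; (0) setup: establish the initial state
  (increment-counter!)              ; (1) call the procedure under test
  (if (= counter 1)                 ; (2) verify the resulting state
      (display "test passed")
      (display "test failed"))
  (set! counter 0))                 ; (3) teardown: clean up afterwards
```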

3 Test Case Management

In the previous section we have described the basic idea of turning selected expressions and values, as they appear in a read-eval-print loop, into test cases. We will now describe how our tool keeps track of these test cases, and how the program developer can deal with the accumulated test cases. We start with


an explanation of test case management that involves pure functions. At the end of this section we will touch on management of test cases in imperative programming.

3.1 Basic test case management

A test case of a pure function is composed of the following information:

1. The expression which serves as the calling form.

2. The expected value of the expression.

3. The predicate that is used to verify that the expected value is equal to the outcome of executing the expression.

4. A unique id, which contains a time stamp.

5. Additional information which, for instance, pertains to the use of the test case as an example in interface documentation (see Section 4).

In case the expression is supposed to give rise to an error, the test case reflects this as a special case.
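Concretely, the pieces of information above could be represented as an association list. The following is a hypothetical sketch; the paper does not specify the tool's actual on-disk representation:

```scheme
;; Hypothetical representation of a single test case record.
(define sample-test-case
  '((expression     . (string-of-char-list? "abba" (list #\a #\b)))
    (expected-value . #t)
    (assertion      . equal?)          ; the default verification predicate
    (id             . "17395-10319")   ; unique id containing a time stamp
    (example-for    . (string-of-char-list?)) ; documentation use, cf. Section 4
    (error-expected . #f)))            ; the special case for ERROR test cases
```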

The existence, and acceptance, of a test case reflects the fact that the test case represents correct behavior of the involved abstractions. The acceptance of the test case is made by the programmer by issuing a test case acceptance command in the read-eval-print loop, just after the evaluation of the expression (see Section 2).

The basic test management idea is simple. A given session with a read-eval-print loop can be carried out relative to a designated test case repository. At an arbitrary point in time the programmer can connect to an existing test case repository, or the programmer can ask for a new one. Subsequent test case acceptance commands add test cases to the designated repository. If a programmer accepts a test case without being connected to a repository, the programmer is prompted for a repository for this particular test case. It is possible, at any point in time, to switch from one test case repository to another.

Each test case repository, corresponding to a test suite, has an associated setup file and teardown file. The setup file may contain a common context for all test cases, and as such the context is shared between all the test cases in the repository. The setup file typically also contains some instructions for loading of appropriate source files. The teardown file is typically empty.
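For illustration, a setup file might contain nothing more than load instructions and shared test data. The file names and definitions below are hypothetical:

```scheme
;; Hypothetical setup file for a test case repository.
;; It loads the sources under test and establishes a context
;; shared by all test cases in the repository.
(load "/lib/general.scm")
(define sample-char-list (list #\a #\b))
```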

It is up to the programmer to decide the relationship between the program source files and test case repositories. It is our experience that it is natural to have one test case repository per library source file, and one per application source file. It is possible to register one or more source files in association with a test case repository. This registration makes it possible to locate a given function in a source file, e.g. in case it is necessary to correct an error.

When the programmer asks for a new test case repository, he or she is prompted for a hosting directory hd and a name n of the repository. This will, behind the scenes, create a directory called n-test in the directory hd. In the vicinity of the interactive unit testing software we keep a list (in a file) that holds information about all registered test case repositories. A new test case repository directory contains a file with all the test cases together with the setup and teardown files. In addition, a number of temporary files for various purposes are kept in the directory. The programmer does not need to be aware of the test-related directories.

3.2 Unit testing functionality

The test-enabled read-eval-print loop supports a rich collection of functionality. The functionality is organized in the ‘Unit Testing’ menu of the read-eval-print pane, see Figure 1. The menu appears automatically when the supported read-eval-print loop runs together with our interactive unit testing tool. The ‘Unit Testing’ menu itself is shown in Figure 2. We will now describe the most important unit testing functionality of the read-eval-print loop. The description will be structured relative to the grouping of the menu entries in Figure 2.

The first group of functionality pertains to ‘connecting to’ and ‘disconnecting from’ a test suite, as described in Section 3.1. A test suite is contained in a test case repository. In the next group there are commands for registration and deregistration of test suites. In addition to registering and deregistering a single test suite, it is possible to deregister all test suites. The tool is also able to search through a directory tree for all test case repository directories, and to register these. This is useful if test cases are brought in from external sources, such as another development machine.

As another category of functionality, it is possible to execute all test cases in the current test case repository from the read-eval-print loop. It is also possible to execute all test cases in all registered repositories. In these ways, execution of all test cases can easily take place on a regular basis.

The current test case repository can be opened for examination and editing. It is, in addition, possible to inspect and edit the setup file, the teardown file, and a selected source file under test. It is also possible to inspect the Scheme file that holds all the low-level unit tests (aggregated automatically by the tool), as well as a test report in which test errors and test failures are reported.

The next two groups of commands are the test acceptance commands. These commands correspond to the OK, ERROR, and VALUE: commands described in Section 2. The two source file info commands in the following group make it possible to connect a test suite to a set of Scheme source files. With this connection


Figure 2: The unit testing menu entries of the read-eval-print loop illustrated in Figure 1.

it is possible to provide for smooth navigation from a test case to the function under test.

Our tool preserves all entered input to the read-eval-print loop, independent of the built-in history list, and independent of the test cases in the repositories. The two comint history commands are related to this functionality. With this facility, it is relatively easy to reproduce the interactions “from yesterday” or from “last week”, in case these interactions contain contributions that we “today” want to organize in test case repositories. The command ‘Document current test suite’ pretty prints an aggregate consisting of the setup file, all test cases, and the teardown file to a web page. The command ‘Default windows setup’ normalizes the panes to show the read-eval-print loop and the Scheme source file that contains the functions under test.

The commands in the last group are related to information and help.

3.3 Maintenance of unit tests

During maintenance of the software under test, regression testing may sooner or later reveal failures or errors. A failure occurs if a function under test becomes inconsistent with a test case. An error occurs if, for instance, a tested function is deleted or renamed. Test case failures and errors are revealed in reports of the following form:

...
Error:
------
17395-10319
an error of type exn:variable occurred with message:
"reference to undefined identifier: blank-string?"

Error:
------
17395-10394
an error of type exn:variable occurred with message:
"reference to undefined identifier: blank-string?"

Failure:
--------
19240-61944
/lib/general-test/testsuite.scm:1140:4
name: "assert"
location: ("/lib/general-test/testsuite.scm" 1140 4 59367 107 #f)
expression: (assert equal?
              (string-of-char-list? "1 2 3" (list #\1 #\2 #\3))
              (quote #t))
params: (#<primitive:equal?> #f #t)
...

The reported problems may be caused by a renaming of the function blank-string? and by a change of string-of-char-list?. The numerical tokens following the dashes are the unique test case ids. Forward and backward navigation between test case ids is provided. It is also possible to access the test case from this id, including the tested expression and the expected value. In addition, it is possible to delete the test case via a test case id. As a special twist, a test case can be automatically transferred to the read-eval-print loop just before deleting it. In that way erroneous test cases can be corrected and re-accepted.


Figure 3: The read-eval-print loop and other test-related panes, together with the interactive relations (commands) that tie them together. [Diagram: boxes for the read-eval-print loop, the test report, the test suite, the source file, and the API documentation, connected by the commands ‘Run test suite’, ‘Execute test case via test case id’, ‘Access to test case via test case id’, ‘Evaluate current test case’, ‘Inspect test suite’, ‘Find selected function’, ‘Find function under test’, and ‘Make API documentation with examples’.]

In Figure 3 we show a diagram that illustrates the read-eval-print loop together with the test report and other important test-related editor panes. The connections between boxes in the figure illustrate test-related interactions.

3.4 Imperative test cases

Each test case repository also contains test cases for procedures (or functions with side-effects). Test cases that involve imperative programming are handled at the level of the underlying unit-testing tool. It is possible to add a low-level test case to the test case repository, and it is possible to edit the collection of such low-level test cases. In addition, it is possible, in a flexible way, to execute a low-level test case in order to eliminate potential errors in the test case at an early point in time. Due to the simple nature of test cases that involve pure functions it is unlikely that there will be errors in such test cases. Test cases that involve procedures and imperative programming are more complex, and therefore it is valuable to try them out in isolation, as early as possible.


Figure 4: Use of the string-of-char-list? test cases as examples in the interface documentation of the function.

4 Test and Documentation

The primary benefit of an interactive unit testing tool in a read-eval-print loop, as described in Sections 2 and 3, is easy and natural collection and preservation of test cases. In addition, an interactive unit testing tool allows for convenient management and execution of the collected test cases.

A clean representation of test cases in repositories allows for additional applications of the test cases. One such application is the use of selected test cases as examples in documentation of libraries. The documentation we have in mind is interface documentation, as produced by tools such as Javadoc [Friendly 95], Doxygen [van Heesch 04] and SchemeDoc [Nørmark 04].

As the name suggests, interface documentation describes the application programmer's interface (API) of program libraries. Interface documentation is typically extracted from function headers/signatures and from the text in designated documentation comments. Examples of usages of the documented abstractions (functions, for instance) are very useful for library client programmers, because such examples are close to the actual needs of the programmer. Unfortunately, it is time consuming and error prone to write examples that cover the use of a large collection of functions in a program library.

Test cases in test case repositories can be used directly as examples within interface documentation. Given some function f in the library, which we are about to document, there should be means to select a number of test cases which can serve as examples of using f. It is a natural and necessary requirement that f should be represented in the expression of such test cases. The test case with an expression such as

(string-of-char-list? "123" '(#\1 #\2 #\3))

is probably a worthwhile example of the function string-of-char-list? in its documentation, whereas the test case with the expression

(string-of-char-list? "123" (map as-char '("1" "2" "3")))

is less likely to add to the documentation of the functions map and as-char. For each test case tc it is possible (but not necessary) to enumerate the functions for which tc should act as an example. Without the support of such a selective mechanism it is our experience that some test cases show up at unexpected places in the interface documentation.

The quality of examples derived from test case repositories can be expected to be higher than the quality of examples produced by most other means. The examples extracted from test case repositories are executed on a regular basis, as part of regression testing. Therefore we can be confident that the expressions are lexically and syntactically correct, and that the results displayed are correct relative to the current implementation of the underlying software.

In order to make use of test cases as examples in interface documentation, it is necessary to provide an interface between the test case repository and the interface documentation tool. The interface documentation tool must be able to read selected test case repositories, it must understand the actual representation of test cases, and it must be able to select and extract test cases which are relevant for a given entry in the documentation.

5 The Implemented Tool

The implemented unit testing tool supports interactive unit testing of Scheme libraries and Scheme programs. More specifically, the tool works with MzScheme [Flatt 00] (R5RS) which is part of PLT Scheme and DrScheme [Findler 02].

308 Noermark K.: Systematic Unit Testing in a Read-eval-print Loop


The interactive unit testing tool interfaces to SchemeUnit [Welsh 02], which is a conventional unit testing framework for PLT Scheme. This means that the tool generates test expressions in the format prescribed by SchemeUnit, such that the actual test executions can be done by SchemeUnit.

The read-eval-print loop of our tool is Olin Shivers' and Simon Marshall's command interpreter—comint—which is part of GNU Emacs. Our interactive unit testing tool is implemented in Emacs Lisp, as an extension to comint. As shown in Figure 1 and Figure 2 the tool manifests itself via an extra 'Unit Testing' menu of the Emacs command interpreter, and via a number of interactive Emacs commands and associated key bindings.

The comint command interpreter has been augmented with a number of commands, as discussed in Section 3.2. The test case reports, as generated by SchemeUnit, have been associated with an Emacs mode. Via this mode, a number of commands have been implemented on test case ids, see Figure 3. Similarly, the presentations of test suites have been associated with another Emacs mode that defines a number of useful commands on individual test cases. These commands, taken together, bind the facilities of the interactive testing tool together, and they catalyze a smooth test-related workflow from within Emacs.

The test cases produced by our tool can be referenced and applied by LAML SchemeDoc [Nørmark 04], which is an interface documentation tool we have created for the Scheme programming language. The screen shot in Figure 4 shows an excerpt of some SchemeDoc interface documentation produced on the basis of test cases similar to those discussed in Section 2. In the LAML software package [Nørmark 09] it is possible to consult examples of real-life API documentation, such as the "General LAML library", which contains many examples. By far the majority of these examples stem from interactive unit testing.

6 Related Work

An early paper about unit testing [Beck 94] is written in the context of Smalltalk. In the introduction to this paper Kent Beck states that he dislikes user interface-based testing. (In this context, user interface-based testing is understood as testing through the textual or graphical user interface of a program.) This observation motivated a testing approach based on a library in the programming language itself. This is the testing approach now known as unit testing, and supported by xUnit tools for various languages x. The work described in this paper makes use of one such unit testing tool, SchemeUnit [Welsh 02], for the programming language Scheme.

JUnit [JUnit 09] and NUnit [NUnit 09] represent well-known and widely used unit testing tools for the Java and .Net platforms respectively. In the context of this paper, JUnit and NUnit are classified as non-interactive unit testing tools.



The individual test cases supported by JUnit and NUnit are written as parameterless test methods in separate classes. The primary contribution of JUnit and NUnit is to activate these test methods. This is done by single entry point execution, as described in Section 1, via an aggregated Main method generated implicitly by JUnit and NUnit.

RT is an early regression testing tool for Common Lisp, programmed by Richard C. Waters, and described in a Lisp Pointers paper [Waters 91]. Along the same lines as RT, Gary King describes a Common Lisp framework for unit testing called LIFT [King 01]. LIFT is based on a CLOS representation of test cases and test suites. A number of other unit testing tools exist for Common Lisp. Among these, CLUnit [Adrian 01] was designed to remedy some identified weaknesses in RT. Stefil [Stefil 09] is a Common Lisp testing framework which can be used interactively from a read-eval-print loop. Stefil provides a number of macros, such as deftest and defsuite, which act as instrumented function definitions. Test cases defined in Stefil must be embedded in the test abstractions defined by these macros. In our work no such test abstractions are necessary. Instead, special commands supported by the read-eval-print loop are responsible for the management of the test cases.

The Python library called doctest [Doctest 09] makes it possible to represent test cases in the documentation string of definitions. The test cases can be copied and pasted directly from an interactive console session, corresponding to a read-eval-print loop. The Python doctest facility is able to identify the test cases within the documentation string, to execute the test cases, and to compare the actual result with the original result. In the documentation of the Python doctest facility, the approach is incisively described as "literate testing" or "executable documentation". The Python doctest facility stores the test cases inside the documentation strings, and therefore the documentation benefit of the approach is trivially obtained. Following the Python approach, pieces of interactions must be copied from the interactive console to the documentation string. No such copying is necessary in our interactive unit testing tool. Our representation of the test cases is more versatile than the proposed Python approach. In our approach it is straightforward to use a single test case as an example in the documentation of several function definitions.
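For concreteness, the string-of-char-list? test case from Section 4 might look as follows when transcribed into a Python doctest. The function body is a hypothetical Python analogue written for this sketch; it is not part of the paper's Scheme library:

```python
def string_of_char_list(s, chars):
    """Decide whether the string s consists exactly of the characters in chars.

    The examples below follow the doctest convention: they are pasted from an
    interactive console session, and doctest re-runs them and compares the
    actual results with the recorded ones.

    >>> string_of_char_list("123", ["1", "2", "3"])
    True
    >>> string_of_char_list("12", ["1", "2", "3"])
    False
    """
    return list(s) == list(chars)

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # identifies, executes, and checks the embedded examples
```

In the tool described in this paper the same pairing of expression and recorded result lives in an external test repository rather than in a documentation string, which is what makes one test case reusable as an example in the documentation of several functions.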

Some work similar to the Python doctest library has also been brought up in the context of Common Lisp. Juan M. Bello Rivas [Rivas 09] describes a doctest facility in which test cases, harvested from a read-eval-print loop, are represented in the documentation string of a function. A function called doctest is able to carry out regression testing based on this information.

An important theme in the current paper is convenient capturing of test cases. We have proposed that test cases are captured when new or modified functions are tried out in a read-eval-print loop. In the paper Contract Driven Development



= Test Driven Development - Writing Test-Cases [Leitner 07] it is proposed to capture test cases when a contract is violated. The work in the paper is carried out in the context of Eiffel and Design by Contract. In the same way as in our work, it is assumed that a given abstraction (a method) is explicitly tried out. In a single entry point program this may require an explicitly programmed test driver. The contribution of the paper—and the tool behind it—is to collect test cases (including the necessary context) which lead to violation of the contracts in programs. The collected test cases can be executed again, even without the programmed test driver. It is hypothesized that the semi-automatically collected test cases are useful for future regression testing.

In a couple of papers Hoffman and Strooper discuss the use of test cases for documentation and specification purposes [Hoffman 00, Hoffman 03]. Test cases for Java are written in a particular language, similar to (but different from) JUnit. The approach of the papers is to use such test cases (examples) together with informal "prose" as API documentation. In the papers the combination of informal prose descriptions and concisely formulated test cases is seen as an alternative to formal specifications (such as contracts in terms of preconditions and postconditions). In contrast to the current paper, the papers by Hoffman and Strooper do not propose how to integrate the informal prose descriptions with presentations of the test cases.

In a paper about Example Centric Programming, Edwards discusses some radical ideas that unify program editing, debugging, exploring, and testing [Edwards 04]. The ideas are accompanied by a prototype tool which can be seen as a forerunner of the more recent Subtext system [Edwards 05]. In the paper it is discussed how to capture unit tests more easily than in traditional unit testing tools, such as NUnit and JUnit. The paper states directly that "unit tests can serve as examples". In the current paper this is taken to mean examples in API documentation, whereas in Edwards' work, examples are more akin to stack traces known from conventional debuggers. The idea in Edwards' work is to freeze a given method activation and its result—as they occur in an example pane—as an assertion. This is similar to our idea of using and registering an expression and its result—captured during experiments in a read-eval-print loop—as a test case.

7 Summary and Conclusions

The observation and starting point of this paper is that most Lisp programmers—and programmers who use similar languages—experiment with new or modified pieces of programs. The languages in the Lisp family allow multiple entry point execution, and they make use of read-eval-print loops for evaluation of expressions and commands. The idea, which we have elaborated in the paper, is to consolidate these experiments to systematic test cases, and to allow



repeated execution of the tests when needed. We have described a facility that organizes test cases in a repository from which conventional unit tests can be derived automatically. The feeding of test cases into the repository—in terms of experimental evaluation of expressions—is done in the read-eval-print loop. The process of collecting test cases does not impose much extra work on the programmer. A little extra work is, however, needed to consolidate the test cases in the direction of systematic testing, to organize the test cases, and to prepare for execution of the test suites (the common setup and the teardown files). We hypothesize that programmers are willing to do this amount of extra work in return for the described benefits.

Conventional test suites, as prepared for unit testing tools such as JUnit, NUnit, CLUnit, and SchemeUnit, do not allow for easy extraction of examples that can be used by interface documentation tools. The reason is that test cases are intermingled with the unit testing vocabulary, such as test abstractions and various assertions. Following the approach described in the current paper, the expression and the expected result of a functional test case are represented in a clean manner, which makes it straightforward to aggregate examples.

The interactive unit testing tool has been used for collecting and organizing test cases during the development of the LAML software package [Nørmark 05]. The use of test cases as examples in the documentation of the LAML libraries turns out to be a valuable contribution to the everyday comprehension of the functions in the library.

As can be seen, the idea and the solution described in this paper are fairly simple. Nevertheless, the implications and benefits from exploiting the ideas may be substantial. We conclude that

1. Lisp programmers are already doing substantial work in the read-eval-print loop, which can almost automatically be turned into test cases.

2. With a little extra organization and support, it is possible to preserve and repeat the execution of the test cases.

3. Providers of interactive development environments (IDEs) that support read-eval-print loops should consider implementing a solution similar to the approach described in this paper.

The Emacs Lisp program that implements the interactive unit testing tool is bundled with LAML, which is available as free software from the LAML home page [Nørmark 09].

Acknowledgements

I want to thank the anonymous reviewers for valuable feedback on an earlier version of this paper.



References

[Adrian 01] Frank A. Adrian. CLUnit - A Unit Test Tool for Common Lisp, 2001. http://www.ancar.org/CLUnit/docs/CLUnit.html.

[Beck 94] Kent Beck. Simple Smalltalk Testing: With Patterns. The Smalltalk Report, vol. 4, no. 2, 1994.

[Beck 98] Kent Beck & Erich Gamma. JUnit Test Infected: Programmers Love Writing Tests. Java Report, vol. 3, no. 7, pages 51–56, 1998.

[Beck 04] Kent Beck. Extreme programming explained: Embrace change (2nd edition). Addison-Wesley Professional, 2004.

[Doctest 09] Python doctest: Test interactive Python examples, 2009. http://docs.python.org/library/doctest.html.

[Edwards 04] Jonathan Edwards. Example centric programming. In OOPSLA '04: Companion to the 19th annual ACM SIGPLAN conference on Object-oriented programming systems, languages, and applications, pages 124–124, New York, NY, USA, 2004. ACM.

[Edwards 05] Jonathan Edwards. Subtext: uncovering the simplicity of programming. In OOPSLA '05: Proceedings of the 20th annual ACM SIGPLAN conference on Object oriented programming, systems, languages, and applications, pages 505–518, New York, NY, USA, 2005. ACM.

[Findler 02] Robert Bruce Findler, John Clements, Cormac Flanagan, Matthew Flatt, Shriram Krishnamurthi, Paul Steckler & Matthias Felleisen. DrScheme: A Programming Environment for Scheme. Journal of Functional Programming, vol. 12, no. 2, pages 159–182, March 2002.

[Flatt 00] Matthew Flatt. PLT MzScheme: Language Manual. http://www.cs.rice.edu/CS/PLT/packages/pdf/mzscheme.pdf, August 2000.

[Friendly 95] Lisa Friendly. The Design of Distributed Hyperlinked Programming Documentation. In Sylvain Fraïssé, Franca Garzotto, Tomás Isakowitz, Jocelyne Nanard & Marc Nanard, editors, Proceedings of the International Workshop on Hypermedia Design (IWHD'95), Montpellier, France, 1995.

[Graham 93] Paul Graham. On Lisp. Prentice Hall, 1993.

[Hoffman 00] Daniel Hoffman & Paul Strooper. Prose + Test Cases = Specifications. In TOOLS '00: Proceedings of the Technology of Object-Oriented Languages and Systems (TOOLS 34'00), page 239, Washington, DC, USA, 2000. IEEE Computer Society.

[Hoffman 03] Daniel Hoffman & Paul Strooper. API documentation with executable examples. The Journal of Systems and Software, vol. 66, no. 2, pages 143–156, 2003.

[JUnit 09] JUnit. http://junit.org/, 2009.

[Kelsey 98] Richard Kelsey, William Clinger & Jonathan Rees. Revised5 Report on the Algorithmic Language Scheme. Higher-Order and Symbolic Computation, vol. 11, no. 1, pages 7–105, August 1998.

[King 01] Gary King. LIFT - the Lisp framework for testing. Technical report, University of Massachusetts, 2001.

[Leitner 07] Andreas Leitner, Ilinca Ciupa, Manuel Oriol, Bertrand Meyer & Arnaud Fiva. Contract Driven Development = Test Driven Development - Writing Test-Cases. In Proceedings of the 6th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2007), September 2007.

[Nørmark 04] Kurt Nørmark. Scheme Program Documentation Tools. In Olin Shivers & Oscar Waddell, editors, Proceedings of the Fifth Workshop on Scheme and Functional Programming, pages 1–11. Department of Computer Science, Indiana University, September 2004. Technical Report 600.

[Nørmark 05] Kurt Nørmark. Web Programming in Scheme with LAML. Journal of Functional Programming, vol. 15, no. 1, pages 53–65, January 2005.

[Nørmark 09] Kurt Nørmark. The LAML home page, 2009. http://www.cs.aau.dk/~normark/laml/.

[NUnit 09] NUnit. http://www.nunit.org, 2009.

[Rivas 09] Juan M. Bello Rivas. (incf cl) utilities, November 2009. http://superadditive.com/projects/incf-cl/.

[Stefil 09] Stefil - Simple test framework in Lisp, 2009. http://common-lisp.net/project/stefil/.

[van Heesch 04] Dimitri van Heesch. Doxygen, 2004. http://www.doxygen.org.

[Waters 91] Richard C. Waters. Supporting the regression testing of Lisp programs. SIGPLAN Lisp Pointers, vol. IV, no. 2, pages 47–53, 1991.

[Welsh 02] Noel Welsh, Francisco Solsona & Ian Glover. SchemeUnit and SchemeQL: Two Little Languages. In Third Workshop on Scheme and Functional Programming, October 2002.
