
Page 1: Grammar-based Testing Revisited

Grammar-based Testing Revisited

Ralf Lämmel, Software Languages Team

University of Koblenz-Landau, Campus Koblenz

Acknowledgement: Most of the presented results are based on (not so recent) collaboration with Wolfram Schulte (MSR, Redmond, WA, USA) and Jörg Harm (akquinet AG, Hamburg, Germany)

Page 2: Grammar-based Testing Revisited

Different Kinds of Systems Need Different Test Technology

• Protocols/Control Systems → FSM/Transition Systems/…
• Distributed Systems → Interface Automata/…
• OO Software → Object Models/…
• XML/Grammarware → Grammars/AGs/…

Page 3: Grammar-based Testing Revisited

Grammarware

3

R. Lämmel, C. Verhoef

effective grammar recovery methodology for those languages is of significant importance to the entire IT industry. Obtaining them is enabled by grammar (re)engineering, the subject of this paper.

Organization. The rest of this paper is organized as follows. We start with the grammar life-cycle that comprises the artifacts containing grammars and their connections, followed by the case study. This study shows how to recover a complete and correct specification of IBM's VS COBOL II grammar. Based on our experience with this and other case studies, we suggest a structured process to obtain a grammar specification and to derive a realistic parser. We briefly discuss possible platforms for implementing grammar engineering tools, to emphasize that grammar engineering tools can be implemented with virtually all compiler compilers or language design and processing environments.

THE GRAMMAR LIFE-CYCLE

As illustrated in the introductory section, dealing with software implies dealing with grammars. The current state of the art in software engineering is that grammars for different purposes are not related to each other. In an ideal situation, all the grammars can be inferred from some base-line grammar. We are not in this ideal situation. With grammar recovery we can enable the grammar life-cycle and deliver the missing grammars in a cost-effective manner so that urgent code modification tasks can rapidly be implemented with tools based on the recovered grammars.

Before discussing the grammar life-cycle we should make clear what its components are. The following artifacts have proved to be useful during the life-cycle of software [56]:

compilers, debuggers, animators, profilers, pretty printers, language reference manuals, language browsers, software analysis tools, code preprocessing tools, software modification tools, test-set generation tools, software testing tools, etc.

The Grammar Gamut

There are many grammars that we use day in and day out, often without realizing it. We mention the most prominent grammars and the tools they reside in, and we discuss similarities and differences.

Copyright © 2001 John Wiley & Sons, Ltd. Softw. Pract. Exper. 2001; 12:1–6

Obviously, all this stuff needs to be tested, right!?

Page 4: Grammar-based Testing Revisited

The speaker's access path to grammar-based testing: grammar recovery

4

Stealing from reference manuals

Some of our colleagues felt a little fooled by the PLEX result: "You are not really constructing a parser; you only converted an existing one. We can do that, too. Now try it without the compiler." Indeed, at first sight, not having this valuable knowledge source available seemed to make the work more difficult. After all, an earlier effort to recover the PLEX grammar from various online manuals had failed: they were not good enough for reconstructing the language.

Later, we discovered that the manuals lacked over half of the language definition, so that the recovery process had to be incomplete by definition. We also found that our failure was due not to our tools but to the nature of proprietary manuals: if the language's audience is limited, major omissions can go unnoticed for a long time. When there is a large customer base, the language vendor has to deliver better quality.

In another two-week effort, we recovered the VS Cobol II grammar from IBM's manual VS COBOL II Reference Summary, version 1.2. (For the fully recovered VS Cobol II grammar, see www.cs.vu.nl/grammars/vs-cobol-ii.) Again, the process was straightforward:

1. Retrieve the online VS Cobol II manual from www.ibm.com.
2. Extract its syntax diagrams.
3. Write a parser for the syntax diagrams.
4. Extract the BNF from the diagrams.
5. Add 17 lexical rules by hand.
6. Correct the BNF using grammar transformations.
7. Generate an error-detection parser.
8. Incrementally parse 2 million lines of VS Cobol II code.
9. Reiterate steps 6 through 8 until all errors vanish.
10. Convert the BNF to SDF.
11. Generate a production parser.
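Step 6 relies on grammar transformations. As a rough illustration of what such transformations look like, here is a minimal Python sketch over a toy grammar representation (a dict from nonterminals to lists of alternatives); the `rename_nonterminal` and `unfold` helpers and the sample rules are our own simplifications, not the actual operator suite used in the recovery.

```python
# Toy grammar: nonterminal -> list of alternatives (each a list of symbols).
# Illustrative only; the real transformation suite is considerably richer.

def rename_nonterminal(grammar, old, new):
    """Rename a nonterminal consistently on both sides of all rules."""
    out = {}
    for lhs, alts in grammar.items():
        out[new if lhs == old else lhs] = [
            [new if sym == old else sym for sym in alt] for alt in alts
        ]
    return out

def unfold(grammar, nt):
    """Inline the single alternative of `nt` at every use site."""
    (body,) = grammar[nt]  # only safe when nt has exactly one alternative
    out = {}
    for lhs, alts in grammar.items():
        new_alts = []
        for alt in alts:
            expanded = []
            for sym in alt:
                expanded.extend(body if sym == nt else [sym])
            new_alts.append(expanded)
        out[lhs] = new_alts
    return out

# Hypothetical mini-grammar, loosely modeled on the SEARCH statement:
g = {
    "search-statement": [["SEARCH", "ident", "when-part"]],
    "when-part": [["WHEN", "condition"]],
}
g = rename_nonterminal(g, "ident", "identifier")
g = unfold(g, "when-part")
```

Chained like this, many small, meaning-preserving (or deliberately meaning-changing) steps gradually correct a raw extracted BNF.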

86 IEEE SOFTWARE November/December 2001

3.30 SEARCH Statement

[Railroad syntax diagram for Format 1 (Serial Search) of the SEARCH statement. Note (1): END-SEARCH with NEXT SENTENCE is an IBM extension.]

search-statement =

”SEARCH” identifier [”VARYING” (identifier | index-name)]

[[”AT”] ”END” statement-list]

{”WHEN” condition (statement-list | ”NEXT” ”SENTENCE”)}+

[”END-SEARCH”]

(a)

(b)

Figure 5. (a) The original syntax diagram for the Search statement; (b) the same diagram after conversion to BNF and correction.
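The corrected BNF in (b) can be read directly as a generator for test sentences. A minimal sketch, with stub terminals (ID-1, IX-1, COND, STMTS) standing in for real identifiers, index names, conditions, and statement lists; those stub names are our own:

```python
import random

def gen_search(rng):
    """Generate one SEARCH statement from the BNF of Figure 5(b)."""
    parts = ["SEARCH", "ID-1"]
    if rng.random() < 0.5:                    # ["VARYING" (identifier | index-name)]
        parts += ["VARYING", rng.choice(["ID-2", "IX-1"])]
    if rng.random() < 0.5:                    # [["AT"] "END" statement-list]
        if rng.random() < 0.5:
            parts.append("AT")
        parts += ["END", "STMTS"]
    for _ in range(rng.randint(1, 2)):        # {"WHEN" ...}+ : at least one branch
        parts += ["WHEN", "COND"]
        parts += rng.choice([["STMTS"], ["NEXT", "SENTENCE"]])
    if rng.random() < 0.5:                    # ["END-SEARCH"]
        parts.append("END-SEARCH")
    return " ".join(parts)

sample = gen_search(random.Random(0))
```

Each optional bracket of the BNF becomes a coin flip; the `{...}+` repetition is forced to occur at least once, mirroring the grammar.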

Page 5: Grammar-based Testing Revisited

5

How to test grammarware?

[Diagram: a Data Generator produces Input for Language Processor A and Language Processor B; an Inverter can feed output back into a processor; an Oracle/Comparator yields Pass/Fail. The setups correspond to Differential Testing, Robustness Testing, and Identity Testing.]

Page 6: Grammar-based Testing Revisited

Purdom's seminal work

• Input: CFG G
• Output: a small set of short words from L(G) such that each production of G is used at least once
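The idea behind Purdom's algorithm can be approximated in a few lines: compute a shortest terminal string for every nonterminal by fixpoint iteration, then emit one word per production. This naive sketch uses a toy expression grammar in which the start symbol Exp is its own context; Purdom's actual algorithm additionally derives shortest contexts from the start symbol and packs several productions into each word.

```python
GRAMMAR = {  # nonterminal -> list of alternatives (toy expression grammar)
    "Exp": [["Int"], ["-", "Exp"], ["Exp", "+", "Exp"]],
    "Int": [["1"]],
}

def shortest_strings(grammar):
    """Fixpoint: shortest terminal string derivable from each nonterminal."""
    short = {}
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                if all(s in short or s not in grammar for s in alt):
                    word = []
                    for s in alt:
                        word += short[s] if s in grammar else [s]
                    if nt not in short or len(word) < len(short[nt]):
                        short[nt] = word
                        changed = True
    return short

def rule_covering_words(grammar):
    """One word per production, expanding all other symbols minimally."""
    short = shortest_strings(grammar)
    words = []
    for nt, alts in grammar.items():
        for alt in alts:
            word = []
            for s in alt:
                word += short[s] if s in grammar else [s]
            words.append(" ".join(word))
    return words

words = rule_covering_words(GRAMMAR)
```

For the toy grammar this yields one short word per production, so every rule is exercised by some word of the suite.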

6

TWO-DIMENSIONAL APPROXIMATION COVERAGE Informatica 17 page xxx–yyy 3

sufficient. The other dimension of AGs, that is, the attributes with conditions and computations, also needs to be taken into account. Actually, rule coverage is not even sufficient in the syntactic dimension. We will also comment on negative test cases, especially in the semantic dimension.

2.1 Preliminaries

We assume basic knowledge of context-free grammar theory and attribute grammars as covered by surveys like [28, 1, 25, 20]. For convenience, some elementary terminology is provided in the sequel.

A context-free grammar G is a quadruple 〈N, T, s, P〉 as usual, i.e., N and T are the disjoint finite sets of nonterminals resp. terminals. s ∈ N is called the start symbol. P is a finite set of productions or (context-free) rules with P ⊂ N × (N ∪ T)∗. We resort to the common notation l → r for a production 〈l, r〉 ∈ P with l ∈ N and r ∈ (N ∪ T)∗. For simplicity, we assume reduced and terminated context-free grammars in the sequel.

An attribute grammar AG is a quadruple 〈G, A, CM, CN〉, where G is the underlying context-free grammar, A associates each x ∈ N ∪ T with finite sets of synthesized attributes As(x) and inherited attributes Ai(x), and CM and CN associate each production p of G with finite sets of computations CM(p) and conditions CN(p). We assume well-formed, non-cyclic attribute grammars in normal form. Given a production p = x0 → x1 · · · xm ∈ P, with x0 ∈ N, x1, . . . , xm ∈ N ∪ T, a computation c from CM(p) is of the form r0.a0 := fc(r1.a1, . . . , rk.ak) where 0 ≤ rj ≤ m, and xrj carries an attribute aj for j = 0, . . . , k; similarly for conditions.

2.2 Rule coverage

Let us motivate rule coverage by the following test scenario. We want to test an acceptor A which is supposed to accept some language L(G) generated by a context-free grammar G. Later we generalise this scenario from context-free grammars to AGs. We take some finite set TS ⊆ L(G) and check if A accepts each w ∈ TS. We want to gain confidence that the language accepted by A actually is L(G). Thus, we have to ensure that TS experiences to a certain degree all aspects of L(G). Actually, TS should cover G to some extent. The bare minimum of coverage is to require that every production of G is applied in the derivation of some w ∈ TS.

[prog]   Prog → Block .
[block]  Block → Decls begin Stms end
[nodecl] Decls → ε
[decls]  Decls → Decl Decls
[decl]   Decl → label id ;
[onestm] Stms → Stm
[stms]   Stms → Stm ; Stms
[skip]   Stm → ε
[goto]   Stm → goto id
[ldef]   Stm → id : Stm
[if]     Stm → if Exp then Stm
[localb] Stm → Block
[true]   Exp → true
...

Figure 1: Productions for a Pascal-like language

Figure 1 shows an excerpt of a context-free grammar for a Pascal-like programming language. In the derivation of the program

label a;
begin
  a : goto a; begin if true then skip end
end.

all productions of the context-free grammar of Figure 1 are used.

If A is assumed to implement an AG rather than just a context-free grammar, the above scenario needs to be refined. The conditions and partial computations of an AG usually enforce that the language generated by the AG is only a subset of L(G) where G is the underlying context-free grammar. Thus, we should preferably consider only semantically correct programs in the test set TS. For any decent AG, rule coverage should remain feasible. Moreover, if the aim is just to test the context-free parsing aspect of A w.r.t. the reference grammar G, we can even consider possibly semantically incorrect test programs. From a practical perspective, we only have to be able to separate syntactic and semantic errors while applying A to TS.

2.3 Beyond rule coverage

Rule coverage is by far too simple. More complex criteria than simple rule coverage are sensible to enforce certain kinds of combinations of …


aka rule coverage

Page 7: Grammar-based Testing Revisited

In need of other coverage criteria

• Typical objectives for test-data suites:

– larger suite of smaller examples

– very large suite

– very complex test samples

– take into account syntactic context

– take into account semantic aspects

7

Page 8: Grammar-based Testing Revisited

Work done by the speaker

• w/ Wolfram Schulte: Controllable Combinatorial Coverage in Grammar-Based Testing. TestCom 2006

• Grammar Testing. FASE 2001

• w/ Jörg Harm: Test case characterisation by regular path expressions. FATES 2001

• w/ Jörg Harm: Two-dimensional Approximation Coverage. WAGA 2000 and Informatica (Slovenia) 24(3). 2000.

8

Page 9: Grammar-based Testing Revisited

9

Geno Tool

Geno is a test-data generator that generates grammar terms up to a given depth

• Ideal case: full combinatorial coverage of the grammar
• However, test-data sets explode combinatorially most of the time.
• Hence: introduce control mechanisms for combinatorial coverage.

[Pipeline: Abstract grammar → Geno (C# library) → Terms (C# objects)]

Work done during 2004-2006 while at Microsoft (MSFT and MSR).

Page 10: Grammar-based Testing Revisited

10

Input: Abstract Grammars

Example for input grammar:

Exp = BinExp ( Exp , BOp , Exp )   // Binary expressions
    | UnaExp ( UOp , Exp )         // Unary expressions
    | LitExp ( Int ) ;             // Literals as expressions
BOp = "+" ;                        // A binary operator
UOp = "-" ;                        // A unary operator
Int = "1" ;                        // An integer literal

• An abstract grammar is made up of a sequence of sort defs.
• Sort definitions contain a non-empty sequence of constructor defs.
• Each constructor contains a sequence of argument sort defs.
• Also supports sequence sorts defined as Args = (Exp*).

Page 11: Grammar-based Testing Revisited

11

Output: Abstract Grammar Terms

Example for output term:

Una-25<4>:3("-", Una-7<3>:2("-", Lit-1<2>:1("1")))

• Has sequence number 25 (it is the 25th term that is generated)
• Has depth 4
• Contains 3 recursive unfoldings of expressions

[Diagram: the term drawn as a tree with its levels labeled Depth 4 down to Depth 1.]

Page 12: Grammar-based Testing Revisited

12

Combinatorial Explosion!

Grammar Ga: 1 integer literal (1), 1 unary operator ('-'), 1 binary operator ('+').
Grammar Gb: 3 integer literals (0, 1, 2), 2 unary operators ('+', '-'), 4 binary operators ('+', '-', '*', '/').

Depth | Ga                               | Gb
1     | 0                                | 0
2     | 1                                | 3
3     | 2                                | 42
4     | 10                               | 8,148
5     | 170                              | 268,509,192
6     | 33,490                           | (outside the long integer range)
7     | (outside the long integer range) | (outside the long integer range)
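The table can be reproduced by a small recurrence. Let C(d) be the number of terms of depth at most d; the terms of exactly depth d are the unary terms over an exact-depth-(d-1) child plus the binary terms whose deeper child has depth d-1, i.e. n_uops·E(d-1) + n_bops·(C(d-1)² − C(d-2)²). A sketch (literals count as depth 2 because LitExp wraps an Int leaf):

```python
def exact_counts(n_lits, n_uops, n_bops, max_depth):
    """Terms of exactly depth d, d = 1..max_depth, for a Lit/Una/Bin grammar."""
    exact, cum = [], 0          # cum tracks C(d-1) = terms of depth <= d-1
    for d in range(1, max_depth + 1):
        if d == 1:
            e = 0               # no expression fits in depth 1
        elif d == 2:
            e = n_lits          # just the literals
        else:
            c1, c2 = cum, cum - exact[-1]   # C(d-1), C(d-2)
            e = n_uops * exact[-1] + n_bops * (c1 * c1 - c2 * c2)
        exact.append(e)
        cum += e
    return exact

ga = exact_counts(n_lits=1, n_uops=1, n_bops=1, max_depth=6)
gb = exact_counts(n_lits=3, n_uops=2, n_bops=4, max_depth=5)
# ga == [0, 1, 2, 10, 170, 33490], matching the Ga column of the table
```

The closed-form check confirms why depth 5 is already hopeless for Gb: the binary square term dominates and the counts explode doubly exponentially.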

Page 13: Grammar-based Testing Revisited

13

Controlling Combinatorial Explosion

• Use of Small Scope Hypothesis [Jackson et al]
  – Errors found within small scope, e.g., small object conglomerations
  – Construct systematically (in a bottom-up manner)
• Use of Regularity Hypothesis [Gaudel et al]
  – Test model and implementation up to certain data depths
  – Assume "regular" implementation
  – Control recursion depths
• Use of Pairwise Hypothesis [ATG]
  – Errors found by pair-wise exploration of arguments
  – Avoid exponential explosion
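The pairwise hypothesis can be illustrated with a small greedy set-cover sketch: instead of the full cross product of argument values, keep only enough tuples that every pair of values across any two argument positions occurs at least once. This is our own toy implementation, not the ATG/Geno machinery:

```python
from itertools import combinations, product

def pairwise_cover(domains):
    """Greedy sketch: pick tuples (best gain first) until every pair of
    values across any two argument positions is covered."""
    def pairs(t):
        return {(i, t[i], j, t[j])
                for i, j in combinations(range(len(t)), 2)}
    needed = set()
    for combo in product(*domains):
        needed |= pairs(combo)          # all value pairs that must appear
    candidates = list(product(*domains))
    chosen = []
    while needed:
        best = max(candidates, key=lambda t: len(pairs(t) & needed))
        chosen.append(best)
        needed -= pairs(best)
    return chosen

suite = pairwise_cover([["+", "-"], ["0", "1", "2"], ["x", "y"]])
```

For these three toy argument domains the full cross product has 12 tuples; the greedy cover needs fewer while still exercising every value pair, and the gap widens rapidly with more arguments.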

Page 14: Grammar-based Testing Revisited

14

The Base Algorithm

1. Compute the sort graph, which defines when sorts and terms are productive.
2. Compute terms bottom-up using the sort graph:

public void Do() {
    for (int depth = 1; depth <= env.RootMaxDepth; depth++)
        foreach (sort in env.Sorts)
            if (IsSortProductive(sort, depth))
                foreach (cons in env.Constructors[sort]) {
                    int[] argDepths = IsConstructorProductive(sort, depth, cons);
                    if (argDepths != null)
                        AddTerms(sort, depth, cons, argDepths);
                }
}

3. Refine IsSortProductive, IsConstructorProductive, and AddTerms for the control mechanisms.
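With the control mechanisms stripped away, the bottom-up core can be sketched in Python over the expression grammar from the earlier slide; the productivity checks collapse into a table of terms per (sort, depth). This is a simplification for illustration, not a transcription of Geno's C# implementation:

```python
from itertools import product

GRAMMAR = {  # sort -> list of (constructor, argument sorts)
    "Exp": [("LitExp", ["Int"]), ("UnaExp", ["UOp", "Exp"]),
            ("BinExp", ["Exp", "BOp", "Exp"])],
    "Int": [("1", [])], "UOp": [("-", [])], "BOp": [("+", [])],
}

def depth(t):
    return 1 if isinstance(t, str) else 1 + max(depth(c) for c in t[1])

def generate(grammar, root, max_depth):
    """Bottom-up: by_depth[sort][d] holds the terms of sort with depth d."""
    by_depth = {s: {} for s in grammar}
    def upto(s, d):
        return [t for k in range(1, d + 1) for t in by_depth[s].get(k, [])]
    for d in range(1, max_depth + 1):
        for sort, conses in grammar.items():
            new = []
            for cons, args in conses:
                if not args:              # nullary constructor: a leaf
                    if d == 1:
                        new.append(cons)
                    continue
                # combine children of depth < d; keep exact-depth-d results
                for kids in product(*(upto(a, d - 1) for a in args)):
                    term = (cons, kids)
                    if depth(term) == d:
                        new.append(term)
            # recursion/balance/combination control would prune `new` here
            by_depth[sort][d] = new
    return upto(root, max_depth)

terms = generate(GRAMMAR, "Exp", 4)
```

Up to depth 4 this produces 1 + 2 + 10 = 13 expression terms, agreeing with the Ga column of the explosion table.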

Page 15: Grammar-based Testing Revisited

15

Control Mechanisms

• Depth Control (typically used for root)
  – Limit all constructor applications
• Recursion Control (depth per recursive sort)
  – Limit recursive constructor applications
• Balance Control
  – Limit depth variation for argument terms
• Combination Control
  – Limit combinatorial composition on argument terms
• Context Control
  – Enforce context conditions

Page 16: Grammar-based Testing Revisited

16

Testing the .NET Loader as of 2005/06

Page 17: Grammar-based Testing Revisited

17

Type Grammar

Grammar Control

Page 18: Grammar-based Testing Revisited

18

/* Sample grammar for sort "typeDef" */

typeDef [MaxDepth = 10]
    = Class (classDef)
    | Valuetype (valuetypeDef)
    | Interface (interfaceDef)
    | Enum (enumDef)
    | Delegate (delegateDef) ;

classDef = (classHead, classMembers);   /* similar for valuetypeDef, … */

classHead = (typeAttr, typeName, extendsClause, implClause);   /* similar for valuetypeDef, … */

classMembers = (nestedTypeDefs, fieldDefs, methodDefs);

nestedTypeDefs = [MinLength=0, MaxLength=3, Pairwise] (nestedTypeDef*) ;

nestedTypeDef [MaxRecDepth=2]
    = NestedClass (nestedClassDef)
    | NestedValuetype (nestedValuetypeDef)
    | NestedInterface (nestedInterfaceDef)
    | NestedEnum (nestedEnumDef)
    | NestedDelegate (nestedDelegateDef) ;

Page 19: Grammar-based Testing Revisited

19

Algorithm for Creating Loader Data

PASS 1 – Create Cache of Objects
Generate Field Objects and Method Objects (use placeholders for Type objects, since real Type objects do not exist at this point). Generate Type Objects and randomly pick Field and Method Objects from the cache to populate FIELDDEF and METHODDEF placeholders.

PASS 2 – Do "Fixups"
Iterate over Field and Method Objects and randomly pick Type Objects from the cache to populate CUSTOMTYPE placeholders. Iterate over Type Objects and randomly pick Type Objects from the cache to populate CUSTOMTYPE placeholders for implements/extends relations.

PASS 3 – Transitive Closure
Compute the transitive closure of all Type Objects that are needed by the current one. Also, if implementing an interface, add the interface methods.

PASS 4 – Emit
Emit the transitive-closure IL for each top-level Type Object. Include all dependencies in the assembly IL. Also emit drivers.
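The pass structure can be made concrete with a schematic sketch: pass 1 creates objects with placeholders, pass 2 patches the placeholders with randomly chosen references, pass 3 chases references to a transitive closure (pass 4, emitting IL, is omitted). All names here are illustrative, not the loader harness's actual object model:

```python
import random

rng = random.Random(1)

# PASS 1: fields with a placeholder type, then types referencing members.
fields = [{"name": f"f{i}", "type": None} for i in range(4)]
types = [{"name": f"T{i}",
          "fields": rng.sample(fields, 2),
          "extends": None} for i in range(3)]

# PASS 2: fix-ups -- replace placeholders by randomly chosen Type objects.
for f in fields:
    f["type"] = rng.choice(types)
for t in types:
    t["extends"] = rng.choice(types)

# PASS 3: transitive closure of the types a given type depends on.
def closure(t):
    seen, todo = set(), [t]
    while todo:
        cur = todo.pop()
        if cur["name"] in seen:
            continue
        seen.add(cur["name"])
        todo.append(cur["extends"])
        todo.extend(f["type"] for f in cur["fields"])
    return seen

deps = closure(types[0])
```

The two-pass construction is what makes cyclic references (a field whose type extends the type containing the field) expressible at all, which is exactly the kind of input a loader must survive.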

Page 20: Grammar-based Testing Revisited

20

Geno Experiences (as of 2005/06)

Application              | Oracle                  | Bugs found
Class/XML Serialization* | Serialize/Deserialize   | 60% of all errors were found by Geno
.NET Loader+             | Version 1.1/Version 2.0 | Serious issues found
.NET Generics+           | VB/C#                   | Yes, …
XPath+                   | Optimized/Non-optimized | Yes, …
WS Security Policy$      | Microsoft               | Yes, …

* Identity testing   + Differential testing   $ Robustness testing

Page 21: Grammar-based Testing Revisited

21

Ideas not (yet) leveraged by Geno

For (attribute) grammars

• weight-based control [e.g., Maurer, Slutz, Sirer et al]

• regular path specifications [e.g., Harm and Lämmel]

• attribute coverage [e.g., Harm and Lämmel]

For data types

• weight-based [e.g., for Haskell: Claessen et al]

• invariant field access based [e.g., for Java: Marinov et al]

Page 22: Grammar-based Testing Revisited

Weight-based control

main: 1: abc, 2: def, 3: ghi, 4: jkl;

Page 23: Grammar-based Testing Revisited

In need of weights


7. A Word of Warning

The example of the preceding section contains something that I have been careful to avoid in all other examples: a recursive grammar. There is a hidden danger in using recursive grammars to generate data. This danger is of a theoretical nature, not an implementation problem (dgl likes recursive grammars just fine). To illustrate, consider the recursive grammar of Figure 8.

exp: %{exp}+%{exp},
     %{exp}*%{exp},
     (%{exp}),
     %{variable};

variable: a;

Figure 8. A Recursive Grammar.

The four alternatives of the production "exp" are chosen with equal probability. When a nonterminal of the form %{exp} is replaced, the replacement has a 50% chance of having two copies of %{exp}, both of which must eventually be replaced. It has a 25% chance of containing one copy and a 25% chance of containing no copies of %{exp}. Now, think about the string that the data generator is currently expanding, and count only the occurrences of %{exp}. The string has a 50% chance of getting bigger, a 25% chance of staying the same size, and only a 25% chance of getting smaller. The data generator will not stop until the size is zero. In short, your test might be infinitely long. To get this grammar to work, you must weight the fourth alternative so that the probability of getting smaller is greater than the probability of getting larger. If you use this rule of thumb with all of your recursive grammars you should stay out of trouble.

For a complete theoretical analysis of this phenomenon, see reference [4].

8. Conclusion

I have found the enhanced context-free grammars described in this paper to be very effective tools for generating test data of many different kinds. I have used these grammars to debug many different programs with great success. Due to the large volume of tests that can be generated from a test grammar, it is usually necessary to devise some automatic method for predicting the outcome of a test and diagnosing the results once it has been run. For some tests, action routines can be used to compute the expected outcome, but a separate program is usually needed to compare the actual result with the predicted result and report discrepancies.

Size of strings, restricted to occurrences of exp:
* 50% probability of getting bigger.
* 25% probability of staying the same.
* 25% probability of getting smaller.
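The warning can be quantified: each expansion of exp yields on average 0.5·2 + 0.25·1 + 0.25·0 = 1.25 new occurrences of exp, so generation is a supercritical branching process and need not terminate. A sketch that computes this mean from alternative weights and generates with a safety budget; the particular weights and the budget guard are ours, not dgl features:

```python
import random

def expected_offspring(weights):
    """Mean number of new %{exp} occurrences per expansion of exp.
    Copies of exp per alternative: +:2, *:2, parenthesized:1, variable:0."""
    copies = [2, 2, 1, 0]
    return sum(w * c for w, c in zip(weights, copies)) / sum(weights)

def gen(rng, weights, budget=10_000):
    """Expand the Figure 8 grammar with weighted alternatives; give up
    (return None) if the sentential form exceeds the budget."""
    alts = [["exp", "+", "exp"], ["exp", "*", "exp"],
            ["(", "exp", ")"], ["a"]]
    out, stack = [], ["exp"]
    while stack:
        if len(out) + len(stack) > budget:
            return None  # runaway derivation
        sym = stack.pop()
        if sym == "exp":
            stack.extend(reversed(rng.choices(alts, weights=weights)[0]))
        else:
            out.append(sym)
    return "".join(out)

equal = expected_offspring([1, 1, 1, 1])   # 1.25 > 1: may diverge
damped = expected_offspring([1, 1, 1, 5])  # 0.625 < 1: terminates a.s.
sample = gen(random.Random(0), [1, 1, 1, 5])
```

Pushing the mean below 1 is exactly the "weight the fourth alternative" rule of thumb: the branching process becomes subcritical and derivations die out with probability 1.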

Page 24: Grammar-based Testing Revisited

Test-data characterization by regular path expressions

✦ Associate grammar (signature) with path grammar

✦ Specify test-data with path expressions

✦ Specify coverage criteria as sets of path expressions

✦ Use automata theory for analysis (and generation)
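The path-expression idea can be prototyped with ordinary regular-expression machinery: serialize every root-to-leaf constructor path of a term and match the resulting strings against a set of path expressions. The slash-separated encoding below is our own illustration, not the formalization of the FATES 2001 paper:

```python
import re

def paths(term):
    """All root-to-leaf constructor paths, e.g. 'apply/1/lambda/1/lvar'."""
    cons, kids = term
    if not kids:
        return [cons]
    return [f"{cons}/{i}/{p}"
            for i, kid in enumerate(kids, 1) for p in paths(kid)]

# A lambda-term apply(lambda(lvar), lvar), constructors as on the slides.
t = ("apply", [("lambda", [("lvar", [])]), ("lvar", [])])

# A test-data characterization as a set of regular path expressions.
criterion = [r"^apply/1/.*lvar$",   # some lvar below apply's first argument
             r"^apply/2/lvar$"]     # lvar directly as apply's second argument

covered = all(any(re.search(rx, p) for p in paths(t)) for rx in criterion)
```

A coverage criterion is then a set of such expressions, and a test set is adequate when every expression is matched by some path of some test term; generation runs the same automata construction in reverse.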

Page 25: Grammar-based Testing Revisited

Associate grammar (signature) with path grammar

Page 26: Grammar-based Testing Revisited

Terms vs. paths

26

Page 27: Grammar-based Testing Revisited

Link to regular language theory

27

[Figure: an automaton over constructor paths for a lambda-calculus signature; numbered states with transitions labeled lvar, lambda, apply, succ, zero, tvar, and arrow, annotated with argument positions {1, 2}.]

Page 28: Grammar-based Testing Revisited

Test-data characterization

28

Page 29: Grammar-based Testing Revisited

Acceptor for test data

29

[Figure: an acceptor automaton for the test-data characterization; numbered states with transitions labeled lambda, apply, lvar, succ, zero, tvar, and arrow, annotated with argument positions {1, 2}.]

Page 30: Grammar-based Testing Revisited

Example of a coverage criterion: use all (symbols, i.e., "rules") in all contexts

30

Coverage criteria are sets of path expressions that need to meet some additional sanity properties.

Page 31: Grammar-based Testing Revisited

Two-dimensional coverage

TWO-DIMENSIONAL APPROXIMATION COVERAGE Informatica 17 page xxx–yyy 5

Type definitions:
ID = {a, ..., z}+
ID LIST = [ ] + [ID | ID LIST]

Attributes:
Ai(Block) = Ai(Stms) = Ai(Stm) = {TL}
As(Stms) = As(Stm) = {DL, LTL}
As(id) = {Name}
As(Decl) = {LN}
As(Decls) = {L}

Attribute types:
Name : ID       name of the identifier
LN   : ID       name of the declared label
L    : ID LIST  list of declared labels
TL   : ID LIST  target labels reachable from inside the block, statement list, or statement
DL   : ID LIST  labels with a defining occurrence inside the statements of a block
LTL  : ID LIST  labels with a defining occurrence inside the statement list or statement which are reachable by goto statements on the same statement nesting level

Figure 2: Attributes for checking jumps

greater than one where at each position at least two different ID values occur. Two sensible test sets are the following:

– {[ ], [a|[ ]], [a|[b|[ ]]], [b|[a|[ ]]]}

– {[ ], [a|[ ]], [a|[a|[ ]]], [b|[b|[ ]]]}

Now, we can say a subset TS of the language defined by an attribute grammar AG covers an attribute of a production of AG if the values associated with this attribute in the derivation trees of the elements of TS cover the domain of the attribute.

While the coverage notion for context-free grammars forces the application of grammar rules in meaningful syntactic contexts, the domain coverage forces meaningful semantic contexts. Applied to the domain equation of Figure 2 and to the nonterminal Stm, for example, the domain coverage criterion enforces the use of the various alternatives of Stm in different contexts covering its attributes TL, DL, and LTL.
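Domain coverage can be checked mechanically: collect the values an attribute takes over the test set and compare them against the approximated domain. A sketch for the ID LIST approximation discussed above (lists up to length 2, with at least two distinct IDs at each position); the encoding as Python lists is ours:

```python
def covers_id_lists(test_set, max_len):
    """Approximate domain coverage for ID LIST attribute values:
    every length up to max_len occurs, and at each position at least
    two distinct IDs occur among the lists long enough to have it."""
    if {len(xs) for xs in test_set} != set(range(max_len + 1)):
        return False
    for pos in range(max_len):
        if len({xs[pos] for xs in test_set if len(xs) > pos}) < 2:
            return False
    return True

# The two "sensible test sets" from the text both satisfy the criterion:
ok1 = covers_id_lists([[], ["a"], ["a", "b"], ["b", "a"]], 2)
ok2 = covers_id_lists([[], ["a"], ["a", "a"], ["b", "b"]], 2)
# A set that never varies position 0 does not:
bad = covers_id_lists([[], ["a"], ["a", "a"]], 2)
```

Running such a check per attribute and per production turns the two-dimensional coverage notion into an ordinary adequacy measurement over a test suite.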

[prog] Prog → Block .
    1.TL := [ ]
[block] Block → Decls begin Stms end
    3.TL := (0.TL \ 1.L) ∪ 3.LTL
    3.DL = 1.L
[nodecl] Decls → ε
    0.L := [ ]
[decls] Decls → Decl Decls
    0.L := [1.LN | 2.L]
    1.LN ∉ 2.L
[decl] Decl → label id ;
    0.LN := 2.Name
[onestm] Stms → Stm
    1.TL := 0.TL
    0.DL := 1.DL
    0.LTL := 1.LTL
[stms] Stms → Stm ; Stms
    1.TL := 0.TL
    3.TL := 0.TL
    0.DL := 1.DL ∪ 3.DL
    0.LTL := 1.LTL ∪ 3.LTL
    1.DL ∩ 3.DL = [ ]
[skip] Stm → ε
    0.DL := [ ]
    0.LTL := [ ]
[goto] Stm → goto id
    0.DL := [ ]
    0.LTL := [ ]
    2.Name ∈ 0.TL
[ldef] Stm → id : Stm
    3.TL := 0.TL
    0.DL := [1.Name | 3.DL]
    0.LTL := [1.Name | 3.LTL]
    1.Name ∉ 3.DL
[if] Stm → if ... then Stm
    4.TL := 0.TL ∪ 4.LTL
    0.DL := 4.DL
    0.LTL := [ ]
[localb] Stm → Block
    1.TL := 0.TL
    0.DL := [ ]
    0.LTL := [ ]
[true] Exp → true
...

Figure 3: AG for checking jumps

For both the syntactic and the semantic dimension, full coverage often cannot be achieved for a given AG specification because of two related problems:

– Full coverage for the context-free grammar may not be achieved because the conditions on the attributes rule out some syntactic combinations.

– Full coverage may not be achieved for some …


Page 32: Grammar-based Testing Revisited

Two dimensional-coverageTWO-DIMENSIONAL APPROXIMATION COVERAGE Informatica 17 page xxx–yyy 5

Type definitions:ID = {a, ..., z}+

ID LIST = [ ] + [ID|ID LIST ]

Attributes:Ai(Block) = Ai(Stms) = Ai(Stm) = {TL}As(Stms) = As(Stm) = {DL, LTL}As(id) = {Name}As(Decl) = {LN}As(Decls) = {L}

Attribute types:Name : ID name of the identifierLN : ID name of the declared labelL : ID LIST list of declared labelsTL : ID LIST target labels reachable from

inside the block, statement list,or statement

DL : ID LIST labels with a defining occurrenceinside the statements of a block

LTL : ID LIST labels with defining occurrenceinside the statement list orstatement which are reachableby goto statements on the samestatement nesting level

Figure 2: Attributes for checking jumps

greater than one where at each position at leasttwo different ID values occur. Two sensible testsets are the following:

– {[ ], [a|[ ]], [a|[b|[ ]]], [b|[a|[ ]]]}

– {[ ], [a|[ ]], [a|[a|[ ]]], [b|[b|[ ]]]}

Now we can say that a subset TS of the language defined by an attribute grammar AG covers an attribute of a production of AG if the values associated with this attribute in the derivation trees of the elements of TS cover the domain of the attribute.

While the coverage notion for context-free grammars forces the application of grammar rules in meaningful syntactic contexts, domain coverage forces meaningful semantic contexts. Applied to the domain equation of Figure 2 and to the nonterminal Stm, for example, the domain coverage criterion enforces the use of the various alternatives of Stm in different contexts covering its attributes TL, DL, and LTL.
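A minimal sketch of this domain-coverage check for ID LIST values, assuming the depth-2 abstraction discussed above (the function name and parameters are hypothetical):

```python
# Sketch of the domain-coverage idea for ID_LIST attribute values.
# The depth-2 abstraction and the name covers_id_list_domain are illustrative.

def covers_id_list_domain(test_set, depth=2, min_values=2):
    """A set of ID_LIST values covers the (depth-2) abstract domain if it
    contains the empty list, a singleton list, and lists of length > 1
    in which each of the first `depth` positions sees at least
    `min_values` distinct identifiers."""
    lengths = {min(len(xs), depth) for xs in test_set}
    if lengths != set(range(depth + 1)):
        return False
    long_lists = [xs for xs in test_set if len(xs) > 1]
    return all(
        len({xs[i] for xs in long_lists if len(xs) > i}) >= min_values
        for i in range(depth)
    )

# The two sample test sets from the text, in plain list notation:
ts1 = [[], ["a"], ["a", "b"], ["b", "a"]]
ts2 = [[], ["a"], ["a", "a"], ["b", "b"]]
print(covers_id_list_domain(ts1), covers_id_list_domain(ts2))  # True True
```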

[prog] Prog → Block .
  1.TL := [ ]
[block] Block → Decls begin Stms end
  3.TL := (0.TL \ 1.L) ∪ 3.LTL
  3.DL = 1.L
[nodecl] Decls → ε
  0.L := [ ]
[decls] Decls → Decl Decls
  0.L := [1.LN | 2.L]
  1.LN ∉ 2.L
[decl] Decl → label id ;
  0.LN := 2.Name
[onestm] Stms → Stm
  1.TL := 0.TL
  0.DL := 1.DL
  0.LTL := 1.LTL
[stms] Stms → Stm ; Stms
  1.TL := 0.TL
  3.TL := 0.TL
  0.DL := 1.DL ∪ 3.DL
  0.LTL := 1.LTL ∪ 3.LTL
  1.DL ∩ 3.DL = [ ]
[skip] Stm → skip
  0.DL := [ ]
  0.LTL := [ ]
[goto] Stm → goto id
  0.DL := [ ]
  0.LTL := [ ]
  2.Name ∈ 0.TL
[ldef] Stm → id : Stm
  3.TL := 0.TL
  0.DL := [1.Name | 3.DL]
  0.LTL := [1.Name | 3.LTL]
  1.Name ∉ 3.DL
[if] Stm → if ... then Stm
  4.TL := 0.TL ∪ 4.LTL
  0.DL := 4.DL
  0.LTL := [ ]
[localb] Stm → Block
  1.TL := 0.TL
  0.DL := [ ]
  0.LTL := [ ]
[true] Exp → true
...

Figure 3: AG for checking jumps

For both the syntactic and the semantic dimension, full coverage often cannot be achieved for a given AG specification because of two related problems:

– Full coverage for the context-free grammar may not be achieved because the conditions on the attributes rule out some syntactic combinations.

– Full coverage may not be achieved for some attribute domains because syntactic dependencies restrict the values the attributes can take.


Page 33: Grammar-based Testing Revisited

A generated test-data set


each attribute to a certain extent. If p1, ..., pm are the productions of G defining the nonterminal n, then we can encode decorations of the various productions for n as a sum. This is modelled by the concrete domain n^D = p1^D + · · · + pm^D. The corresponding abstract semantic domain n^D̄ for n is defined as n^D̄ = p1^D̄ + · · · + pm^D̄. The two-level approach to the definition of n^D̄ enforces that semantic coverage of nonterminals separates the different occurrences of n in the various rules. A relaxed definition of n^D̄ could also be conceived. Finally, we define the combined abstract domain n^AG = n^Ḡ × n^D̄.

It remains to define the abstraction function for attribute grammars. Here the consideration of subtrees turns out to be essential. Given a decorated derivation tree dt ∈ DT_AG and a nonterminal n, the corresponding abstract value w.r.t. n is denoted by dt̄^n. It is a pair of values for abstract syntactic coverage and abstract semantic coverage, defined as follows:

dt̄^n = 〈syn, sem〉

syn = ⊔ { π_G(dt′) | dt′ ∈ MSUB_n(dt) }

sem = ⊔ { π_D(dt′) | dt′ ∈ ASUB_n(dt) }

where π_G(dt′) denotes the derivation subtree obtained from dt′ by removing its decoration, and π_D(dt′) ∈ n^D denotes the decoration of the top-level production of dt′. This definition of abstraction for decorated trees means that syntactic coverage for n is derived by taking the fundamental approximation coverage of all maximum subtrees rooted by n. To consider other than maximum subtrees rooted by n would be in conflict with the desired treatment of recursion. By contrast, semantic coverage is derived from all subtrees rooted by n because we want to observe the decoration of all nodes with nonterminal n and their successor nodes.
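The abstraction just defined can be sketched operationally. The tuple encoding of decorated trees and the use of finite sets as a stand-in for the least-upper-bound operator ⊔ are assumptions of this sketch:

```python
# Sketch of the abstraction for decorated derivation trees.
# Encoding assumption: a decorated tree is a tuple
#   (nonterminal, rule_label, decoration, children)
# and finite sets stand in for the least-upper-bound operator.

def subtrees(dt, n, maximal_only):
    """Yield subtrees rooted by nonterminal n.  With maximal_only=True this
    mimics MSUB_n (do not descend past a match); otherwise ASUB_n."""
    nt, rule, deco, kids = dt
    if nt == n:
        yield dt
        if maximal_only:
            return
    for kid in kids:
        if isinstance(kid, tuple):          # skip terminal leaves
            yield from subtrees(kid, n, maximal_only)

def strip(dt):
    """pi_G: remove decorations, keeping the pure derivation tree."""
    nt, rule, deco, kids = dt
    return (nt, rule, tuple(strip(k) if isinstance(k, tuple) else k
                            for k in kids))

def abstract(dt, n):
    """The pair <syn, sem> for a decorated tree dt and nonterminal n."""
    syn = {strip(t) for t in subtrees(dt, n, maximal_only=True)}
    sem = {t[2] for t in subtrees(dt, n, maximal_only=False)}
    return syn, sem

# A labelled statement containing a skip (hypothetical decorations d1, d2).
demo = ("Stm", "ldef", ("d1",), [("Stm", "skip", ("d2",), [])])
syn, sem = abstract(demo, "Stm")
print(len(syn), sorted(sem))  # 1 [('d1',), ('d2',)]
```

Note how the single maximal Stm subtree feeds syntactic coverage, while both nested Stm decorations feed semantic coverage, mirroring the MSUB/ASUB distinction in the text.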

There is a fundamental problem with two-dimensional coverage. In many cases, full coverage according to the above definition is not feasible. In the syntactic dimension, coverage sometimes has to be relaxed due to semantic constraints. Dually, through syntactic dependencies, full attribute coverage is sometimes not feasible, i.e., in all derivation trees some attributes always take values of a special form. Opportunities to relax the coverage notion are discussed later.

4.3 A test set sample

In Figure 5, a representative test set for the example AG in Figure 3 is shown. The test set achieves the greatest possible coverage according to the following criteria. In the structural dimension, η0(n) = 1 is assumed for nonterminals n. In the semantic dimension, η0(τ) = 2 is assumed for attribute types τ. Thereby, all attributes of type ID LIST are forced to appear in derivation trees in which they get the empty list, a singleton list, and a list with at least two elements as values, if possible. The programs were actually generated by the algorithm described in the next section.

begin skip end.
begin if true then skip end.
begin skip; skip end.
begin begin skip end end.
begin if true then skip; skip end.
label a; begin a : skip end.
begin begin skip end; skip end.
label a; begin a : goto a end.
begin begin skip; skip end end.
label a; begin a : skip; skip end.
label a; begin skip; a : skip end.
label a; begin if true then a : skip end.
label a; begin a : if true then skip end.
label a; begin goto a; a : skip end.
label a; begin a : skip; goto a end.
label a; begin a : begin skip end end.
begin label a; begin a : skip end end.
label b; label a; begin b : a : skip end.
label a; begin a : begin goto a end end.
label b; label a; begin a : b : skip end.
label b; label a; begin a : b : goto a end.
begin label a; begin a : skip end; skip end.
label b; label a; begin b : a : skip; skip end.
label b; label a; begin skip; b : a : skip end.
label b; label a; begin if true then b : a : skip end.
label b; label a; begin a : b : if true then skip end.
label b; label a; begin a : b : begin skip end end.

Figure 5: A test set for the AG from Section 2
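As a rough stand-in for the generation algorithm referred to above (the real algorithm is described in the paper's next section), one can enumerate small sentences of a context-free grammar and keep only those that contribute uncovered rules. The toy grammar fragment, the depth bound, and all names below are illustrative assumptions:

```python
# Naive sketch: enumerate small sentences of a toy context-free grammar and
# keep only those that add uncovered rules.  The grammar fragment, the depth
# bound, and all names are illustrative, not the paper's algorithm.

GRAMMAR = {
    "Stms": [("onestm", ["Stm"]), ("stms", ["Stm", ";", "Stms"])],
    "Stm": [("skip", ["skip"]), ("localb", ["begin", "Stms", "end"])],
}

def expand(symbol, depth):
    """Yield (sentence, rules_used) pairs derivable from symbol within depth."""
    if symbol not in GRAMMAR:               # terminal symbol
        yield [symbol], set()
        return
    if depth == 0:
        return
    for label, rhs in GRAMMAR[symbol]:
        partials = [([], {label})]
        for sym in rhs:                     # combine expansions left to right
            partials = [
                (toks + toks2, used | used2)
                for toks, used in partials
                for toks2, used2 in expand(sym, depth - 1)
            ]
        yield from partials

covered, tests = set(), []
for toks, used in sorted(expand("Stms", 4), key=lambda p: len(p[0])):
    if not used <= covered:                 # keep only tests adding new rules
        tests.append(" ".join(toks))
        covered |= used
print(tests)
```

Shortest sentences are preferred, so the selected tests stay small, much like the programs in Figure 5.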

For example, test programs are generated where non-local labels are reachable from a block, i.e., the inherited attribute TL of the nonterminal Block has to be nonempty. Note that full coverage is not feasible because there are attributes of type ID LIST which cannot be associated with the empty list, e.g., the attribute TL of the left-hand side of rule [ldef].

Syntactically: unfold once
Semantically: unfold twice

Page 34: Grammar-based Testing Revisited

Open challenges

✦ Static semantics

✦ Negative test cases

✦ Technology independence

✦ Graphs (rather than trees)

✦ Multiple backends

✦ Best practices

✦ A major tool

Best one out there? Probably Peter M. Maurer's DGL

Scalability

Page 35: Grammar-based Testing Revisited

Thanks! Questions or comments?