rubric: a flexible tool for automated checking of ...people.svv.lu/sabetzadeh/pub/esec_fse13.pdf ·...

RUBRIC: A Flexible Tool for Automated Checking ofConformance to Requirement Boilerplates

Chetan Arora, Mehrdad Sabetzadeh,Lionel Briand

SnT Centre for Security, Reliability and TrustUniversity of Luxembourg, Luxembourg

{firstname.lastname}@uni.lu

Frank Zimmer, Raul GnagaSES TechCom

9 rue Pierre WernerBetzdorf, Luxembourg

{firstname.lastname}@ses.com

ABSTRACTUsing requirement boilerplates is an effective way to mit-igate many types of ambiguity in Natural Language (NL)requirements and to enable more automated transformationand analysis of these requirements. When requirements areexpressed using boilerplates, one must check, as a first qual-ity assurance measure, whether the requirements actuallyconform to the boilerplates. If done manually, boilerplateconformance checking can be laborious, particularly whenrequirements change frequently. We present RUBRIC (Re-qUirements BoileRplate sanIty Checker), a flexible tool forautomatically checking NL requirements against boilerplatesfor conformance. RUBRIC further provides a range of di-agnostics to highlight potentially problematic syntactic con-structs in NL requirement statements. RUBRIC is based ona Natural Language Processing (NLP) technique, known astext chunking. A key advantage of RUBRIC is that it yieldshighly accurate results even in early stages of requirementswriting, where a requirements glossary may be unavailableor only partially specified. RUBRIC is scalable and canbe applied repeatedly to large sets of requirements as theyevolve. The tool has been validated through an industrialcase study which we outline briefly in the paper.

Categories and Subject DescriptorsD.2.1 [Software Engineering]: Requirements/Specifications

General TermsVerification

KeywordsRequirement Boilerplates; Natural Language Processing (NLP);Text Chunking.

1. INTRODUCTIONNatural Language (NL) is the most frequently used means

for requirements specification. When compared to formally-specified requirements, NL requirements tend to be easierto understand and manipulate since no special training is

Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.ESEC/FSE ’13, August 18-26, 2013, Saint Petersburg, RussiaCopyright 2013 ACM 978-1-4503-2237-9/13/08 ...$15.00.

[<When?><under what

conditions?>]

THE SYSTEM<system name>

SHALL

SHOULD

WILL

PROVIDE <whom?> WITH THE ABILITY TO

<process>

<process>

BE ABLE TO<process>

<object> [<additional details about the object>]

Figure 1: Rupp’s requirement boilerplate [6]

required. Despite their advantages, NL requirements areprone to ambiguity, meaning that even with careful specifi-cation, they may be understood in more than one way bydifferent stakeholders [6].

Requirement boilerplates, also known as templates, moldsor patterns [2, 6], provide an effective way to bring disciplineto NL requirements and to mitigate many types of ambigu-ity in natural language statements. In addition, boilerplatesmake the requirements more amenable to automated analy-sis, e.g., semantic consistency checking and transformationto models [6].

Figure 1 shows a well-known requirement boilerplate dueto Rupp [6]. The boilerplate envisages six syntactic slots:(1) an optional condition at the beginning; (2) the systemname; (3) a modal (shall/should/will) specifying how impor-tant the requirement is; (4) the required processing function-ality; this slot can admit three different forms based on themanner in which the functionality is to be rendered; (5) theobject for which the functionality is needed; and (6) optionaladditional details about the object. As an example, we showin Figure 2 a requirement statement R1, which conforms toRupp’s boilerplate. The segments corresponding to the dif-ferent slots in the boilerplate have been marked, with thefixed elements of the boilerplate written in capital letters.

The S&T module SHALL PROVIDE the system administrator WITH THE ABILITY TO monitor system configuration changes posted to the database.

R1:<system name> <whom?>

<process> <object> <additional details>

Figure 2: An example requirement that conforms toRupp’s boilerplate

To gain benefits from using boilerplates, it is paramountthat the analysts first ensure that the boilerplates have beencorrectly applied. If done manually, checking boilerplateconformance for large collections of requirements is a labori-ous task [6], particularly when the task has to be repeated asthe requirements evolve and many stakeholders are involved.

In this paper, we present RUBRIC (ReqUirements Boil-eRplate sanIty Checker). The tool’s main function is toautomatically verify correct use of boilerplates in NL re-quirements. RUBRIC does not require a complete require-ments glossary to be constructed first and provides highly

accurate results even without a glossary. RUBRIC in addi-tion provides a range of diagnostics highlighting potentiallyproblematic syntactic constructs in requirements, e.g., use ofpassive voice, conjunctions, pronouns and vague terms [2].In the remainder of this paper, we describe the motivationbehind RUBRIC, along with its main components and usagescenarios. We further outline key points from the evaluationwe have performed of the tool in an industrial case study.

2. MOTIVATION AND RELATED WORKThe motivation for RUBRIC comes from an observed need

when attempting to introduce boilerplates for documentingthe requirements of a large satellite system. A thorough in-spection of the requirements of the system to ascertain boil-erplate conformance would take several hours. This levelof effort combined with the frequent changes made by thenumerous stakeholders involved in the project posed a chal-lenge for maintaining boilerplate conformance.

Prior to developing RUBRIC, we conducted a review ofexisting requirements engineering tools, guided by a recentand comprehensive tools survey [4]. The goal of our re-view was to determine whether we could use already-existingsolutions for boilerplate conformance checking. We foundtwo tools that directly addressed this task: DODT [5] andRQA [7]. DODT provides means for bringing requirementsinto conformance with a boilerplate via requirements trans-formation; RQA provides features for defining boilerplatesand automatically checking conformance to them.

Neither tool was an ideal match for our needs: DODT re-quires a high-quality domain ontology for the system to bedeveloped first. In our project and arguably in most otherindustrial projects, developing such an ontology is too costlyto be practical. As for RQA, while an investigation of theunderlying technologies was not possible due to the tool’sproprietary nature, we observed that the tool’s ability tocorrectly identify different segments of requirements state-ments was adversely impacted when the glossary terms wereleft unspecified. This constraint too posed a problem, be-cause we wanted to provide feedback to analysts about boil-erplate usage at early stages, when such feedback is mostuseful and easiest to apply. Subsequently, we could not as-sume the completeness of the glossary. Furthermore, the lit-erature suggests that incompleteness in glossaries is a perva-sive issue in many projects [8]. Therefore, a high level of re-liance on the glossary for analyzing boilerplate conformancecan have negative consequences not only in our motivatingproject but in other projects as well. RUBRIC in contrast,and as stated earlier, does not require a requirements on-tology or glossary to be built first and yet manages to yieldhighly accurate results.

3. TOOL OVERVIEWFigure 3 shows an overview of the process implemented

by RUBRIC. The process consists of three main steps: (1)Text Chunking, (2) Boilerplate Conformance Checking, and(3) Natural Language Best Practices Checking. In the firststep, requirement statements are segmented into constituentparts (noun phrases, verb phrases, etc.) and annotated withappropriate tags. The annotated requirements, along withthe boilerplates’ syntax grammar rules (e.g. Rupp’s Boil-erplate syntax in Figure 5) are used in the second step toproduce boilerplates conformance results. The third stepneeds the annotated requirements from the first step and

Natural LanguageRequirements Boilerplate Syntax

Rules

Annotated Requirements

Text Chunking

Boilerplate Checking

NL Requirements best practices

checking

Diagnostics

Glossary(if available)

RequirementsAnalyst

Tool EngineerBest Practices Rules

Figure 3: Tool Overview

the NL requirements’ best practices rules (e.g., see [2]) togenerate feedback for potentially problematic constructs inthe requirements. In the rest of this section, we elaboratethe three main steps shown in Figure 3.

3.1 Text Chunking ProcessRUBRIC has been developed as an application for the

GATE workbench (http://gate.ac.uk/). GATE is an open-source Natural Language Processing (NLP) framework. Build-ing on the NLP modules available in GATE, the Text Chunk-ing step in Figure 3 decompose a sentence into non-overlappingsegments. The main segments of interest are Noun Phrasesand Verb Phrases. A Noun Phrase (NP) is a segment thatcan be the subject or object for a verb. A Verb Phrase(VP), sometimes also called a verb group, is a segment thatcontains a verb with any associated modal, auxiliary, andmodifier (often an adverb).

Similar to most NLP applications, a text chunker is apipeline of NLP modules, run in a sequence over an inputdocument. The (abstract) pipeline for chunking is shown inFigure 4. This pipeline can be instantiated in many ways,as there are alternative implementations for each step in thepipeline. The first module, the Tokenizer, breaks up theinput into tokens. A token can be a word, a number ora symbol. The Sentence Splitter divides the text into sen-tences. The POS Tagger annotates each token with a part-of-speech tag. These tags include among others, adjective,adverb, nouns, verb. Next is the Name Entity Recognizer,where an attempt is made to identify named entities, e.g., or-ganizations and locations. In a requirements document, thenamed entities can further include domain keywords andcomponent names. The main and final step is the actualChunker. Typically (but not always), NP and VP chunkingare handled by separate modules, respectively tagging theNPs and VPs in the input. When a glossary of terms isavailable, one can instruct the NP Chunker to treat occur-rences of the glossary terms in the input as named entities,thus reducing the likelihood of mistakes by the NP Chunker.

The input requirements after going through text chunk-ing have annotations for tokens, sentences, parts-of-speech,named entities, NPs, and VPs. These annotations are usedfor checking conformance to boilerplates and generating feed-back about adherence to requirements writing best practices.

3.2 Boilerplate Conformance Checking (BCC)The BCC step takes as input an annotated requirements

document (produced by the Text Chunking step) as well asthe syntax rules for the boilerplates of interest. The mainpurpose of BCC is to tag each requirement statement as ei-ther conformant or non-conformant. BCC can further markoptional parts (e.g, conditions and additional details) that

Tokenizer

Sentence Splitter

POS Tagger

Named Entity Recognizer

Noun Phrase (NP) Chunker

Verb Phrase (VP) Chunker

!

"

#

$

% &Chunker

Natural LanguageRequirements

Glossary Terms

Annotated Requirements

Figure 4: NLP pipeline for text chunking

<Boilerplate-Conformant> ::= <opt-Condition> <NP> <VP-starting-with-modal> <NP> <opt-Details> | <opt-Condition> <NP> <modal> "PROVIDE" <NP> "WITH THE ABILITY" <infinitive-VP> <NP> <opt-Details> | <opt-Condition> <NP> <modal> "BE ABLE" <infinitive-VP> <NP> <opt-Details><opt-Condition> ::= "" | <conditional-keyword> <non-punctuation-Token>* ","<opt-Details> ::= "" | <Token-sequence-without-subordinate-conjunctions><modal> ::= "SHALL" | "SHOULD" | "WOULD"<conditional-keyword> ::= "IF" | "AFTER" | "AS SOON AS" | "AS LONG AS"

Figure 5: BNF grammar for Rupp’s boilerplate

may be allowed by the boilerplates. RUBRIC uses Rupp’sboilerplate (Figure 1) as a reference for BCC. The boiler-plate defines three requirement types [6]:

– Autonomous requirements capture functions that thesystem performs independently of interactions with users.

– User interaction requirements capture functions thatthe system provides to specific users.

– Interface requirements capture functions that the sys-tem performs to react to trigger events from other systems.

Figure 5 shows a BNF grammar that characterizes Rupp’srequirement types in terms of the annotations produced bythe Text Chunking step. This grammar is implemented us-ing GATE’s regular expression engine, called JAPE [3]. Toillustrate how the grammar rules are expressed in JAPE, weshow an example in Figure 6, where we annotate the con-formant fragment of autonomous requirements. The JAPEscript in Figure 6 corresponds to the following BNF rule inFigure 5: <opt-Condition> <NP> <VP-starting-with-modal><NP> <opt-Details>.

In Figure 6, the optional condition followed by an NP iscaptured on lines 3-4. This corresponds to <opt-Condition><NP> in Figure 5. <VP-starting-with-modal> is captured bylines 5-6, meaning that the current slot has to be occupied bya VP which starts with a valid modal verb (shall/should/will).The VP cannot contain provide, which is a reserved verbfor user interaction requirements. The <NP> that followscaptures the object slot in the boilerplate. The optional de-tails (<opt-Details> in Figure 5) are marked using a differentrule, which is not shown due to space constraints.

Rule: MatchAutonomousRequirementFragment ( (({Condition}{NP}) | ({NP})) ({VP, VP.startsWithValidMD == "true", !VP contains {Token.string == "provide"}}) ({NP}) ):label --> :label.Conformant_Fragment = {explanation = "Matched pattern: Autonomous"}

1.2.3.4.5.6.7.8.9.

Figure 6: JAPE script for autonomous requirements

3.3 Checking NL Best PracticesRUBRIC detects and warns about several potentially prob-

lematic constructs that are discouraged by requirements writ-ing best practices. Berry et al [2] give a detailed treatmentof these best practices. The detection and markup of theproblematic constructs is performed using JAPE in a simi-lar manner to the BCC rules described earlier in Section 3.2.Figure 7 shows and exemplifies some of the problematic con-structs that RUBRIC detects and generates warnings for.

Tag Potential ambiguities [2] Example Warn_AND The “and” conjunction can imply

several meanings, including temporal ordering of events, need for several conditions to be met, parallelism, etc.

The S&T module shall process the query data and send a confirmation to the database. A temporal order is implied.

Warn_OR The “or” conjunction can imply exclusive or, or inclusive or

The S&T module shall command the database to forward the configuration files or log the entries. Inclusive or exclusive?

Warn_Quantifier Terms used for quantification such as all, any, every can lead to ambiguity if not used properly

All lights in the room are connected to a switch. Is there a single switch or multiple switches?

Warn_Pronoun Pronouns can lead to referential ambiguity

The trucks shall treat the roads before they freeze. Does “they” refer to the trucks or the roads?

Warn_VagueTerms There are several vague terms that are commonly used in requirements documents [2]. Examples include user-friendly, support, acceptable, up to, periodically. These terms should be avoided in requirements.

The S&T module shall support up to five configurable status parameters. support: providing the operator with the ability to define? up to: up to and including, or up to and excluding?

Warn_PassiveVoice Passive voice blurs the actor of the requirement and must be avoided in requirements.

If the S&T module needs a local configuration file, it shall be created from the database system configuration data. Not clear if the actor is S&T module, database, or another agent.

Warn_Complex_Sentence Using multiple conjunctions in the same requirement sentence make the sentence hard to read and are likely to cause ambiguity.

The S&T module shall notify the administrator visually and audibly in case of alarms and events. Statement may be interpreted as visual notification only for alarms and audible notification only for events.

Figure 7: A (non-exhaustive) list of potentiallyproblematic constructs detected by RUBRIC

As an example, Figure 8 shows the JAPE script for tag-ging the use of pronouns. Pronouns can lead to referen-tial ambiguity if used improperly [2]. The rule in the fig-ure matches any Token in the text whose part of speech ispersonal pronoun (PRP), possessive pronoun (PRP$), wh-pronoun (WP), or possessive wh-pronoun (WP$). If matched,the pronoun will be labeled with Warn_Pronoun. At a laterstage, a requirements analyst can review these warnings todetermine which constructs are indeed ambiguous and takeappropriate remedial action.

Rule: MarkPronouns ( {Token.category == "PRP"} | {Token.category == "PRP$"} | {Token.category == "WP"} | {Token.category == "WP$"} ):label --> :label.Warn_Pronoun = {explanation = "Pronouns can cause ambiguity."}

Figure 8: JAPE script for detecting pronouns

4. DIAGNOSTICSRUBRIC leverages GATE’s visual markup environment

for providing diagnostics about boilerplate conformance andrequirements writing best practices. Figure 9 shows a snap-shot of RUBRIC. The left panel shows the various GATEresources (Applications, Language Resources, and Process-ing Resources). The centre panel displays the contents of theselected resource, here a requirements document. The rightpanel shows the annotations list for the current document.The highlighted sections in the document mark regions an-notated with selected tags from the right panel.

5. EVALUATIONWe have rigorously evaluated RUBRIC over an indus-

trial case study, where 380 requirements were specified usingRupp’s boilerplate by industry experts [1]. The case studywas conducted in collaboration with our industrial partner

Figure 9: Snapshot of RUBRIC

SES TechCom - a leading provider of satellite communica-tion services worldwide.

The requirements were first manually inspected to deter-mine which ones conformed to Rupp’s boilerplate and whichones did not. Of the 380 requirements, 243 (64%) were con-formant and the remaining 137 (36%) were non-conformant.In the next phase, the requirements were analyzed using al-ternative instantiations of the pipeline shown in Figure 4and explained in Section 3.1. There were 72 instantiationsof the pipeline induced by the different implementations ofthe NLP modules in Figure 4.

The accuracy of RUBRIC in terms of BCC was evalu-ated using precision and recall metrics. Precision and recallaveraged at 88.6% and 93.4% respectively across all 72 in-stantiations. The best pipeline instantiation (in terms ofcombined precision and recall) yielded a precision of 91%and a recall of 99%. We note that recall is the primary fac-tor in our context since it is easier for analysts to rule out asmall number of false positives, rather than search througha large document for finding the false negatives1.

We have further evaluated the scalability of RUBRIC bymeasuring its execution time over requirement sets of differ-ent sizes. We observed linear increases in execution time asthe number of requirements grew. The average time per re-quirement statement across all the examined pipeline instan-tiations was 25 milliseconds. This suggests that RUBRICcan be applied to much larger sets of requirements than con-sidered in our case study.

Another important result from our evaluation was thatthe presence of a glossary does not provides major gains interms of precision and recall. This is noteworthy becausethe construction of a glossary may succeed or be done intandem with (rather than precede) preliminary quality as-surance activities such as BCC.

Apart from the case study, we have received positive feed-back from the experts using the tool. Although, we have notyet systematically evaluated the experts’ perception of thetool, the qualitative feedback we have received is promising.

In summary, our evaluation suggests that (1) RUBRICis effective for BCC even when no glossary is available; (2)RUBRIC scales well and is expected to be applicable to verylarge sets of requirements.

1A false positive is a requirement that is boilerplate con-formant but is found to be non-conformant by the tool. Afalse negative is a requirements that is non-conformant butis found to be conformant by the tool.

6. IMPLEMENTATION AND AVAILABILITYThe core of RUBRIC consists of 28 JAPE scripts with

approximately 500 lines of script code. RUBRIC furtherincludes 9 customizable lists containing keywords used bythe scripts, e.g., boilerplate fixed terms, vague terms, con-junctions, acceptable modals. To facilitate using RUBRICin a production environment, we have implemented a plu-gin to export requirements from the Enterprise Architecttool (http://www.sparxsystems.com.au). This plugin isapproximately 1000 lines of C# code. Similar plugins canbe developed for other requirements management tools.

RUBRIC was publicly released in June 2013. The tool(along with a screencast illustrating its use) is available at:

sites.google.com/site/rubricnlp/

7. CONCLUSIONWe presented RUBRIC – a natural language processing

tool for automatic checking of conformance to requirementboilerplates and detecting problematic constructs in naturallanguage requirements. Our current implementation sup-ports only Rupp’s boilerplate [6] but can be generalized toother boilerplates as well. Our evaluation indicates thatRUBRIC is scalable and highly accurate in distinguishingrequirements that conform to Rupp’s boilerplate from thosethat do not. The accuracy in general shows little sensitivityto the presence or absence of a requirement glossary.

In the future, we plan to conduct more studies to quan-tify the gains from using RUBRIC in industrial settings interms of time, effort, and quality resulting from the use ofboilerplate conformant requirements. We further need toexamine the effectiveness of our approach for the case wherethe requirements are not consciously written to conform toa given boilerplate. Another topic for investigation is pro-viding instant guidance to analysts to help them conform toboilerplates as they are writing the requirement statements.

8. ACKNOWLEDGEMENTWe gratefully acknowledge funding from the National Re-

search Fund - Luxembourg (FNR/P10/03 - Verification andValidation Laboratory).

9. REFERENCES[1] C. Arora, M. Sabetzadeh, L. Briand, F. Zimmer, and

R. Gnaga. Automatic checking of conformance torequirement boilerplates via text chunking: An industrialcase study. In ESEM’13, 2013.

[2] D. Berry, E. Kamsties, and M. Krieger. From ContractDrafting to Software Specification: Linguistic Sources ofAmbiguity, A Handbook. 2003. U. Waterloo.

[3] Cunningham et al. Developing Language ProcessingComponents with GATE Version 7 (a User Guide).

[4] J. C. de Gea, J. Nicolas, J. F. Aleman, A. Toval, C. Ebert,and A. Vizcaıno. Requirements engineering tools:Capabilities, survey and assessment. Information &Software Technology, 54(10), 2012.

[5] S. Farfeleder, T. Moser, A. Krall, T. Stalhane, H. Zojer, andC. Panis. DODT: Increasing requirements formalism usingdomain ontologies for improved embedded systemsdevelopment. In DDECS’11, 2011.

[6] K. Pohl and C. Rupp. Requirements EngineeringFundamentals. Rocky Nook, 1st edition, 2011.

[7] RQA. http://www.reusecompany.com/rqa.[8] X. Zou, R. Settimi, and J. Cleland-Huang. Improving

automated requirements trace retrieval: a study ofterm-based enhancement methods. Empirical Softw. Eng.,15(2), 2010.

rubric: a flexible tool for automated checking of ...people.svv.lu/sabetzadeh/pub/esec_fse13.pdf ·...

Documents