
Presupposition Analysis in Requirements

Lin Ma

Supervisors: Prof. Bashar Nuseibeh, Prof. Anne De Roeck, Dr. Paul Piwek, Dr. Alistair Willis

2010 CRC PhD Student Conference

Department/Institute: Department of Computing
Status: Full-time
Probation viva: After
Starting date: 1-Feb-2009

Motivation

Natural language is the most commonly used representation language in requirements engineering [1]. However, compared with formal logics, natural language is inherently ambiguous and lacks a formal semantics [2]. Communicating requirements perfectly through natural language is therefore difficult. Examining the linguistic phenomena in natural language requirements can help to decode what a person means. This method was originally used in psychotherapy and was later adopted in requirements engineering [3].

Presupposition is one of these linguistic phenomena. It simplifies communication by referring to bits of knowledge that the document writer takes for granted. In requirements engineering, however, we must know exactly what information has been lost through this simplification, or we run the risk of misunderstanding. For instance, the requirement

(1) Accessibility in the experimental hall is required for changing the piggy board where the device will be mounted.

commits the reader to the presuppositions that there is an experimental hall, there is a piggy board and there is a device. Such implicit commitments might be misinterpreted or overlooked by stakeholders from other domains, who bring different background knowledge. For instance, concerning the presupposition that there is a piggy board in example (1), a reader of this requirement may know of a piggy board A and assume that A is the one the writer means. The writer, however, may mean piggy board B, or just any new piggy board.

In this research, we propose to use natural language processing techniques to automatically detect such implicit commitments in requirements documents and to identify which of them are not made explicit.

Background

Presuppositions are triggered by certain types of syntactic structures, known as presupposition triggers [4]. Therefore, presuppositions can be found by identifying the triggers in the text. Presupposition triggers can be divided into two general classes: definite descriptions (noun phrases starting with a definite determiner, such as “the piggy board” in example (1)) and other trigger types (for example, clefts, of the form It + be + noun phrase + subordinate clause, and stressed constituents, often rendered in italics in written text). Definite descriptions differ from other trigger types because they occur very frequently in all styles of natural language [5], are easy to retrieve (because of their distinctive structure with the determiner “the”) and often have possible referential relations with earlier text [6]. We therefore focus on presuppositions triggered by definite descriptions in this research.

A major problem in the study of presupposition is presupposition projection. An elementary presupposition is a presupposition of part of an utterance. Presupposition projection, as the name suggests, is the study of whether an elementary presupposition is a presupposition of the whole utterance (termed an actual presupposition). The following two examples illustrate distinct scenarios in requirements, one where an elementary presupposition projects out and one where it does not:

(2) a. If funds are inadequate, the system will notify…
    b. If there is a system, the system will notify…

Intuitively, a reader who accepts utterance (2b) does not take the presupposition that there is a system for granted: the elementary presupposition that there is a system, triggered in the consequent of the conditional, does not project, because the antecedent of the conditional already supplies it. The same elementary presupposition does project out in example (2a), which signals to the reader that the writer takes for granted that there is a system.
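Because definite descriptions start with the determiner “the”, a first approximation of trigger detection can be very simple. The Python sketch below is illustrative only and is not the proposed system: the function name `definite_descriptions` and the `STOP_WORDS` list are hypothetical, and the assumption that a stop word or punctuation mark ends the noun phrase is a crude stand-in for the part-of-speech tagging or parsing a realistic extractor would need.

```python
import re

# Hypothetical stop list: words assumed to end a definite description.
# A real extractor would use a part-of-speech tagger or parser instead.
STOP_WORDS = {
    "is", "are", "was", "were", "be", "will", "shall", "must", "can",
    "where", "which", "that", "who", "in", "on", "of", "for", "to",
    "at", "with", "by", "and", "or",
}


def definite_descriptions(text: str) -> list[str]:
    """Return candidate definite descriptions: 'the' followed by words up
    to the first stop word, punctuation mark, or end of text."""
    tokens = re.findall(r"[A-Za-z]+|[^\sA-Za-z]", text)
    found = []
    i = 0
    while i < len(tokens):
        if tokens[i].lower() != "the":
            i += 1
            continue
        j = i + 1
        while (j < len(tokens) and tokens[j].isalpha()
               and tokens[j].lower() not in STOP_WORDS):
            j += 1
        if j > i + 1:  # keep only if at least one word follows 'the'
            found.append(" ".join(tokens[i:j]))
        i = j
    return found


requirement_1 = ("Accessibility in the experimental hall is required for "
                 "changing the piggy board where the device will be mounted.")
print(definite_descriptions(requirement_1))
# ['the experimental hall', 'the piggy board', 'the device']
```

Applied to requirement (1), the sketch recovers the three triggers discussed above. Its weakness is equally visible: a description containing a preposition, such as “the board of directors”, would be truncated to “the board”, which is one reason modifiers need proper analysis.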

Methodology

The Binding Theory of presupposition [7] is a widely accepted formal framework for modelling presupposition, in which presupposition is viewed as anaphora (an anaphor is an expression, such as a pronoun, whose interpretation depends on a preceding expression, its antecedent). Presupposition projection is treated as a search for a path to an earlier part of the discourse that hosts an antecedent able to bind the presupposition. Whenever such an antecedent is found, the presupposition is bound and thus does not project out. According to the Binding Theory, therefore, the actual presuppositions in a discourse are those that have no antecedent earlier in the discourse. We adopt this view as our theoretical ground. Vieira and Poesio [8] present an automated approach for classifying definite descriptions that is compatible with the Binding Theory. It classifies definite descriptions as:

Discourse new: those whose interpretation does not depend on previous discourse elements (according to the Binding Theory, discourse new definite descriptions introduce actual presuppositions with respect to a discourse, because they have no antecedent);



Anaphoric: those that have co-referential¹ antecedents in the previous discourse (co-reference holds when multiple expressions in a sentence or document have the same referent);

Bridging [9]: those that either (i) have an antecedent denoting the same discourse entity but using a different head noun (e.g. a house … the building), or (ii) are related by a relation other than identity to an entity already introduced in the discourse (e.g. the part-of relation in memory … the buffer).
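The projection behavior described by the Binding Theory can be illustrated with a deliberately simplified Python sketch. It assumes that each clause has already been analysed by hand into the head nouns it introduces and the head nouns of its definite descriptions (the `Clause` format is hypothetical), it matches antecedents by identical head noun only, and it ignores the accessibility constraints of [7] (for instance, an entity introduced inside a conditional is not normally available to later sentences). A definite description without an antecedent is reported as an actual presupposition and then accommodated, so later mentions of it are bound.

```python
from dataclasses import dataclass, field


@dataclass
class Clause:
    """A clause analysed by hand (hypothetical format): head nouns of the
    entities it introduces, and head nouns of its definite descriptions."""
    introduces: list[str] = field(default_factory=list)
    definites: list[str] = field(default_factory=list)


def actual_presuppositions(clauses: list[Clause]) -> list[str]:
    """Return the heads of definite descriptions that find no antecedent.

    Simplified binding: clauses are visited in discourse order (the
    antecedent of a conditional precedes its consequent). A definite
    description is bound if an earlier clause introduced an entity with
    the same head noun; otherwise it projects and is accommodated.
    """
    referents: set[str] = set()
    projected: list[str] = []
    for clause in clauses:
        for head in clause.definites:
            if head not in referents:
                projected.append(head)
                referents.add(head)  # accommodation: later mentions bind to it
        referents.update(clause.introduces)
    return projected


# (2a) If funds are inadequate, the system will notify...
example_2a = [Clause(introduces=["funds"]), Clause(definites=["system"])]
# (2b) If there is a system, the system will notify...
example_2b = [Clause(introduces=["system"]), Clause(definites=["system"])]

print(actual_presuppositions(example_2a))  # ['system']
print(actual_presuppositions(example_2b))  # []
```

Encoded this way, “system” is reported as an actual presupposition of (2a), while in (2b) the antecedent of the conditional binds it and nothing projects. In the terms of [8], the projected heads correspond to discourse new definite descriptions and the bound ones to anaphoric descriptions.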

Consider example (3):

(3) An experimental hall shall be built…
    PAD boards shall be used…
    Accessibility in the experimental hall is required for changing the piggy board where the device will be mounted.

Here “the experimental hall” has an antecedent in an earlier sentence, “an experimental hall”, so it is classified as anaphoric. Given the knowledge that a piggy board is a small circuit board mounted on a larger board, “the piggy board” is a bridging definite description referring to part of the “PAD boards”. Finally, “the device” is a discourse new definite description, which triggers the actual presupposition that there is a device with respect to the discourse.

In [8], the authors used a set of heuristics, based on an empirical study of definite descriptions [6], to perform the classification task. The heuristics include, for example:

For discourse new definite descriptions: one of the heuristics is to examine a list of special predicates (e.g. fact). If the head noun of the definite description appears in the list, it is classified as discourse new.

For anaphoric definite descriptions: the head noun and modifiers are matched against earlier noun phrases. If a match is found, the description is classified as anaphoric. For example, an experimental hall … the experimental hall.

For bridging definite descriptions: one of the heuristics is to use WordNet [10] to identify relations between the head noun and earlier noun phrases. If there is a relation, such as a part-of relation, the description is classified as bridging. For example, PAD boards … the piggy board.
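The three heuristics above can be sketched in Python and applied to example (3). This is a toy illustration under stated assumptions, not the system of [8]: `SPECIAL_PREDICATES` and `PART_OF` are hypothetical hand-written tables (the latter standing in for WordNet or another knowledge source), earlier noun phrases are assumed to be given without determiners and with singular head nouns, and the heuristic order and the modifier check are simplifications that may differ from those used in [8].

```python
# Hypothetical stand-ins for the resources used in [8]: a tiny list of
# special predicates and a hand-written part-of table in place of WordNet.
SPECIAL_PREDICATES = {"fact", "result", "conclusion"}
PART_OF = {"piggy board": "pad board"}  # part -> whole (assumed knowledge)


def classify(description: str, earlier_nps: list[str]) -> str:
    """Classify a definite description (given without 'the') against the
    noun phrases of the earlier discourse (without determiners, with
    singular head nouns). Returns 'discourse new', 'anaphoric' or 'bridging'."""
    words = description.lower().split()
    head, modifiers = words[-1], set(words[:-1])
    # Heuristic 1: a special predicate as head noun means discourse new.
    if head in SPECIAL_PREDICATES:
        return "discourse new"
    # Heuristic 2: same head noun, and every modifier of the description
    # also appears on the earlier noun phrase.
    for np in earlier_nps:
        earlier = np.lower().split()
        if earlier[-1] == head and modifiers <= set(earlier[:-1]):
            return "anaphoric"
    # Heuristic 3: a part-of relation to an earlier noun phrase.
    whole = PART_OF.get(" ".join(words))
    if whole is not None and whole in (np.lower() for np in earlier_nps):
        return "bridging"
    return "discourse new"


# Example (3); "PAD boards" is given in singular form.
earlier = ["experimental hall", "PAD board"]
for d in ["experimental hall", "piggy board", "device"]:
    print(d, "->", classify(d, earlier))
# experimental hall -> anaphoric
# piggy board -> bridging
# device -> discourse new
```

The modifier check matters even in this tiny example: “the piggy board” shares its head noun with “PAD board”, and only the mismatch between the modifiers “piggy” and “pad” stops the sketch from wrongly classifying it as anaphoric.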

However, as the authors of [8] state, this approach is insufficient for complex definite descriptions with modifiers, and it lacks a good knowledge base for resolving bridging definite descriptions (WordNet performed poorly in this case). In this research, we will further develop this approach and implement a software system that is able to analyze the projection behavior of presuppositions triggered by definite descriptions in requirements documents. The development will focus on analyzing the modifiers of definite descriptions and on making use of external knowledge sources (such as ontologies built upon Wikipedia [11]) for resolving bridging definite descriptions. For bridging definite descriptions in particular, if the relation can be identified in the knowledge base, it will help with choosing between creating a new discourse entity and picking up an existing antecedent. As a result, the actual presuppositions (those triggered by discourse new definite descriptions) can be identified.

The system will be evaluated on existing corpora with annotated noun phrases, such as the GNOME corpus [12]. We will also manually annotate several requirements documents and evaluate the system against these annotations.

¹ Strictly speaking, anaphora differs from co-reference: an anaphor needs the meaning of its antecedent for its interpretation, whereas co-referring expressions do not. Here the two terms are used as synonyms, meaning that multiple expressions in a sentence or document have the same referent.

References

[1] L. Mich and R. Garigliano, “NL-OOPS: A requirements analysis tool based on natural language processing,” Proceedings of the 3rd International Conference on Data Mining Methods and Databases for Engineering, Bologna, Italy, 2002.

[2] V. Gervasi and D. Zowghi, “Reasoning about inconsistencies in natural language requirements,” ACM Transactions on Software Engineering and Methodology (TOSEM), vol. 14, 2005, pp. 277–330.

[3] R. Goetz and C. Rupp, “Psychotherapy for system requirements,” Proceedings of the Second IEEE International Conference on Cognitive Informatics, 2003, pp. 75–80.

[4] S.C. Levinson, Pragmatics, Cambridge, UK: Cambridge University Press, 2000.

[5] J. Spenader, “Presuppositions in Spoken Discourse,” PhD thesis, Department of Linguistics, Stockholm University, 2002.

[6] M. Poesio and R. Vieira, “A corpus-based investigation of definite description use,” Computational Linguistics, vol. 24, 1998, pp. 183–216.

[7] R.A. Van der Sandt and B. Geurts, “Presupposition, anaphora, and lexical content,” Text Understanding in LILOG, O. Herzog and C. Rollinger, Eds., Springer, 1991, pp. 259–296.

[8] R. Vieira and M. Poesio, “An empirically based system for processing definite descriptions,” Computational Linguistics, vol. 26, 2000, pp. 539–593.

[9] H.H. Clark, “Bridging,” Thinking, 1977, pp. 411–420.

[10] C. Fellbaum, WordNet: An Electronic Lexical Database, Cambridge, MA: MIT Press, 1998.

[11] M.C. Müller, M. Mieskes, and M. Strube, “Knowledge Sources for Bridging Resolution in Multi-Party Dialog,” Proceedings of the 6th International Conference on Language Resources and Evaluation, Marrakech, Morocco, 2008.

[12] M. Poesio, “Annotating a corpus to develop and evaluate discourse entity realization algorithms: issues and preliminary results,” Proceedings of the 2nd International Conference on Language Resources and Evaluation (LREC), 2000, pp. 211–218.

