Quality of data models
R. Maier
Institute of Business Informatics, The Koblenz School of
Corporate Management, Burgplatz 2, D-56179, Vallendar,
Germany
Abstract
After a short overview of the definitions of the terms data model and quality, three different approaches to quality of data models are described. The approaches have in common that neither application fields nor the users of data models are taken into account; the term quality is interpreted in a production-oriented way. The results of an empirical analysis conducted by the author are presented. They show that application fields and users differ widely between the data models in different enterprises. The analysis concentrates on the aspects missing from the quality concepts in the literature: application fields, users and organizational context. A quality concept for data models, developed by the author on the basis of the results of the empirical analysis, is outlined. The quality concept can be divided into two parts: a set of quality metrics and the description of a review process. Both parts are needed to evaluate data models effectively.
1 Introduction
In the last 25 years a considerable number of modelling methodologies have been developed, such as data, function, data flow and process modelling methodologies. These methodologies were supposed to support, for example, the process of software construction (software engineering), to master complexity, to support communication between developers and users, and to improve the quality of the resulting software. Additionally, a lot of research work has been published that deals with the measurement and improvement of software quality. However, there has been little interest in investigating the quality of the models that form the basis of the software and the data structures. The impact of "good" or "bad"
Transactions on Information and Communications Technologies vol 11, © 1995 WIT Press, www.witpress.com, ISSN 1743-3517
208 Software Quality Management
models on the resulting (application) software systems was long underrated.
Today the later phases of the software life cycle (e.g. programming, testing) are supported by numerous quality assurance activities and metrics, as opposed to the earlier phases (e.g. requirements engineering, design), for which only very crude quality concepts exist. What is more, the design costs form a major and increasing part of the total system development costs, and the effects of design flaws on the total costs are considerable.
Therefore in this paper quality concepts are applied to the earlier phases of the software life cycle. Although data modelling methodologies are widely accepted in practice today, there is little published empirical research on the quality of data models. The author conducted an empirical study on this topic to gain deeper insights into the problems, benefits and pitfalls of modelling in companies. Based on the results of this study, a quality concept for data models in varying application areas is developed.
The aims of the paper are:
• to give an overview of the approaches to quality of data models in the literature;
• to present the results of an empirical study about possible applications, benefits and approaches to quality of data models in practice;
• to outline a concept for the quality assurance of data models.
2 Quality Concepts in the Literature
There is a lot of confusion about the terms "model", "modelling", "data" and "information" in the literature (see [9], [12]). The term "data model" applies to two different phenomena. On the one hand, "data model" refers to formalisms that support the process of modelling data: a data model provides a formal frame for descriptions of the real world in a form that can be understood by computers (see [15]). Examples of this sense of the term are the hierarchical, network, relational and semantic data models (e.g. the ERM). On the other hand, the term "data model" denotes the result of the modelling process: a data model is an exact image of the data needed by a task, free of contradiction and redundancy, including the relations between these data (see [7]). In this paper the term "data model" is used in the latter sense.
Definitions of the term "quality" can be found in abundance, for example: "quality consists of those product features which meet the needs of customers and thereby provide product satisfaction" ([8]). As far as data modelling is concerned, it is not obvious who the "customers" of data models are, and it is even more uncertain what their "needs" are. The empirical analysis conducted by
the author was therefore designed to find some answers to these questions.
In the following, three approaches to quality of data models found in the literature are presented. For a detailed investigation of the concepts outlined here see [10].
2.1 Data Model Characteristics

According to Batini et al. [1] the (main) aim of conceptual design is the development of a high-level conceptual scheme which is independent of the database management system used. The starting point for conceptual design is the requirements definition; supporting the determination of requirements is thus not the focus of conceptual modelling. Data schemes are products of a design process which has to be validated. The validation process consists of checking the following characteristics of data models (see [1]): completeness, correctness, minimality, expressiveness, readability, self-explanation, extensibility, normality.
Batini et al. recommend checking data models periodically during the design process and once the design is finished. For a detailed investigation of the characteristics proposed see [10].
Assessment: Batini et al. give a list of eight characteristics for data models. The list can be used as a basis for the development of a quality concept for data models. The authors highlight some problems of the quality control of data models, e.g. conflicting characteristics and the "optimal" quality level, but they do not define any priorities for conflicting characteristics. Moreover, they do not consider different organizational contexts and application fields. The quality concept is therefore "production-oriented", as opposed to the "customer-oriented" definition of the term "quality" (see above). Batini et al. restrict their approach to the "definition" of characteristics. They do not operationalize them, so data models cannot be measured or judged objectively, but only (very) subjectively. There are no statements about the process of the judgement, so the judging person has to rely on the examples the authors give and on personal experience. The "directions" for the suggested transformations of data models, which should improve the quality of the models, are heuristic.
At least Batini et al. give some examples of what a "good" data model should look like and of what has to be done to improve data models. It is a pity that the authors do not consider different application fields of data models; doing so could resolve some of the conflicts between different characteristics.
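Because the characteristics are not operationalized, a review of them can only collect subjective ratings. The following minimal Python sketch shows what such a checklist-based judgement might look like; the 1-5 scale and the equal weighting are illustrative assumptions, not part of Batini et al.'s proposal:

```python
# Hypothetical review checklist for Batini et al.'s eight characteristics.
# The 1-5 ratings are subjective reviewer judgements, which is exactly the
# limitation noted above: the characteristics are not operationalized.

CHARACTERISTICS = [
    "completeness", "correctness", "minimality", "expressiveness",
    "readability", "self-explanation", "extensibility", "normality",
]

def review_score(ratings: dict[str, int]) -> float:
    """Average a reviewer's 1-5 ratings over all eight characteristics."""
    missing = [c for c in CHARACTERISTICS if c not in ratings]
    if missing:
        raise ValueError(f"unrated characteristics: {missing}")
    return sum(ratings[c] for c in CHARACTERISTICS) / len(CHARACTERISTICS)

ratings = {c: 4 for c in CHARACTERISTICS}
ratings["minimality"] = 2        # e.g. redundant relationship types found
print(review_score(ratings))     # 3.75
```

The point of the sketch is the weakness it exposes: two reviewers can legitimately assign different ratings to the same model, so the aggregate score is no more objective than the individual judgements.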
2.2 Quality Metrics and Aggregation Procedure

Heilandt/Kruck [6] present an algorithmic procedure for the evaluation and aggregation of (extended) entity-relationship models. Data modelling in the sense of Heilandt/Kruck means enterprise-wide data modelling. Data models
hence are used to control the development of application systems in the company as a whole. The defined characteristics are much more operationalized compared with the approach of Batini et al.
The authors use their approach to cope with the high complexity of enterprise-wide data models; the quality metrics and the approach to the aggregation of data models should help to master this complexity. Heilandt/Kruck point out that there are different "customer" groups of data models, especially of enterprise-wide data models: users who are interested in more global aspects of the model (e.g. top management, DP management) versus users who are interested in the particular areas of the model for which they are responsible (e.g. functional managers, DP coordinators).
Both user groups are often discouraged by the size and complexity of enterprise-wide data models (see [6]). Because of this lack of clarity, the models cannot easily be used to clarify global dependencies and connections between data objects. The procedure of Heilandt/Kruck is meant to re-establish the missing clarity. They define the following quality metrics for the evaluation of ER models: quality of the arrangement, complexity, specialization.
Assessment: At the outset Heilandt/Kruck describe some of the problem areas of data models and different user groups. However, they make no attempt to include these aspects in their approach. It is not clear what higher or lower values of the quality metrics (e.g. quality of arrangement) mean for these problem areas or different user groups. For instance, it is unclear whether a low complexity value indicates better readability of the model or not.
The authors give neither empirical results nor hypotheses about the connections between their quality metrics and the usability of data models. Apart from the separation of user groups, the authors do not consider different application fields of data models. Nevertheless, Heilandt/Kruck do operationalize the characteristics complexity, arrangement and specialization, and some valuable suggestions can be gained from their work.
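As an illustration of what an operationalized metric can look like, the following Python sketch computes a simple structural complexity value for an ER model represented as a graph of entity types and relationship types. The ratio used here is an illustrative assumption, not Heilandt/Kruck's actual formula:

```python
# Sketch of a simple structural complexity metric for an ER model,
# represented as entity types (nodes) and relationship types (edges).
# This ratio is an illustrative assumption, not Heilandt/Kruck's formula.

def complexity(entities: set[str],
               relationships: list[tuple[str, str]]) -> float:
    """Relationship types per entity type; higher means a denser model."""
    if not entities:
        return 0.0
    return len(relationships) / len(entities)

entities = {"Customer", "Order", "Product", "Invoice"}
relationships = [("Customer", "Order"), ("Order", "Product"),
                 ("Order", "Invoice"), ("Customer", "Invoice")]
print(complexity(entities, relationships))  # 1.0
```

Even such a trivial metric illustrates the assessment above: the number itself says nothing about readability for a particular user group until its relation to usability is established empirically.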
2.3 Three-Stage Concept for Correctness

Zamperoni et al. [17] propose a more pragmatic approach to checking the correctness of data base specifications: validation. Data models represent the results of the data base design process, which has to be validated to form a basis for the implementation of data base systems. As opposed to Batini et al., the validation process refers exclusively to correctness. The authors distinguish three levels of correctness of data models (see [17]): syntactic correctness, consistency, semantic correctness. Syntactic correctness is the lowest and semantic correctness the highest level of quality. With the differentiation of the three levels, the tool-based evaluation of data models should be extended to capture part of the semantic correctness: consistency.
The distinction between semantic and syntactic correctness is made by many authors. Semantic correctness is usually defined with respect to (part of) the "reality" which has to be modelled; thus, the evaluation of semantic correctness cannot be automated. The correctness level consistency, together with the automatic consistency check, forms a first step towards structuring the evaluation of semantic correctness.
Assessment: Zamperoni et al. concentrate on the development of quality criteria for correctness. They make no attempt to integrate their criteria into a comprehensive quality concept for data models. Moreover, the authors do not specify for which application fields data models can be evaluated with their method. Data models which are used for the design of data bases are usually specified exactly, whereas enterprise-wide data models are more general and less exact; it is not clear in which cases the evaluation is worth the effort. Additionally, one can criticize that different users have different perceptions of "reality", so it is not possible simply to compare interpretations of the data model with those of the "real world".
However, the approach can be seen as a first step towards the operationalization of correctness. It can be used to detect some design errors (e.g. prohibited cycles). The approach is based on the concept of existential dependencies, which is investigated in detail by Sinz [16], who has proposed an extended ER approach to guarantee consistency by restricting the design possibilities.
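A check for prohibited cycles can be sketched as an ordinary graph-cycle test over existential dependencies ("A depends on B" meaning an A-instance cannot exist without its B-instance, so a cycle would make consistent instantiation impossible). The representation below is an assumption for illustration; the formalisms of Zamperoni et al. and Sinz are considerably richer:

```python
# Detect cycles in a graph of existential dependencies between entity
# types. A cycle (A needs B, B needs A) is a design error that an
# automatic consistency check of this kind can report.

def has_dependency_cycle(deps: dict[str, list[str]]) -> bool:
    """Depth-first search with a recursion stack (grey set) to find a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour: dict[str, int] = {}

    def visit(node: str) -> bool:
        colour[node] = GREY
        for succ in deps.get(node, []):
            state = colour.get(succ, WHITE)
            if state == GREY:                 # back edge: cycle found
                return True
            if state == WHITE and visit(succ):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in deps)

acyclic = {"OrderItem": ["Order"], "Order": ["Customer"], "Customer": []}
cyclic = {"A": ["B"], "B": ["A"]}
print(has_dependency_cycle(acyclic))  # False
print(has_dependency_cycle(cyclic))   # True
```

This is exactly the kind of check that works for exactly specified data base designs but is hard to apply to the looser, enterprise-wide models discussed above.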
3 Results of the Empirical Analysis
The empirical analysis was conducted from September to December 1994. The sample consists of the following target groups:
• German companies with more than 5000 employees (EM);
• German software and system houses with more than 50 EM;
• German technical (>100 EM) and business (>20 EM) consulting companies which deal with DP topics.
The questionnaire comprised 28 questions and covered the following topics:
• Organizational context of data modelling: structure, organizational aids, "data politics", responsibilities, personnel;
• Application fields (described in detail below);
• Methods and tools used: data modelling methods and tools, reasons for their application, relation to business process and object-oriented modelling, data dictionaries/repositories;
• Characteristics of data models: components used, enterprise-wide data models, figures (e.g. number of entity types, attributes), reference models, cost/benefit analysis, judgement or measurement of the quality of data models.
Transactions on Information and Communications Technologies vol 11, © 1995 WIT Press, www.witpress.com, ISSN 1743-3517
212 Software Quality Management
In the following, some preliminary results of the analysis are shown. A more detailed description of the results can be found in [11].
In the literature there are hardly any studies which deal with application fields of data models. Some authors, such as Scheer [14] and Fischer [5], hypothesize about the reasons for a data-oriented software development process or about the benefits of enterprise-wide data models. However, there is no detailed empirical analysis of the (real) benefits of data modelling in practice.
To learn more about this topic the author conducted an empirical study which represents the first step in the process of developing quality criteria for data models. The author has developed a catalogue of benefits which is based on
• the above-mentioned lists (see e.g. [14]),
• a more general empirical study about data modelling in practice (see [13]),
• the intentions and aims which led to the development of new data modelling methodologies (see e.g. [2], [3], [16]) or proposals of new design parameters for methodologies or tools (see e.g. [4], [5]),
• the results of structured interviews with experienced data modelling practitioners conducted by the author.
In the following, the catalogue of benefits is listed. In the questionnaire the benefits had to be rated with regard to their importance (5 = very important, 1 = not important) in two ways:
1. The (hypothetical) importance of the benefit for the person answering (PERS).
2. The estimated "real" benefit for the organization the person answering works for (ORG).
Catalogue of benefits (mean ratings PERS / ORG given where recoverable)

DM as the basis for the development of application systems
• Reduction of the time required for development (more LoC per unit of time through clear instructions) — 3.52 / 2.44
• Reduction of the frequency of redesign caused by data structures — 3.96 / 2.96
• Increased correspondence between the application system and the requirements — 3.96 / 3.00
• Improved documentation of the application system — 3.96 / 2.74
• Creating a basis for further developments — 3.48 / 2.41
• Improved communication between the application functions — 3.22 / 2.37
• Automatic generation of parts of the (application) system
• Aids for checking the feasibility of project proposals

DM as the basis for the design of the data base or the data organization, respectively
• Command of the functional complexity of the data system
• Command of the DP-technical complexity of the data system (e.g. heterogeneous system environments)
• Increased stability of the design of the data base
• Increased correspondence between the design of the data base and the requirements
• Improved clearness of the design of the data base
• Reduction of the time required for the development of data base definitions
• Increased reusability of data structures

DM as the basis for the use of standard software
• Basis for the selection of standard software
• Basis for the configuration of the selected standard software
• Basis for the adaptation or extension of the standard software (customizing)
• Basis for the integration of standard software into the existing application systems environment

DM as the basis for the integration in the DP area
• (Organization-wide) integration of (operative) application systems
• (Organization-wide) integration of various designs of data bases (e.g. avoiding undesired redundancies)
• Integration of (operative) application systems with management information systems (planning and decision support systems)
• Integration with application systems of business partners (e.g. customers, suppliers)

DM as the basis for management information systems (MIS)
• Creating a basis for analyses in MIS
• Standardized aggregation of data from various business areas

DM as an organization instrument in the DP area
• Instrument for project planning (e.g. classification, prioritization, planning of order)
• Instrument for project controlling (e.g. survey of the current projects, control of the project status)
• Basis for the analysis of existing application systems
• Basis for the training of new co-workers in the DP area
• Instrument for the analysis and management of deviations from (data) standards
• Aid for the development of DP strategies
• Support for the estimation of costs (e.g. "data points")
• Improved project basis (e.g. requirements definition)

DM as an organization instrument outside the DP area
• Analysis instrument: improved understanding of business relations or of one's own "business", respectively
• Basis for the training of new co-workers in the functional departments
• Increased transparency of the "production factor" data
• Creating a stable, long-term basis for enterprise modelling

DM as a means for improving communication
• Within the project team that is responsible for application development
• Between managers in the DP department and the functional departments
• Between members of the DP department and the functional departments
• With the top management
• With externals dealing with DP (e.g. suppliers of standard software)
• With other organizations (e.g. customers, suppliers, associations)

DM as a means for improving motivation/acceptance
• Improved motivation with regard to the cooperation of functional experts and DP experts
• Improved functional experts' acceptance of the process of application development
• Improved top management's acceptance of the process of application development
• Improved DP experts' acceptance of the process of application development
• Improved acceptance of the (finished) application system

DM as an instrument for standardization
• Standardized procedure of developing applications
• Standardization of the organization-wide system of terminology
• Introduction of "data standards" for exchange of data between companies
• Introduction of "data standards" for electronic data exchange between companies
• Standardized user interfaces (e.g. identical data structures lead to identical design of screen dialogues)

DM as a means for the reduction of maintenance costs
• Reduced maintenance costs for applications as a result of a stable data basis
• Quicker correction of errors as a result of quicker localization
• Reduced training effort when changing responsibilities during the life cycles of the application systems
• Supporting the after-the-fact documentation of (data) systems already in use
• Avoiding unintended side effects in connection with subsequent changes in the data system
• Reduced testing costs (e.g. quicker provision of meaningful test cases)
• Supporting the tasks of re-engineering (renovating old application systems)

DM as the basis for the use of application systems in functional departments
• Optimizing the use of the available data bases by providing access/surveys for wider user groups (e.g. users do their reporting/analyzing themselves)
• Optimizing the use of the available data bases by supporting a precise understanding of the data structures (e.g. prevention of misinterpretation)
• Integrating the applications that were developed by the functional departments into the enterprise-wide data model — 2.78 / 1.48
• Increased flexibility of the information supply (e.g. through quicker changeability of reports/analyses) — 2.85 / 1.78
4 Quality Concept for Data Models
Based on the results of the empirical analysis described in chapter three, a quality concept for data models is developed. The following aspects have to be taken into account:
• Application fields, "customers" and the organizational context of data modelling influence the quality concept. Thus, it is not desirable to develop the "one and only" quality metric; this would mean a production-oriented point of view.
• The relations between application fields, characteristics and quality metrics have to be investigated thoroughly.
• Quality metrics are used to classify data models, to find problem areas etc. However, it is not recommended to use quality metrics as the only means to evaluate data models, because the metrics cannot capture the "customers'" requirements of data models.

Fig. 1: Basis and two main columns of the quality concept (structured checklists for reviews and quality metrics for automated checks, both resting on the results of the empirical analysis "Benefit and Quality of Data Models in Practice")
• Checklists and a review process support detailed analysis of the data models.
• Some proposals for a re-design of the process of data modelling are generated.
The following list of areas for which the author develops quality metrics gives an impression of what quality metrics for automated checks of data models could look like:

Classification of data models
• Size of data models (e.g. number of entity types)
• Development effort (e.g. number of person days)
• Ratios (e.g. number of attributes per entity type)

Design metrics
• Complexity
• Quality of the arrangement
• Specialization

Use/maintenance
• User groups, intensity of use
• Stability

Structuring
• Business objects/core entities
• Area cohesion, area coupling
• Generalization/specialization hierarchies
• "Hot spots" (exceptional elements)

Data model and application development
• Relation to the requirements definition
• Relation to the conceptual design (functional, data, process etc.)
• Relation to the implementation

Enterprise-wide data model vs. project data models
• Ratios
• "Balance" of the project data models
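Several of the classification metrics named above reduce to simple counts and ratios over the model's components. A minimal Python sketch, in which the data layout (entity types mapped to attribute lists) is an illustrative assumption:

```python
# Sketch of classification metrics over a data model: size (number of
# entity types) and the attributes-per-entity-type ratio named above.
# The dictionary layout of the model is an illustrative assumption.

def classification_metrics(model: dict[str, list[str]]) -> dict[str, float]:
    """model maps each entity type to the list of its attribute names."""
    n_entities = len(model)
    n_attributes = sum(len(attrs) for attrs in model.values())
    return {
        "entity_types": n_entities,
        "attributes": n_attributes,
        "attributes_per_entity_type":
            n_attributes / n_entities if n_entities else 0.0,
    }

model = {
    "Customer": ["id", "name", "address"],
    "Order": ["id", "date", "total", "status"],
    "Product": ["id", "name", "price"],
}
m = classification_metrics(model)
print(m["entity_types"], m["attributes"],
      round(m["attributes_per_entity_type"], 2))  # 3 10 3.33
```

Such counts are cheap to compute from a repository, which is precisely why they suit automated checks, while the review-process column of the concept remains necessary for the aspects they cannot capture.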
5 Conclusion
In the last 25 years data modelling has become a methodology widely accepted both in science and in practice, and an increasing number of organizations now use data modelling. The time has come to think about the usefulness of data modelling and about the circumstances under which data modelling is effective and efficient. In this paper, application fields, users and the organizational context were identified as the key parameters which influence the usefulness of data models. The quality concept outlined in chapter four relies on these aspects.
However, a lot of work remains to be done. The author is developing quality criteria and a proposal for a review process (consisting of checklists and a general structure of the process). These criteria and the review have to be tested in real (business) applications. These tests will take place during 1995; the author will then report on his findings.
6 Bibliography
[1] Batini, C., Ceri, S., Navathe, S. B.: Conceptual Database Design: An Entity-Relationship Approach, Redwood City et al. 1992
[2] Chen, P. P.: The Entity-Relationship Model - Toward a Unified View of Data, in: ACM Transactions on Database Systems, Vol. 1, No. 1, March 1976, 9-36
[3] Codd, E. F.: A Relational Model of Data for Large Shared Data Banks, in: Communications of the ACM, Vol. 13, No. 6, 377-387, 1970
[4] Crockett, H. D., Guynes, J., Slinkman, C. W.: Framework for Development of Conceptual Data Modelling Techniques, in: Information and Software Technology, Vol. 33, No. 2, 134-142, 1991
[5] Fischer, J.: Datenmanagement - Datenbanken und betriebliche Datenmodellierung, Munich, Vienna 1992
[6] Heilandt, T., Kruck, P.: Ein algorithmisches Verfahren zur Bewertung und Verdichtung von Entity-Relationship-Modellen, in: Informatik - Forschung und Entwicklung, Vol. 8, 1993, 197-206
[7] Heinrich, L. J., Roithmayr, F.: Wirtschaftsinformatik-Lexikon, 4th Ed., Munich, Vienna 1992
[8] Juran, J. M., Gryna, F. M. (Ed.): Juran's Quality Control Handbook, 4th Ed., New York et al. 1988
[9] Lehner, F.: Modelle und Modellierung in Angewandter Informatik und Wirtschaftsinformatik, Research Paper No. 10, Institute for Business Informatics, The Koblenz School of Corporate Management, Vallendar 1994
[10] Maier, R.: Qualität von Datenmodellen, Research Paper No. 15, Institute for Business Informatics, The Koblenz School of Corporate Management, Vallendar 1994
[11] Maier, R.: Benefit and Quality of Data Models - Results of an Empirical Analysis, Research Paper, Institute for Business Informatics, The Koblenz School of Corporate Management, in preparation, Vallendar 1995
[12] Maier, R., Lehner, F.: Towards a New Perception of Information, Data and Knowledge - Implications of the German Management Literature, Research Paper, Institute for Business Informatics, The Koblenz School of Corporate Management, Vallendar 1994
[13] R & O Software-Technik GmbH: Datenmodellierung in der Praxis - Eine Marktanalyse über die Anwendung einer Methodik, Germering 1992
[14] Scheer, A.-W.: Wirtschaftsinformatik - Informationssysteme im Industriebetrieb: Übungsbuch, Berlin et al. 1991
[15] Schlageter, G., Stucky, W.: Datenbanksysteme: Konzepte und Modelle, Stuttgart 1983
[16] Sinz, E. J.: Datenmodellierung betrieblicher Probleme und ihre Unterstützung durch ein wissensbasiertes Entwicklungssystem, Habilitationsschrift, Regensburg 1987
[17] Zamperoni, A., Löhr-Richter, P.: Enhancing the Quality of Conceptual Database Specifications through Validation, in: Proceedings of the 12th International Conference on Entity-Relationship Approach, Dallas-Arlington, USA, December 15-17, 1993, 96-111