On Taxonomic Reasoning in Conceptual Design

Upload: ana-carla

Post on 06-Apr-2018


  • 8/3/2019 On Taxonomic Reasoning in Conceptual Design

    1/38

    On Taxonomic Reasoning in Conceptual Design

    SONIA BERGAMASCHI and CLAUDIO SARTORI
    Università di Bologna, Italia

    Taxonomic reasoning is a typical task performed by many AI knowledge representation systems. In this paper, the effectiveness of taxonomic reasoning techniques as an active support to knowledge acquisition and conceptual schema design is shown. The idea developed is that, by extending conceptual models with defined concepts and giving them rigorous logic semantics, it is possible to infer isa relationships between concepts on the basis of their descriptions. From a theoretical point of view, this approach makes it possible to give a formal definition of the consistency and minimality of a conceptual schema. From a pragmatic point of view, it is possible to develop an active environment that allows automatic classification of a new concept in the right position of a given taxonomy, ensuring the consistency and minimality of a conceptual schema. A formalism that includes the data semantics of models giving prominence to type constructors (E/R, TAXIS, GALILEO) and algorithms for taxonomic inferences are presented: their soundness, completeness, and tractability properties are proved. Finally, an extended formalism and taxonomic inference algorithms for models giving prominence to attributes (FDM, IFO) are given.

    Categories and Subject Descriptors: H.2.1 [Database Management]: Logical Design - data models; schema and subschema; I.2.4 [Artificial Intelligence]: Knowledge Representation Formalisms and Methods - representation languages; frames and scripts

    General Terms: Design, Language, Theory, Verification

    Additional Key Words and Phrases: Schema consistency, schema minimality, semantic models, taxonomic reasoning

    1. INTRODUCTION

    Modeling an application domain in terms of conceptual models is at present an important phase of many database design methodologies. This phase, called conceptual design, assumes that an accurate a priori collection of user requirements, covering both data and operations on data, has been performed; it produces as a result a conceptual data schema, i.e., a formalization of the requirements [4, 21]. Formalization is performed by the designer using conceptual models [4, 19, 22, 33, 40, 50] that decrease the conceptual distance between the reality

    This work was partially supported by the Italian project Sistemi informatici e Calcolo Parallelo, subproject 5, objective LOGIDATA+ of the National Research Council (CNR).
    Authors' address: CIOC-CNR, Viale Risorgimento 2, 40136 Bologna, Italy; Tel: + 3951644.3550-3548; email: {sonia, claudio}@deis64.cineca.it.
    Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
    © 1992 ACM 0362-5915/92/0900-0385 $01.50

    ACM Transactions on Database Systems, Vol. 17, No. 3, September 1992, Pages 385-422.


    to be modeled and the primitives available, compared to traditional DBMS data models. A conceptual data schema describes the intensional knowledge, i.e., structures of concepts and interrelationships between concepts (relationships, part-of, isa hierarchies) [52, 53]. These ingredients are also found in work on programming languages with type inheritance structures [20] and in the most recent models for complex object representation [1, 2, 35-38].

    Because of the difficulties facing a designer in conceptual design, various languages and graphic tools (GALILEO [4], TAXIS [40], EASYER [31], ER-Designer [23], and expert systems [14, 24, 26]) have been developed to assist in conceptual schema construction. Despite the usefulness and efficiency of these tools, the decision on how to organize concepts in a taxonomy and, above all, the systematic verification of the schema's correctness are still guided only by the designer's experience.

    When we deal with complex application domains, the modeling task is more difficult, since in this case the intensional knowledge is larger and more complex [18, 25]. It becomes necessary to think not only of a top-down or bottom-up construction of a schema, but of a true acquisition of concepts in any order [10, 27, 32]. Furthermore, the construction of a large application schema must often be partitioned among various designers, each design corresponding to a different user view, and these schemata have to be integrated into a larger one without loss of information or consistency. It then becomes almost mandatory to have automatic design tools that more effectively support the construction of large, complex conceptual schemata, checking the correctness and minimality of the schema.

    The problem of systematically verifying the correctness of a schema can be addressed by working environments that provide suitable reasoning techniques alongside the semantic capabilities of conceptual models. The organization of concepts in a taxonomy is a basic modeling principle in both the database and the artificial intelligence areas. Thus, many efforts have been devoted to techniques that exploit isa relationships between concepts, as summarized in Section 8 [2, 3, 5, 6, 10, 27, 28, 32, 36].

    In the database area the taxonomy is built by the designer, since a concept must be described by an explicit declaration of its parent concepts (isa links) and its differentiae. In the artificial intelligence area the taxonomy is computed, as it is assumed that a concept description can be given as a composition of ancestor concepts (not necessarily parents) and differential properties.

    Automatic classification (i.e., determination of the right place for a concept in a taxonomy¹) is the most important reasoning task, called taxonomic reasoning, of hybrid knowledge representation systems (KL-TWO [54], KANDOR [44, 45], KRYPTON [15], BACK [39], CLASSIC [13]), which are based on the KL-ONE paradigm [17]. A taxonomic reasoner finds all isa relationships

    ¹Not to be confused with the classification abstraction mechanism of conceptual models, which is devoted to instances.


    between a new concept description and the concept taxonomy already given, by discovering the implicit isa links hidden in the concept descriptions. To make these inferences it is necessary to deal with a definitional language, as was first proposed by Schmolze and Israel [49], that is, a language in which each expression (i.e., concept) exactly identifies a set of items of a given application domain. The correspondence between syntactic expressions and sets is thus obtained by assigning extensional semantics to each language component, i.e., the rules for the construction of the associated set. In this way a subsumption relationship between two concepts can be computed by syntactically comparing the language expressions describing them.²

    Definitional language semantics is almost unheard of in the conceptual model tradition, where a concept description is intended to represent conditions that are only necessary for the extension of a class (i.e., a class must be explicitly filled with individuals whose descriptions satisfy these conditions) and isa links between classes are explicitly declared.

    Hybrid-system languages (called FDLs, Frame Description Languages) represent both the semantics of conceptual models, by means of so-called primitive concepts, and the semantics of definitional languages, by means of so-called defined concepts. A defined concept gives a class definition; thus the description of the class represents necessary and sufficient conditions, and individuals can be automatically inserted in that class. A defined concept can be viewed as a derived subtype [34] where the structural specification (embedded in the defined concept description) also defines the derivation rule to fill the corresponding class. The primitive concept semantics captures the usual conceptual model perspective by adding a further unknown condition to the concept description, which prevents automatic recognition of items; classes must therefore be filled explicitly. Furthermore, we observe that both primitive and defined concept semantics are useful, because the first levels of a conceptual schema are usually constituted by primitive concepts (no full necessary and sufficient conditions are available), while deeper levels are constituted by defined concepts.

    It is the authors' opinion that taxonomic reasoning is a powerful technique, strongly enhanced by defined concepts, for supporting conceptual schema design. Furthermore, the relevance of this technique for other main topics in database research, such as recognition of instances and query validation as well as optimization [8, 13], is considerable, as briefly explained in Section 9.

    The question now is: Do we have to adopt FDL formalisms to describe conceptual data schemata, or can we still use the conceptual models which are so popular in the database community?

    There are very good reasons to favor conceptual models: the E/R model constitutes a standard for the conceptual design of database applications and makes automatic tools available for converting the conceptual schema to the

    ²The idea of computing subtyping relationships by means of syntactic comparisons is also presented by Cardelli [20] as the basis of a sound type-checking algorithm for a programming language.


    logical data schema of commercial DBMSs (RelGen [9], SchemaGen [23]). Thus, knowledge bases describing nontrivial application domains and possessing very large extensional knowledge can be stored easily and managed on secondary memory by database technology. E/R is also a data description standard for software engineering CASE tools. Other very popular conceptual models and languages, such as TAXIS [40], DAPLEX [50], and GALILEO [4], are very important, as they constitute the basis of ongoing research on object-oriented databases; in particular, the DAPLEX functional model constitutes the basis of the PROBE project for the development of an object-oriented DBMS for the management of time and space information [25].

    The answer to the above question is that we can use conceptual models for knowledge representation and to perform taxonomic reasoning if we extend the semantics of conceptual models with defined concepts, give them a rigorous extensional semantics, and then develop a complete (i.e., every subsumption relationship is detected) and tractable algorithm for the computation of subsumption.

    The aim here is to propose a theoretical framework for conceptual schema acquisition and organization, preserving consistency and minimality. The framework is based on taxonomic reasoning in strict, multiple inheritance taxonomies.

    In particular, we introduce the formalism Y=* as a formal tool to support taxonomic reasoning and prove its feasibility. It is a compositional formalism that expresses the semantics of both defined and primitive concepts; it makes it possible to detect contradictory concepts (i.e., concepts with an empty extension) and to compute subsumption between concepts by means of syntactical type-checking rules.

    In this way it is possible to perform the passive consistency check of a schema, including primitive and defined concepts. Furthermore, as a complete subsumption algorithm allows computation of the minimal description of a concept with respect to the specialization ordering, the more active role of building a minimal concept taxonomy can be played.

    2%* and 5ZY* (the latter is an extension dealing with inverse roles) extend, respectively, the E/R and DAPLEX data semantics, modeling multiple inheritance from other entities as a part of the entity description and allowing defined entities. Y=* includes most of the data semantics of other well-known models that give prominence to type constructors, such as TAXIS and GALILEO, as it allows representation of the aggregation, grouping, and generalization abstractions.³ The inverse-role extension also includes most of the data semantics of the models that give prominence to attributes, FDM and the more recent IFO [2], since it can represent inverse roles. The representation of nested descriptions is a common feature of both formalisms and allows the representation of

    ³The object-centered approach of H%2* makes it suitable for representing the semantics of object-oriented data models such as O2, if cyclic definitions are excluded.


    nested functions. Furthermore, both formalisms extend AI FDLs, as they include attribute and value domain semantics, following database tradition.

    The outline of the paper follows. A short background on FDLs and on the complexity of subsumption computation is given in Section 2. In Section 3, the Frame Description Language SE%* is introduced, its semantics is formally presented according to the model theoretic approach, and a subsumption algorithm is given. A formal definition of the consistency and minimality of a Y%%* schema, based on subsumption, and a classification algorithm are presented in Section 4. In Section 5, the E/R model is described by means of &%%*, and the effectiveness of the techniques of Section 4 is shown by some examples of conceptual schema design. In Section 6, ti9~ is presented, and the extended subsumption algorithm is given. In Section 7, the DAPLEX model is described by means of 5Z5Z~U, and some examples of conceptual schema design are given. Section 8 examines related work on conceptual schema consistency checks. Section 9 examines related work on the application of taxonomic reasoning techniques to database querying and instantiation, and gives some hints on the applicability of the results of this paper to these topics.

    2. FDLs AND SUBSUMPTION COMPUTATION

    A short background on frame description languages and on the complexity of subsumption computation is given in the following.

    FDLs have two major syntactic types derived from the epistemological primitives of KL-ONE: concept and role. The name of the language family, Frame Description Languages, derives historically from the correspondence between concept and role and the (typically less well-defined) AI notions of frame and slot, respectively.

    In KL-ONE, the structure of a concept is described on the basis of more general concepts (genus) and of a local structure (differential) constituted by additional or differentiated roles. Roles describe relationships between instances of that concept and instances of others (role fillers). Concepts can be primitive or defined. For instance, in KL-ONE, a world including persons, teachings, students, courses, and student enrollment can be described by the graphical schema of Figure 1. Ellipses represent concepts; a primitive concept is marked by an asterisk; arrows represent roles and are directed from a concept to a role filler, which is the value restriction of the role; a role has a name and a number restriction (min, max), which specifies the minimum and maximum number of instances of the role filler in a particular role; thick arrows represent isa links.

    Person, string, and teaching are primitive concepts. Person has the role name with a unique value in the role filler string. Student is defined as a person enrolled in a teaching (i.e., a person having the role enrolled-in whose role filler is teaching); the number restriction (1, n) means that the student must be enrolled in at least one teaching, while no upper limit is given. Course is teaching with the additional role enroll and with at least one


    [Figure 1: KL-ONE graphical schema with the concepts person*, teaching, and student and the roles name and enrolled-in]


    FL- describes defined concepts, concept hierarchies, and roles qualified by a value restriction and an existential quantifier; it is considered the common ancestor of these FDLs, as its subsumption function has been proved to be complete and tractable. Its syntax follows:

    <concept> ::= <concept-atom> | (AND <concept_1> ... <concept_n>) | (ALL <role> <concept>) | (SOME <role>)
    <role> ::= <role-atom>

    The syntactic categories <concept-atom> and <role-atom> indicate concepts and roles for which no description is provided.

    The AND construct is a conjunction of concepts. For example,

    (AND male person)

    is the concept of someone who is, at the same time, male and a person (i.e., a man). AND allows the superconcepts of a concept to appear in its definition (that is, it represents multiple inheritance, in conceptual model terminology). In general, x is an (AND c_1 ... c_n) iff x is a c_1 and ... and x is a c_n.

    A role denotes a set of ordered couples of items. The ALL construct defines a concept on the basis of a value restriction on the fillers of a role. In general, x is an (ALL r c) iff each filler of role r of x is a c. Thus, for example,

    (ALL enrolled-in teaching)

    corresponds to the concept of something allowing a role enrolled-in whose fillers, if any exist, are teachings. ALL is equivalent to the specification of a partial property in conceptual model terminology (a partial attribute or an entity component of a partial relationship in E/R terminology).

    The SOME construct guarantees that there will be at least one role filler for the named role (without any constraint on its type). Therefore, the combination of ALL and SOME for a given role defines the totality of a property.

    3. 52% AND SUBSUMPTION COMPUTATION

    3.1 3?%5? Syntax

    The language extends FL- with the concept-forming constructs NR, ALL_a, NR_a, and NOT. In addition, it allows naming of concepts (=), primitive concepts, and atom negation.

    The number restriction, NR, corresponds to the cardinality constraints on multivalued properties and on relationships in E/R terminology. The two constructs ALL_a and NR_a allow the description of attributes, i.e., roles mapping into value sets. The construct NOT specifies a restricted form of negation, useful to state disjointness constraints between concepts.


    The language grammar follows:

    <terminology> ::= <term-declaration_1> ... <term-declaration_n>
    <term-declaration> ::= <concept-declaration> | <value-domain-declaration>
    <concept-declaration> ::= <def-declaration> | <prim-declaration>
    <def-declaration> ::= <concept-name> = <concept>


    The syntactic categories <integer-range>, <real-range>, and <value-domain-range> are range specifications, as in common programming languages.⁵ The syntactic categories <concept-name>, <concept-atom>, <role-atom>, <attribute-atom>, <value-domain-name>, and <value-atom> are identifiers.

    The = statements allow an equivalence name to be assigned to a concept. For instance, the following statements assign the descriptions of a person (primitive) and a student (defined):

    person = (AND p (ALL_a name string) (NR_a name 1,1))

    student = (AND person (ALL enrolled-in teaching) (NR enrolled-in 1,n))

    Note that the fictitious atom p has been introduced in the description of person in order to represent the primitive nature of the concept; that is, p includes the unknown conditions that would make the description of the concept a definition. For each primitive concept a new atom must be introduced, which allows us to distinguish between two primitive concepts with the same type structure.⁶

    The NR construct extends the SOME of FL- and guarantees that there will be at least min and at most max role fillers for the named role; in general, x is in (NR r min, max) iff x has at least min and at most max different fillers in role r (of any type). For instance, the above description of student refers to a specialization of person having at least one item filling the enrolled-in role (the maximum is not defined and is indicated by n). Note that the description of a property is given by two constructors: one for the qualification of fillers (ALL) and one for their quantification (NR).

    The ALL_a and NR_a constructs are similar to ALL and NR, but refer to attributes mapping into value sets. NOTHING denotes the empty set.

    The construct (NOT <concept-atom>) allows the specification of disjointness between classes described by primitive concepts. For instance, the following descriptions of person (extended with (NOT ...)) and teaching specify that a common specialization of person and teaching is not allowed or, in other words, that such a specialization is the empty set.

    person = (AND p (NOT t) (ALL_a name string) (NR_a name 1,1))

    teaching = (AND t (NOT p) (ALL_a name string) (NR_a name 1,1))
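To make the role of the NOT atoms concrete, the following Python sketch checks that the conjunction of person and teaching collapses to the empty concept. It is only an illustration under stated assumptions: the tuple encoding of terms and the names TERMINOLOGY, expand, flatten, and is_contradictory are hypothetical, and the sketch applies just three of the CANFORM rewriting rules of Section 3.3 (name substitution, AND flattening, and the atom/negation clash), not the full algorithm.

```python
from typing import Union

# Hypothetical tuple encoding, chosen for this sketch only:
#   "p"                     concept atom or concept name (plain string)
#   ("NOT", "p")            negated atom
#   ("AND", t1, ..., tn)    conjunction
#   ("ALL", role, concept)  role value restriction
#   ("ALL_a", attr, dom)    attribute value restriction
#   ("NR_a", attr, lo, hi)  attribute number restriction
Term = Union[str, tuple]

TERMINOLOGY: dict[str, Term] = {
    "person": ("AND", "p", ("NOT", "t"),
               ("ALL_a", "name", "string"), ("NR_a", "name", 1, 1)),
    "teaching": ("AND", "t", ("NOT", "p"),
                 ("ALL_a", "name", "string"), ("NR_a", "name", 1, 1)),
}


def expand(term: Term, names: dict[str, Term]) -> Term:
    """Rule 2a: replace concept names by their descriptions (acyclic terminology assumed)."""
    if isinstance(term, str):
        return expand(names[term], names) if term in names else term
    if term[0] == "AND":
        return ("AND", *(expand(t, names) for t in term[1:]))
    if term[0] == "ALL":
        return ("ALL", term[1], expand(term[2], names))
    return term  # NOT, ALL_a, NR_a: atoms and value domains need no expansion


def flatten(term: Term) -> list[Term]:
    """Rule 2b: remove nested ANDs and return the flat list of conjuncts."""
    if isinstance(term, tuple) and term[0] == "AND":
        return [c for t in term[1:] for c in flatten(t)]
    return [term]


def is_contradictory(term: Term, names: dict[str, Term]) -> bool:
    """Rule 2e: a conjunction containing an atom and its negation denotes NOTHING."""
    conjuncts = flatten(expand(term, names))
    atoms = {c for c in conjuncts if isinstance(c, str)}
    negated = {c[1] for c in conjuncts if isinstance(c, tuple) and c[0] == "NOT"}
    return bool(atoms & negated)


print(is_contradictory(("AND", "person", "teaching"), TERMINOLOGY))  # True
print(is_contradictory("person", TERMINOLOGY))                        # False
```

After expansion and flattening, the conjunction contains both p and (NOT p), as well as t and (NOT t), so it is recognized as contradictory; person alone contains p and (NOT t) only, so it is not.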

    ⁵Arbitrary enumerated subsets are not admitted.
    ⁶An ad hoc high-level language construct such as prim could easily be added to %!?*, so that an atom p could be automatically introduced by the system.


    3.2 E% Semantics

    A concept description may be associated with a set of items in an application domain, a role description with a set of couples of concept items, and an attribute description with a set of couples (concept item, value). Therefore, an application domain can be modeled by a set of concept, role, value domain, and attribute descriptions, called a terminology, and by a mapping from the terminology to sets of items and couples of items of the domain (i.e., to the powerset of the domain). More generally, if a terminology is seen as a first-order theory, it is possible to map the terminology to different domains, and to the same domain in different ways. Every mapping has characteristics that are outlined by the semantics assigned to the language.

    This formalization technique is known as the model theoretic approach [46]. Following this approach, a family of functions called extension functions is defined.⁷ The family of extension functions defines the language semantics by assigning an extension to each concept, role, and attribute. This explicit definition of the semantics of each language expression allows the sets associated with the concept descriptions to be compared, in order to define a subsumption procedure and prove its correctness and completeness.

    Definition 1 (extension function). Let $T$ be a terminology, that is, a set containing terms of SC%*; let $T_C$ be the set of concept specifications, $T_{Cn}$ and $T_{Ca} \cup \{\text{NOTHING}\}$ the sets of concept names and concept atoms mentioned in $T$, and $T_R$ and $T_A$ the sets of role and attribute atoms mentioned in $T$, respectively;⁸ let $T_V$ be the set of value sets mentioned in $T$, including the sets $T_{Vn}$ and $T_{Va}$ of value domain names and value atoms, respectively. Let $C$ and $R \subseteq C \times C$ be the sets of concepts and roles of the domain; let $V$ and $A \subseteq C \times V$ be the sets of value sets and attributes of the domain. Let $\mathcal{D} = C \cup R \cup V \cup A$ be the domain and $\mathcal{E}$ be a function such that

    $$\forall a \in T_A,\ \mathcal{E}[a] \in 2^{A} \qquad \forall v \in T_{Va},\ \mathcal{E}[v] \in 2^{V}$$

    ⁷This definition of the semantics of a language is similar to the formal set-theoretic semantics for types presented by Cardelli [20] and adopted in the data model O2.
    ⁸Obviously T, Q TCa u T,=.


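    $\mathcal{E}$ is an extension function over $\mathcal{D}$ with respect to $T$ if and only if the extensions of the terms are as follows:

    $$\mathcal{E}[(\text{NOT}\ c)] = C \setminus \mathcal{E}[c] \quad \forall c \in T_{Ca} \qquad (1)$$
    $$\mathcal{E}[\text{NOTHING}] = \emptyset \qquad (2)$$
    $$\mathcal{E}[(\text{AND}\ c_1 \ldots c_n)] = \bigcap_{i=1}^{n} \mathcal{E}[c_i] \quad \forall c_i \in T_C \qquad (3)$$
    $$\mathcal{E}[(\text{ALL}\ r\ c)] = \{x \in C \mid \forall y \in C,\ (x, y) \in \mathcal{E}[r] \Rightarrow y \in \mathcal{E}[c]\} \qquad (4)$$
    $$\mathcal{E}[(\text{NR}\ r\ \mathit{min}, \mathit{max})] = \{x \in C \mid \mathit{min} \le |M_r(x)| \le \mathit{max}\} \qquad (5)$$
    $$\text{with } M_r(x) = \{y \in C \mid (x, y) \in \mathcal{E}[r]\}$$
    $$\mathcal{E}[(\text{ALL}_a\ a\ v)] = \{x \in C \mid \forall y \in V,\ (x, y) \in \mathcal{E}[a] \Rightarrow y \in \mathcal{E}[v]\} \qquad (6)$$
    $$\mathcal{E}[(\text{NR}_a\ a\ \mathit{min}, \mathit{max})] = \{x \in C \mid \mathit{min} \le |M_a(x)| \le \mathit{max}\} \qquad (7)$$
    $$\text{with } M_a(x) = \{y \in V \mid (x, y) \in \mathcal{E}[a]\}$$

    and the following constraints hold:

    $$\forall cn \in T_{Cn},\ \forall c \in T_C,\ (cn = c) \in T \Rightarrow \mathcal{E}[cn] = \mathcal{E}[c] \qquad (8)$$
    $$\forall vn \in T_{Vn},\ \forall v \in T_V,\ (vn = v) \in T \Rightarrow \mathcal{E}[vn] = \mathcal{E}[v] \qquad (9)$$

    The function $\mathcal{E}$ uniquely maps a value set $v$ to the set of values $\mathcal{E}[v]$, according to the rules of standard programming languages.

    The extension of any complex term can be uniquely computed by equalities (1) to (7), starting from the extensions of the atomic terms.⁹ Conditions (8) and (9) constrain the extensions of concept names and value domain names. Therefore, given a terminology $T$, where $\mathcal{D}$ is a domain and $\mathcal{E}$ a function satisfying the above conditions, the interpretation $I = (\mathcal{D}, \mathcal{E})$ is valid, i.e., it is a model, in the model theoretic sense [46], of the knowledge schema defined by $T$.

    As a last observation, we can give an extension equation for primitive concept declarations: if $cn$ is a primitive concept name and $cn = (\text{AND}\ s\ c)$ is its declaration, with $s \in T_{Ca}$ and $c \in T_C$, then from Eq. (3) it follows that

    $$\mathcal{E}[cn] = \mathcal{E}[s] \cap \mathcal{E}[c] \subseteq \mathcal{E}[c].$$

    3.3 Subsumption Computation

    The adoption of an extensional semantics for the language allows the following formal definition of the subsumption relationship:

    Definition 2 (subsumption). Given two concepts $c$ and $c'$,

    $$c \text{ subsumes } c' \iff \forall \mathcal{D}\ \forall \mathcal{E} \text{ over } \mathcal{D},\ \mathcal{E}[c'] \subseteq \mathcal{E}[c].$$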
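Equations (1) to (5) can be read almost literally as an evaluator over a finite interpretation. The Python sketch below is illustrative only: the Interpretation class, its ext and fillers methods, the refutes helper, and the sample individuals are hypothetical, and attributes, value domains, and concept names are left out. Because Definition 2 quantifies over every domain and every extension function, a single finite interpretation can refute a subsumption but can never establish one, which is one motivation for a syntactic procedure.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Interpretation:
    """A finite interpretation: individuals plus extensions of atoms and roles."""
    individuals: set[str]
    atoms: dict[str, set[str]] = field(default_factory=dict)
    roles: dict[str, set[tuple[str, str]]] = field(default_factory=dict)

    def fillers(self, role: str, x: str) -> set[str]:
        """M_r(x) = { y | (x, y) in E[r] }."""
        return {y for (a, y) in self.roles.get(role, set()) if a == x}

    def ext(self, term) -> set[str]:
        """E[term] for NOTHING, atoms, NOT, AND, ALL and NR, following Eqs. (1)-(5)."""
        if term == "NOTHING":                                   # Eq. (2)
            return set()
        if isinstance(term, str):                               # concept atom
            return set(self.atoms.get(term, set()))
        op = term[0]
        if op == "NOT":                                         # Eq. (1): complement
            return self.individuals - self.ext(term[1])
        if op == "AND":                                         # Eq. (3): intersection
            result = set(self.individuals)
            for c in term[1:]:
                result &= self.ext(c)
            return result
        if op == "ALL":                                         # Eq. (4): all fillers in E[c]
            _, r, c = term
            ec = self.ext(c)
            return {x for x in self.individuals if self.fillers(r, x) <= ec}
        if op == "NR":                                          # Eq. (5): min <= |M_r(x)| <= max
            _, r, lo, hi = term
            return {x for x in self.individuals if lo <= len(self.fillers(r, x)) <= hi}
        raise ValueError(f"unsupported construct: {op}")


def refutes(interp: Interpretation, subsumer, subsumee) -> bool:
    """True if interp is a counter-model to 'subsumer subsumes subsumee' (Definition 2)."""
    return not interp.ext(subsumee) <= interp.ext(subsumer)


interp = Interpretation(
    individuals={"ann", "bob", "db1"},
    atoms={"p": {"ann", "bob"}, "teaching": {"db1"}},
    roles={"enrolled-in": {("ann", "db1")}},
)
person = ("AND", "p")  # attributes omitted in this sketch
student = ("AND", "p", ("ALL", "enrolled-in", "teaching"),
           ("NR", "enrolled-in", 1, math.inf))  # math.inf stands for "n"

print(sorted(interp.ext(student)))       # ['ann']
print(refutes(interp, student, person))  # True
print(refutes(interp, person, student))  # False
```

Here bob satisfies person but not student, so the interpretation is a counter-model to "student subsumes person"; the opposite check finds no counter-example, which is consistent with (but does not prove) person subsuming student.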

    ⁹The uniqueness is guaranteed by the acyclicity of the terminology, as expressed in Section 3.1.


    The main activity of taxonomic reasoning consists in determining whether the subsumption relationship holds for an arbitrary pair of concepts. In the following we show how this computation can be performed by the Boolean function SUBS on the basis of a syntactical comparison of the concept descriptions.

    Algorithm 1 (SUBS). SUBS is a function defined as

    $$\text{SUBS}: T_C \times T_C \to \{\text{true}, \text{false}\}$$

    $$\text{SUBS}(t, u) = \text{COMPARE}(\text{CANFORM}(t), \text{CANFORM}(u))$$

    with COMPARE and CANFORM defined below.

    The SUBS algorithm is a considerable extension of the one proposed by Brachman and Levesque [16], since it deals with concept names, attributes, number restrictions, atom negations, and contradictory concepts, and exploits techniques introduced by Nebel [42].

    CANFORM transforms a concept declaration into a canonical form. The purpose of this transformation is to obtain a unique description (apart from permutations) which is semantically equivalent to the original one and which simplifies the syntactical comparison performed by COMPARE. In particular, concept names are substituted by their descriptions; contradictory subexpressions are discovered and substituted by NOTHING; unnecessary nestings are removed; and terms referring to the same role or attribute are replaced by a single term. COMPARE then structurally compares the expressions produced by CANFORM. The algorithm COMPDOM is used by COMPARE and detects subsumption between value domains.

    As a first step, let us give the definition of a contradictory concept (denoted in the following as NOTHING).

    Definition 3 (contradictory concept). A concept $c$ is contradictory with respect to a terminology $T$ iff

    $$\forall \mathcal{D}\ \forall \mathcal{E} \text{ over } \mathcal{D},\ \mathcal{E}[c] = \emptyset.$$

    Algorithm CANFORM defines a set of rewriting rules applied to all subexpressions of the description.

    Algorithm 2 (CANFORM). CANFORM is a function defined as

    $$\text{CANFORM}: T_C \to T_C$$

    CANFORM(c) = repeatedly apply the following rules:

    2a A concept name is substituted by the corresponding description.¹⁰

    ¹⁰It has been observed by Nebel [43] that a concept description with m terms is transformed by name substitution into a description which has, in the worst case, $t = O(m^d)$ terms, where d is the maximum depth of the concept taxonomy. Therefore, name substitution generates an inherent intractability, because of an exponential growth if the order of magnitude of d is greater than log(n), where n is the number of concepts of the taxonomy. On the other hand, this latter case is infrequent in real knowledge bases, and in the following we assume that name substitution does not generate exponential complexity.


    2b The associative property of the AND operation is used in order to remove the nested ANDs from the function arguments:
        (AND ... (AND c_i ... c_j) ... c_k ...) → (AND ... c_i ... c_j ... c_k ...)
    2c If an AND expression contains ALL (ALL_a) terms referring to the same role (attribute), they are replaced by a single term, with the concepts (value domains) grouped by AND:
        (AND ... (ALL r (AND c_1 ... c_k)) ... (ALL r (AND c_{k+1} ... c_n)) ...) → (AND ... (ALL r (AND c_1 ... c_k c_{k+1} ... c_n)) ...)¹¹
    2d If an AND expression contains two terms like (ALL_a a v) and (NR_a a min, max), and v is a finite countable value domain with cardinality m, with m ≤ max, then max is replaced by m:
        (AND ... (ALL_a a v) (NR_a a min, max) ...) → (AND ... (ALL_a a v) (NR_a a min, m) ...)
    2e If an AND expression contains an atom and its negation, then the AND expression is replaced by NOTHING:
        (AND ... c ... (NOT c) ...) → NOTHING
    2f If an AND expression contains NOTHING, then the AND expression is replaced by NOTHING:
        (AND ... NOTHING ...) → NOTHING
    2g If an AND expression contains at least two different value domain names, or two value domain ranges referring to different value domain names, then the AND expression is replaced by NOTHING:
        (AND ... v_1 ... v_2 ...) → NOTHING
    2h If an AND expression contains at least two value domain ranges referring to the same value domain name, but with an empty intersection ($\mathcal{E}[v_1] \cap \mathcal{E}[v_2] = \emptyset$), then the AND expression is replaced by NOTHING:
        (AND ... v_1 ... v_2 ...) → NOTHING
    2i If an ALL (ALL_a) term has NOTHING as a filler, then it is replaced by an NR (NR_a) term with min = max = 0:
        (ALL r NOTHING) → (NR r 0,0)
    2j The NR (NR_a) terms referring to the same role (attribute) are replaced by a single term having as its number restriction the intersection of the original intervals:
        (AND ... (NR r min_1, max_1) ... (NR r min_2, max_2) ...) → (AND ... (NR r max(min_1, min_2), min(max_1, max_2)) ...)
    2k The ALL (ALL_a) terms for which an NR (NR_a) term on the same role with min = max = 0 exists are eliminated:¹²
        (AND ... (ALL r ...) (NR r 0,0)) → (AND ... (NR r 0,0))

    ¹¹All value restrictions which are not AND expressions are converted into AND expressions: (ALL r c) → (ALL r (AND c)) and (ALL_a a v) → (ALL_a a (AND v)).
    ¹²In fact, $\mathcal{E}[(\text{AND}\ (\text{ALL}\ r\ y)\ (\text{NR}\ r\ 0,0))] = \mathcal{E}[(\text{NR}\ r\ 0,0)]\ \forall y \in C$. In other words, (NR r 0,0) denotes the extreme case reachable by restricting the set of fillers for a given role.


    2l If an NR (NR_a) term contains conflicting number restrictions, min > max, it is replaced by NOTHING:

    (NR r min, max) → NOTHING

    THEOREM 1. Given a concept c, the canonical form generated by Algorithm 2 is extension preserving:

    $\forall \Omega\ \forall \mathcal{E}: \mathcal{E}[c] = \mathcal{E}[\text{CANFORM}(c)]$

    PROOF SKETCH. The proof follows from the application of the definition of the extension function $\mathcal{E}$ given by Equations (1) to (9). □

    COMPARE structurally compares the expressions given by CANFORM and makes use of the function COMPDOM to detect set inclusion between value domains.
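    The normalization rules 2e through 2l above can be illustrated with a small Python sketch. It is a minimal illustration, not the authors' CANFORM: it assumes a hypothetical tuple encoding of terms, handles only atoms, negated atoms, and role terms (NR_a and ALL_a would follow the same pattern), and conjoins several ALL terms on the same role into one filler before normalizing it.

```python
INF = float("inf")  # unbounded maximum, written "n" in the text
NOTHING = "NOTHING"


def canform(conj):
    """Sketch of rules 2e, 2f, 2i, 2j, 2k, 2l on a conjunction of role terms.

    A conjunction is a list of terms: ("atom", a), ("not", a), ("NR", r, lo, hi),
    ("ALL", r, filler) with filler itself such a list, or ("NOTHING",).
    Returns NOTHING or a sorted list of normalized terms.
    """
    atoms, negs = set(), set()
    intervals = {}      # role -> (min, max), merged by rule 2j
    raw_fillers = {}    # role -> conjoined filler of all ALL terms on that role
    for term in conj:
        kind = term[0]
        if kind == "NOTHING":                          # rule 2f
            return NOTHING
        if kind == "atom":
            atoms.add(term[1])
        elif kind == "not":
            negs.add(term[1])
        elif kind == "NR":
            _, role, lo, hi = term
            old_lo, old_hi = intervals.get(role, (0, INF))
            intervals[role] = (max(lo, old_lo), min(hi, old_hi))  # rule 2j
        elif kind == "ALL":
            _, role, filler = term
            raw_fillers.setdefault(role, []).extend(filler)
        else:
            raise ValueError(f"unknown term {term!r}")
    if atoms & negs:                                   # rule 2e
        return NOTHING
    fillers = {}
    for role, filler in raw_fillers.items():
        norm = canform(filler)
        if norm == NOTHING:                            # rule 2i: (ALL r NOTHING) -> (NR r 0,0)
            old_lo, old_hi = intervals.get(role, (0, INF))
            intervals[role] = (old_lo, min(0, old_hi))
        else:
            fillers[role] = norm
    if any(lo > hi for lo, hi in intervals.values()):  # rule 2l
        return NOTHING
    result = [("atom", a) for a in sorted(atoms)] + [("not", a) for a in sorted(negs)]
    result += [("NR", r, lo, hi) for r, (lo, hi) in sorted(intervals.items())]
    for role in sorted(fillers):
        if intervals.get(role) != (0, 0):              # rule 2k
            result.append(("ALL", role, fillers[role]))
    return result
```

    For instance, a conjunction holding both (NR passed-units 25, 30) and (NR passed-units 1, 20) is reduced to NOTHING: rule 2j yields the interval 25,20 and rule 2l rejects it.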

    Algorithm 3 (COMPDOM). COMPDOM is a function defined as

    COMPDOM: T_V × T_V → {true, false}

    COMPDOM(x, y) = true iff:
    3a (y = NOTHING) ∨
    3b (x is a value domain name or a value domain range) ⇒ ((y is a value domain range) ∧ $\mathcal{E}[y] \subseteq \mathcal{E}[x]$).

    Algorithm 4 (COMPARE). Given a couple of concepts in canonical form, c and c', COMPARE(c, c') = true iff
    4a (c' = NOTHING) ∨
    c = (AND c_1 ... c_n), c' = (AND c'_1 ... c'_m), [∀c_i, i = 1, ..., n:
    4b (c_i is an atom or a negated atom) ⇒ (∃c'_j = c_i)
    4c ∧ (c_i = (NR r min, max)) ⇒ (∃c'_j = (NR r min', max') ∧ min' ≥ min ∧ max' ≤ max)
    4d ∧ (c_i = (NR_a a min, max)) ⇒ (∃c'_j = (NR_a a min', max') ∧ min' ≥ min ∧ max' ≤ max)
    4e ∧ (c_i = (ALL r x)) ⇒ ((∃c'_j = (ALL r y) ∧ COMPARE(x, y) = true) ∨ ∃c'_j = (NR r 0,0))¹³
    4f ∧ (c_i = (ALL_a a x)) ⇒ ((∃c'_j = (ALL_a a y) ∧ COMPDOM(x, y) = true) ∨ ∃c'_j = (NR_a a 0,0))]
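    Algorithms 3 and 4 translate almost line by line into code. The following Python sketch assumes canonical forms encoded as lists of tuples (the hypothetical tags "NRa" and "ALLa" stand for NR_a and ALL_a) and assumes that a value domain name denotes an unbounded range over that name, so that the inclusion test of case 3b becomes interval containment; both encodings are illustrative choices, not part of the paper.

```python
INF = float("inf")
NOTHING = "NOTHING"


def compdom(x, y):
    """COMPDOM sketch: a value domain is (name, lo, hi); a bare name is (name, -INF, INF)."""
    if y == NOTHING:                                    # case 3a
        return True
    if x == NOTHING:
        return False
    (nx, lo_x, hi_x), (ny, lo_y, hi_y) = x, y
    return nx == ny and lo_x <= lo_y and hi_y <= hi_x  # E[y] contained in E[x]


def has(cp, kind, name, test):
    """True if some term of cp has this kind and role/attribute name and passes test."""
    return any(t[0] == kind and t[1] == name and test(t) for t in cp)


def compare(c, cp):
    """COMPARE(c, c') sketch: True when canonical c subsumes canonical c'."""
    if cp == NOTHING:                                   # case 4a
        return True
    if c == NOTHING:
        return False
    for t in c:
        kind = t[0]
        if kind in ("atom", "not"):                     # case 4b
            ok = t in cp
        elif kind in ("NR", "NRa"):                     # cases 4c and 4d
            _, name, lo, hi = t
            ok = has(cp, kind, name, lambda u: u[2] >= lo and u[3] <= hi)
        elif kind == "ALL":                             # case 4e, recursive on fillers
            _, r, x = t
            ok = has(cp, "ALL", r, lambda u: compare(x, u[2])) or ("NR", r, 0, 0) in cp
        elif kind == "ALLa":                            # case 4f
            _, a, v = t
            ok = has(cp, "ALLa", a, lambda u: compdom(v, u[2])) or ("NRa", a, 0, 0) in cp
        else:
            raise ValueError(f"unknown term {t!r}")
        if not ok:
            return False
    return True
```

    The loop mirrors the universal quantifier over the terms c_i: each term of the subsumer must find a matching (or more restricted) term in the subsumee, and the first failure makes COMPARE return false.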

    THEOREM 2 (subsumption computation). Given two concepts c and c',

    c subsumes c' ⟺ SUBS(c, c')

    PROOF. It has to be shown that SUBS is equivalent to the subsumption test for any couple of concept descriptions, that is,

    $\text{SUBS}(c, c') = \text{true} \iff c \text{ subsumes } c' \iff \mathcal{E}[c'] \subseteq \mathcal{E}[c]\ \forall \Omega\ \forall \mathcal{E} \text{ over } \Omega$

    13 In fact, $\mathcal{E}[(\text{NR } r\ 0,0)] \subseteq \mathcal{E}[(\text{ALL } r\ y)]\ \forall y \in C$.


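    This equivalence must hold for every possible extension function associating individuals to concept descriptions. By virtue of Definition 2 and Theorem 1, this is equivalent to proving soundness and completeness of function COMPARE. It will be shown that COMPARE is sound:

    $\text{COMPARE}(c, c') = \text{true} \Rightarrow \mathcal{E}[c'] \subseteq \mathcal{E}[c]\ \forall \Omega\ \forall \mathcal{E} \text{ over } \Omega$

    and complete:

    $\mathcal{E}[c'] \subseteq \mathcal{E}[c]\ \forall \Omega\ \forall \mathcal{E} \text{ over } \Omega \Rightarrow \text{COMPARE}(c, c') = \text{true}$

    and that its complexity is polynomial.

    Soundness. Given a couple of concepts c and c' in canonical form,

    COMPARE(c, c') = true ⇒ c subsumes c'

    COMPARE(c, c') = true implies that either c' is NOTHING (case 4a of Algorithm 4) or for each c_i there exists a c'_j such that the appropriate case (from 4b to 4f) of Algorithm 4 is verified (i.e., the consequent of the implication is true). It will then be shown that

    $\mathcal{E}[c'_j] \subseteq \mathcal{E}[c_i]\ \forall \Omega\ \forall \mathcal{E} \text{ over } \Omega$

    Case 4b is trivial. If cases 4c, 4d, or 4f are verified, by Eqs. (5), (7), and (6) of Section 3.2 it results that

    $\mathcal{E}[(\text{NR } r\ min', max')] \subseteq \mathcal{E}[(\text{NR } r\ min, max)]$
    $\mathcal{E}[(\text{NR}_a\ a\ min', max')] \subseteq \mathcal{E}[(\text{NR}_a\ a\ min, max)]$
    $\mathcal{E}[(\text{ALL}_a\ a\ v')] \subseteq \mathcal{E}[(\text{ALL}_a\ a\ v)]\ \forall \Omega\ \forall \mathcal{E} \text{ over } \Omega$

    if (min' ≥ min ∧ max' ≤ max) or COMPDOM(v, v') = true, respectively.

    Having proved the cases without recursive calls to COMPARE, the induction principle can be applied to prove case 4e. Let us suppose that

    $\text{COMPARE}(x, y) = \text{true} \Rightarrow \mathcal{E}[y] \subseteq \mathcal{E}[x]$

    then, by virtue of Eq. (4),

    $\mathcal{E}[(\text{ALL } r\ y)] \subseteq \mathcal{E}[(\text{ALL } r\ x)]\ \forall \Omega\ \forall \mathcal{E} \text{ over } \Omega$

    and, in the alternative where c'_j = (NR r 0,0), footnote 13 gives $\mathcal{E}[(\text{NR } r\ 0,0)] \subseteq \mathcal{E}[(\text{ALL } r\ x)]$.

    In any case, by virtue of Eq. (3), the following relation holds:

    $\mathcal{E}[c'] = \bigcap_j \mathcal{E}[c'_j] \subseteq \bigcap_i \mathcal{E}[c_i] = \mathcal{E}[c]\ \forall \Omega\ \forall \mathcal{E} \text{ over } \Omega$ □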


    Completeness. Given a couple of concepts c and c',

    c subsumes c' ⇒ COMPARE(c, c') = true

    The implication above is equivalent to

    COMPARE(c, c') = false ⇒ not (c subsumes c')

    that is, by Definition 2:

    $\text{COMPARE}(c, c') = \text{false} \Rightarrow \exists \Omega\ \exists \mathcal{E} \text{ over } \Omega : \mathcal{E}[c'] \not\subseteq \mathcal{E}[c]$

    Function COMPARE(c, c') can fail because of any of the cases from 4b to 4f of Algorithm 4 (c' is assumed different from NOTHING). In each of these cases, adequate $\Omega$ and $\mathcal{E}$ have to be chosen in order to obtain $\mathcal{E}[c'] \not\subseteq \mathcal{E}[c]$. The full completeness proof is given in [12] by providing an extension assignment algorithm that guarantees the above property. □

    THEOREM 3 (complexity of canonical form generation). The time necessary to reduce a concept description to canonical form is $O(\bar{n}^2)$, where $\bar{n}$ is the length of the description, i.e., the number of terms it contains after name substitutions, including all terms in the nested definitions, if any.

    PROOF. The rewriting of rule 2b can be performed in linear time. Rules 2j, 2c, and 2k require, for each term, examination of all the following terms at the outermost level in the description; therefore they can be performed in time $O(\bar{n}^2)$ or less. □

    THEOREM 4 (complexity of COMPARE). The time necessary to compute COMPARE(c, c') is $O(\bar{n}^2)$, where $\bar{n}$ is the length of the longest argument of COMPARE.

    PROOF. For each term of a description, Algorithm 1 requires examination of all the terms of the other description, including recursive calls to COMPARE itself. In the absence of recursive calls of COMPARE, the complexity of the computation is $O(\bar{n}^2)$. The recursive calls do not increase the complexity: let $n_k$, $k = 1, \ldots, r$, be the length of one of the r nested terms (introduced by an (ALL ...)); the complexity of a recursive call is $O(n_k^2)$, and the following relation holds:

    $\bar{n} \geq \sum_{k=1}^{r} n_k$

    Since $\sum_{k=1}^{r} n_k^2 \leq \left(\sum_{k=1}^{r} n_k\right)^2 \leq \bar{n}^2$, the global complexity of COMPARE with recursive calls is $O(\bar{n}^2)$ or less. □

    The subsumption algorithm allows us to give a syntactical characterization of some relevant properties of concepts.

    THEOREM 5. Given a terminology T and two concepts c and c':

    c is contradictory if and only if SUBS(NOTHING, c) = true, that is, if and only if CANFORM(c) = NOTHING;


    c and c' are equivalent (c ≡ c') if and only if SUBS(c, c') = true and SUBS(c', c) = true;

    c and c' are disjoint if and only if SUBS(NOTHING, (AND c c')) = true.

    PROOF SKETCH. The results follow immediately from the proof of correctness and completeness of algorithm SUBS. □

    4. CONSISTENCY AND MINIMALITY OF A SCHEMA

    A major problem of conceptual design is the generation of a schema that is consistent and does not contain redundancies. In this section we show the use of taxonomic reasoning in conceptual design by giving a formal definition of the consistency and minimality of a schema (i.e., a terminology). A classification algorithm is then introduced to assist schema acquisition while preserving consistency and minimality.

    A concept is described as a conjunction of ancestors (not necessarily parents) and local properties. A contradiction can arise from a conflict between local properties, between a local property and an inherited one, or between inherited properties. Algorithm CANFORM transforms such conflicting descriptions into NOTHING.

    Furthermore, we say that, given a terminology T, a concept description is minimal if it contains only parent names and local properties.

    The set of all the ancestor concepts of a concept c, called the generalization set, is computed by using the SUBS algorithm, as follows:

    $GS(c) = \{ t_i,\ i = 1, \ldots, k \mid t_i \in T_{CN} \wedge \text{SUBS}(t_i, c) = \text{true} \}$
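    GS(c), and the most specialized subset MSGS(c) that Eq. (10) defines, can be computed with any subsumption oracle. In the Python sketch below a hypothetical toy oracle stands in for SUBS: concepts are frozensets of atomic features, and one concept subsumes another when its features are a subset. The sketch also assumes that the taxonomy contains no two equivalent names, as in a minimal terminology; otherwise equivalent members would exclude each other.

```python
def subs(general, specific):
    """Toy stand-in for SUBS: fewer features means more general."""
    return general <= specific


def gs(c, taxonomy):
    """Generalization set: names t with SUBS(t, c) = true."""
    return {name for name, desc in taxonomy.items() if subs(desc, c)}


def msgs(c, taxonomy):
    """Most specialized generalization set, in the spirit of Eq. (10)."""
    g = gs(c, taxonomy)
    return {a for a in g
            if not any(b != a and subs(taxonomy[a], taxonomy[b]) for b in g)}
```

    With person = {p}, teaching = {t}, and student = {p, enrolled} as the taxonomy, a new concept {p, enrolled, gs} has GS = {person, student} and MSGS = {student}; a concept {p, enrolled, t} gets two incomparable parents, student and teaching.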

    The minimal description of a concept c, with respect to a terminology T, is obtained by rewriting the (user-supplied) concept description as follows:

    MINDESC: T_C → T, MINDESC(c) = (AND c_1 ... c_h c_D), {c_1 ... c_h} = MSGS(c)

    where the most specialized generalization set MSGS(c) is obtained from GS(c):

    $MSGS(c) = \{ c_i,\ i = 1, \ldots, l \mid c_i \in GS(c) \wedge \nexists c' \in GS(c),\ c' \neq c_i : \text{SUBS}(c_i, c') = \text{true} \}$   (10)

    and the difference concept c_D, if any, is given by the differences of c with respect to its parents and is obtained by a deterministic process, as shown below. Obviously, if the difference concept is not an empty description, no concept c' ∈ T_C such that SUBS(c', c_D) = true exists.

    In the following we show that the minimal description of a concept can always be expressed in the language of Section 3.1 and is unique.

    THEOREM 6 (uniqueness of MSGS(c)). For any concept c, the most specialized generalization set MSGS(c), with respect to a given taxonomy, is unique.


    PROOF. Let MSGS(c) = {a_i, i = 1, ..., m} and MSGS'(c) = {b_j, j = 1, ..., n} be two distinct most specific generalization sets, and let us suppose MSGS(c) − MSGS'(c) = {a_x}. From Eq. (10), it follows that

    $(a_x \notin MSGS'(c)) \wedge (a_x \in GS(c)) \Rightarrow \exists b_y \in GS(c),\ b_y \neq a_x : \text{SUBS}(a_x, b_y) = \text{true}$

    but

    $a_x \in MSGS(c) \Rightarrow \nexists g \in GS(c),\ g \neq a_x : \text{SUBS}(a_x, g) = \text{true}$

    and this gives rise to a contradiction. □

    The difference concept c_D for a noncontradictory concept c, with respect to a terminology T, can be expressed in the language of Section 3.1 and is computed through the following deterministic process.

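    Algorithm 5 (c_D computation). Let c be a concept in canonical form and let us introduce the shorthand notation (∏ c_i) = (AND ... c_i ...). Concept c can be rewritten as follows:

    $c = \Big(\prod_{i=1}^{np} p_i\Big)\Big(\prod_{i=1}^{nn} (\text{NR } rn_i\ min_i, max_i)\Big)\Big(\prod_{i=1}^{nf} (\text{ALL } rf_i\ cf_i)\Big)\Big(\prod_{i=1}^{nna} (\text{NR}_a\ an_i\ amin_i, amax_i)\Big)\Big(\prod_{i=1}^{nfa} (\text{ALL}_a\ af_i\ v_i)\Big)$

    where p_i is either an atom or the negation of an atom, rn_i and rf_i are role names, cf_i are concept descriptions, an_i and af_i are attribute names, and v_i are value domain descriptions.

    Analogously, the genus concept of the minimal description of c (briefly denoted by c_G) has the following description:

    c_G = (AND c_1 ... c_k), c_i ∈ MSGS(c)

    and can be reduced to canonical form and rewritten as follows:

    $c_G = \Big(\prod_{i=1}^{np_G} p_i^G\Big)\Big(\prod_{i=1}^{nn_G} (\text{NR } rn_i\ min_i^G, max_i^G)\Big)\Big(\prod_{i=1}^{nf_G} (\text{ALL } rf_i\ cf_i^G)\Big)\Big(\prod_{j=1}^{\overline{nf}_G} (\text{ALL } \overline{rf}_j\ \overline{cf}_j^G)\Big)\Big(\prod_{i=1}^{nna_G} (\text{NR}_a\ an_i\ amin_i^G, amax_i^G)\Big)\Big(\prod_{i=1}^{nfa_G} (\text{ALL}_a\ af_i\ v_i^G)\Big)\Big(\prod_{j=1}^{\overline{nfa}_G} (\text{ALL}_a\ \overline{af}_j\ \overline{v}_j^G)\Big)$

    with $\forall i, j: rf_i \neq \overline{rf}_j$ and $af_i \neq \overline{af}_j$. By virtue of elementary set properties, SUBS(c_G, c) = true holds, and then, from the definition of subsumption, it follows that:

    $np \geq np_G$. All atoms and negations in c_G must be in c;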


    $nn \geq nn_G$ and $\forall i = 1, \ldots, nn_G\ ((min_i \geq min_i^G) \wedge (max_i \leq max_i^G))$. All NR terms in c_G must be in c, where they may have a restricted range;

    $nf \geq nf_G$ and $\forall i = 1, \ldots, nf_G\ (\text{SUBS}(cf_i^G, cf_i) = \text{true})$;

    $\forall j = 1, \ldots, \overline{nf}_G\ (\exists i = 1, \ldots, nn \mid (\overline{rf}_j = rn_i) \wedge (min_i = max_i = 0))$. Some ALL terms in c_G do not have a corresponding term in c because of rule 2k of algorithm CANFORM; the surviving terms may have a restricted role filler;

    $nna \geq nna_G$ and $\forall i = 1, \ldots, nna_G\ ((amin_i \geq amin_i^G) \wedge (amax_i \leq amax_i^G))$. All NR_a terms in c_G must be in c, where they may have a restricted range;

    $nfa \geq nfa_G$ and $\forall i = 1, \ldots, nfa_G\ (\text{COMPDOM}(v_i^G, v_i) = \text{true})$;

    $\forall j = 1, \ldots, \overline{nfa}_G\ (\exists i = 1, \ldots, nna \mid (\overline{af}_j = an_i) \wedge (amin_i = amax_i = 0))$. Some ALL_a terms in c_G do not have a corresponding term in c because of rule 2k of algorithm CANFORM; the surviving terms may have a restricted value domain.

    The difference concept contains the terms in c that are not contained in c_G or that have been modified with respect to c_G, either by a more restricted range or by a more specialized role filler in the same role. The difference concept description is the following:

    $c_D = \Big(\prod_{i=np_G+1}^{np} p_i\Big)\Big(\prod_{i \in I_n} (\text{NR } rn_i\ min_i, max_i)\Big)\Big(\prod_{i \in I_f} (\text{ALL } rf_i\ cf_i)\Big)\Big(\prod_{i \in I_{na}} (\text{NR}_a\ an_i\ amin_i, amax_i)\Big)\Big(\prod_{i \in I_{fa}} (\text{ALL}_a\ af_i\ v_i)\Big)$

    with

    $I_n = \{ i = 1, \ldots, nn_G \mid (min_i > min_i^G) \vee (max_i < max_i^G) \} \cup \{ i = nn_G + 1, \ldots, nn \}$
    $I_f = \{ i = 1, \ldots, nf_G \mid cf_i^G \neq cf_i \} \cup \{ i = nf_G + 1, \ldots, nf \}$
    $I_{na} = \{ i = 1, \ldots, nna_G \mid (amin_i > amin_i^G) \vee (amax_i < amax_i^G) \} \cup \{ i = nna_G + 1, \ldots, nna \}$
    $I_{fa} = \{ i = 1, \ldots, nfa_G \mid v_i^G \neq v_i \} \cup \{ i = nfa_G + 1, \ldots, nfa \}$
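    The atom part of c_D and the index sets I_n and I_f can be sketched in Python as below, for atoms, negated atoms, and role terms only (I_na and I_fa for attributes follow the same pattern). The tuple encoding is an assumption of this sketch; fillers are compared syntactically, mirroring the condition on cf_i^G and cf_i, and the function presumes SUBS(c_G, c) = true, so that an NR range in c that differs from the one in c_G is necessarily narrower.

```python
def difference_concept(c, cg):
    """Terms of canonical c that are new or restricted with respect to canonical c_G.

    Terms: ("atom", a), ("not", a), ("NR", r, lo, hi), ("ALL", r, filler).
    """
    g_lits = {t for t in cg if t[0] in ("atom", "not")}
    g_nr = {t[1]: (t[2], t[3]) for t in cg if t[0] == "NR"}
    g_all = {t[1]: t[2] for t in cg if t[0] == "ALL"}
    cd = []
    for t in c:
        kind = t[0]
        if kind in ("atom", "not"):
            keep = t not in g_lits                            # p_i with i > np_G
        elif kind == "NR":
            keep = g_nr.get(t[1]) != (t[2], t[3])             # index set I_n
        elif kind == "ALL":
            keep = t[1] not in g_all or g_all[t[1]] != t[2]   # index set I_f
        else:
            raise ValueError(f"unknown term {t!r}")
        if keep:
            cd.append(t)
    return cd
```

    With c taken as a canonical grad-student (a stand-in filler replacing enrollment) and c_G as the canonical student, the only surviving term is the atom gs, which matches the minimal form (AND gs student) derived in Example 5.3.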

    From Definition 10, Algorithm 5, and Theorem 5, it follows that the minimal description of a concept c, with respect to a terminology T, is extension preserving, that is,

    $\forall \Omega\ \forall \mathcal{E} \text{ over } \Omega\ (\mathcal{E}[c] = \mathcal{E}[\text{MINDESC}(c)] = \mathcal{E}[c_G] \cap \mathcal{E}[c_D])$

    In particular, it can be observed that CANFORM(c) = CANFORM(MINDESC(c)).

    Definition 4 (Consistency and minimality). A terminology T is consistent if it does not contain contradictory concepts, and is minimal if every concept description is minimal.

    We are now able to give the high-level description of a classification algorithm which can be the basis of a system supporting the acquisition of a schema while maintaining consistency and minimality. This algorithm exploits the canonical form generation and subsumption algorithms presented in Section 3.3 and applies the above definitions of consistency and minimality.


    Algorithm 6 (CLASSIFY). Given a consistent and minimal terminology T and a new concept c:

    6.1 If SUBS(NOTHING, c) then reject c, exit.
    6.2 Add the minimal description of c to the terminology: T := T ∪ {c = MINDESC(c)}.
    6.3 If there does not exist in T any concept equivalent to c, then restructure the terminology by determining the new minimal descriptions for the concepts subsumed by c:
        if ∄c' ∈ T_CN | c' ≡ c
        then ∀c'' ∈ T | SUBS(c, c'') = true ∧ c'' ≠ c: T := (T − {c'' = ...}) ∪ {c'' = MINDESC(c'')}

    The classification of a concept is therefore strongly based on subsumption evaluation. This process requires $O(t)$ subsumption operations, where t is the number of concepts in the taxonomy. The minimal description determination requires $O(k^2)$ subsumption operations, where k is the number of subsumers of the new concept.

    5. FORMALIZATION OF E/R

    Up to now we have shown a fairly general language able to describe concepts. We will now show how this language is able to describe the modeling constructs of the E/R model, including some well-accepted extensions such as generalization, multiple and compound attributes, and cardinality ratios, as described by Batini et al. [7]. In Appendix A the syntax of the extended E/R schema definition language drawn from Batini et al. [7] is shown. The application of the language to E/R allows, as a fundamental extension, the description of defined entities. Moreover, the generalizations of an entity and its participation in relationships are directly included in the description of the entity itself. This makes the definition of an E/R schema in some way entity centered; that is, most of the schema is described as a characteristic of an entity. This approach is in line with most semantic models and the recent object-oriented models [1, 2, 4, 38, 40]. The above extensions allow taxonomic reasoning techniques to be exploited, as will be shown later. The description of an entity is as follows:

    ⟨entity-declaration⟩ ::= ⟨prim-entity-declaration⟩ | ⟨def-entity-declaration⟩
    ⟨prim-entity-declaration⟩ ::= ⟨entity-name⟩ = (AND ⟨entity-atom⟩ ⟨entity⟩)
    ⟨def-entity-declaration⟩ ::= ⟨entity-name⟩ = ⟨entity⟩
    ⟨entity⟩ ::= ⟨entity-component⟩ | (AND ⟨entity-component_1⟩ ... ⟨entity-component_n⟩)
    ⟨entity-component⟩ ::= ⟨attribute⟩ | ⟨compound-attribute⟩ | ⟨generalization⟩ | ⟨exclusion⟩ | ⟨relationship-side⟩


    ⟨attribute⟩ ::= (AND (ALL_a ⟨attribute-atom⟩ ⟨value-set⟩) (NR_a ⟨attribute-atom⟩ ⟨min⟩, ⟨max⟩))
    ⟨compound-attribute⟩ ::= (AND (ALL ⟨attribute-atom⟩ ⟨compound⟩) (NR ⟨attribute-atom⟩ ⟨min⟩, ⟨max⟩))
    ⟨compound⟩ ::= (AND ⟨attribute_1⟩ ... ⟨attribute_n⟩)
    ⟨generalization⟩ ::= ⟨entity-name⟩ | ⟨entity-atom⟩
    ⟨exclusion⟩ ::= (NOT ⟨entity-atom⟩)
    ⟨relationship-side⟩ ::= (ALL ⟨role⟩ ⟨relationship-name⟩) | (NR ⟨role⟩ ⟨min⟩, ⟨max⟩)
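    The grammar is regular enough to be generated mechanically. The hypothetical helpers below render E/R declarations as strings following the productions above (ALLa and NRa spell ALL_a and NR_a in ASCII); they are a sketch for illustration only, and the nested AND produced for an attribute is the kind of nesting that canonical form generation later flattens.

```python
def attribute(name, value_set, lo, hi):
    """<attribute> ::= (AND (ALL_a name value-set) (NR_a name min, max))"""
    return f"(AND (ALLa {name} {value_set}) (NRa {name} {lo},{hi}))"


def exclusion(atom):
    """<exclusion> ::= (NOT entity-atom)"""
    return f"(NOT {atom})"


def relationship_side(role, relationship, lo, hi):
    """Both <relationship-side> forms, used together to describe one side."""
    return [f"(ALL {role} {relationship})", f"(NR {role} {lo},{hi})"]


def entity(name, components, atom=None):
    """Primitive declaration when an entity atom is given, defined otherwise."""
    parts = ([atom] if atom else []) + list(components)
    return f"{name} = (AND {' '.join(parts)})"
```

    For example, entity("student", ["person"] + relationship_side("enrolled-in", "enrollment", 1, "n")) produces student = (AND person (ALL enrolled-in enrollment) (NR enrolled-in 1,n)).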

    It can be seen that an entity can be described as primitive or defined. Furthermore, as an addition to the traditional E/R, an entity can be characterized as a conjunction of entities (⟨generalization⟩), which models multiple inheritance, and, by means of the syntactic category ⟨relationship-side⟩, on the basis of its participation in a relationship with a specific role. The syntactic category ⟨relationship-side⟩ represents one side of a relationship.

    The description of a relationship is then reduced to the description of its attributes, with the following syntax:

    ⟨relationship-declaration⟩ ::= ⟨relationship-name⟩ = ⟨relationship⟩
    ⟨relationship⟩ ::= ⟨relationship-atom⟩ | ⟨attribute⟩ | ⟨compound-attribute⟩ | (AND ⟨attribute_1⟩ ... ⟨attribute_n⟩)

    Since the semantics is mainly represented on entities, relationship hierarchies are not allowed. Cardinalities and value sets are defined as in the syntax of Section 3.1.

    Finally, the constraint of exclusive hierarchy with respect to the primitive entities e_1, ..., e_n, sons of e and characterized by the atoms ea_1, ..., ea_n, can be represented with negation components as follows:

    e_1 = (AND e ... (NOT ea_2) ... (NOT ea_n) ...)
    ...
    e_n = (AND e ... (NOT ea_1) ... (NOT ea_{n−1}) ...)
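    The exclusive-hierarchy encoding can be generated for any number of sons. The Python sketch below is hypothetical: it assumes that each son's description also contains its own atom ea_i (as a primitive entity's description does) and treats the parent name as an opaque symbol. The disjoint helper applies the atom-and-negation test of rule 2e to show that the conjunction of any two distinct sons is contradictory.

```python
def exclusive_sons(parent, atoms):
    """For each son atom a: parent, a, and (NOT b) for every other son atom b."""
    return {a: [parent, a] + [("NOT", b) for b in atoms if b != a] for a in atoms}


def render(terms):
    """Print a term list in the (AND ...) syntax of the text."""
    parts = [t if isinstance(t, str) else f"(NOT {t[1]})" for t in terms]
    return f"(AND {' '.join(parts)})"


def disjoint(d1, d2):
    """Rule 2e: the conjunction is NOTHING if it holds an atom and its negation."""
    terms = d1 + d2
    pos = {t for t in terms if isinstance(t, str)}
    neg = {t[1] for t in terms if not isinstance(t, str)}
    return bool(pos & neg)
```

    For three sons of e this yields, for instance, e_1 = (AND e ea1 (NOT ea2) (NOT ea3)); the encoding grows quadratically with the number of sons, since each son negates all the others.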

    By comparing the extended E/R syntax of Appendix A with the application of the language to E/R, it can be observed that some aspects cannot be captured: the notion of total generalization hierarchy (coverage) and identifiers. The representation of these aspects requires an extension of the language which would make a complete subsumption algorithm intractable. The inability to represent total hierarchies is a real limitation and is related to the computational complexity introduced by dealing with both conjunction and disjunction constructors [48]. Instead, the distinction between an attribute and an identifier attribute, even if important at the constraint representation level, seems to have no significant influence on schema consistency and minimality.


    Fig. 2. E/R schema example (entities person, teaching, student, and course; relationship enrollment with attribute enroll-year and sides enrolled-in and enroll).

    The capabilities of taxonomic reasoning techniques in conceptual schema design are shown by a simple schema acquisition. The schema will be enriched by introducing one concept at a time, and the effectiveness of the CLASSIFY algorithm will be pointed out. In Example 5.1 the simple schema of Figure 2 is translated into the language. In Example 5.2 the CLASSIFY algorithm is executed, consistency is checked, and a minimal schema is produced. In Example 5.3 a nonminimal concept description is transformed into a minimal one, and some comments on minimality with respect to primitive and defined entities are given. In Example 5.4 a case of contradictory concept detection is explained.

    Example 5.1 (E/R schema description). Person is a primitive entity, disjoint from teaching and with the single-valued attribute name. Thus, it is described by

    person = (AND p (NOT t) (ALL_a name string) (NR_a name 1,1))

    The relationship enrollment is described by

    enrollment = (AND (ALL_a enroll-year integer) (NR_a enroll-year 1,1))

    Student is a defined entity described not only with a data structure but also by its participation in the relationship enrollment:

    student = (AND person (ALL enrolled-in enrollment) (NR enrolled-in 1,n))

    Teaching and course are described as follows:

    teaching = (AND t (NOT p) (ALL_a name string) (NR_a name 1,1))
    course = (AND teaching (ALL enroll enrollment) (NR enroll 1,n))

    The relationship enrollment is therefore described through its explicit introduction and the above implicit descriptions of the two relationship sides, enroll and enrolled-in, which are given in the descriptions of course and student, respectively.


    Fig. 3. E/R schema before classification (person, grad-student, enrollment).

    Example 5.2 (E/R schema acquisition). For each of the above entities, the CLASSIFY algorithm is executed. First of all, in step 6.1 the canonical form is generated by the CANFORM algorithm. Person and teaching are left untouched, while student and course are transformed by rule 2a of Algorithm 2, as follows:

    student = (AND p (NOT t) (ALL_a name string) (NR_a name 1,1) (ALL enrolled-in enrollment) (NR enrolled-in 1,1))

    course = (AND t (NOT p) (ALL_a name string) (NR_a name 1,1) (ALL enroll enrollment) (NR enroll 1,n))

    Since none of the above entities is NOTHING, step 6.2 is executed for each entity. In particular, algorithm SUBS finds that no subsumption holds between person and teaching since, even if they share a common attribute (name), they have different atoms (p and t). Note that the disjointness constraint does not affect this result. The schema described so far is therefore consistent and minimal.

    Example 5.3 (Minimality). The new primitive entity grad-student, which is a person enrolled in an enrollment, is to be added to the schema:

    grad-student = (AND gs person (ALL enrolled-in enrollment) (NR enrolled-in 1,1))

    The above description corresponds to the E/R schema fragment of Figure 3. The canonical form of grad-student is the following:

    grad-student = (AND gs p (NOT t) (ALL_a name string) (NR_a name 1,1) (ALL enrolled-in enrollment) (NR enrolled-in 1,1))

    Algorithm 6 computes SUBS(student, grad-student) = true. Then the description of grad-student is modified to the following minimal form:

    grad-student = (AND gs student)

    corresponding to the E/R schema fragment of Figure 4.
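    A toy replay of Examples 5.2 and 5.3 through the three steps of Algorithm 6 is sketched below in Python. It is an illustration under strong assumptions: every concept is reduced to a frozenset of hypothetical feature labels (for instance "enrolled-in" standing for the pair (ALL enrolled-in enrollment)(NR enrolled-in 1,1), and "~t" for (NOT t)), so subsumption becomes set inclusion and contradiction becomes a feature occurring together with its negation. The primitive/defined distinction is ignored, so this is not the paper's CLASSIFY.

```python
def subs(general, specific):
    """Toy SUBS: fewer features means more general."""
    return general <= specific


def contradictory(features):
    """Toy stand-in for SUBS(NOTHING, c): some feature f occurs with ~f."""
    return any("~" + f in features for f in features)


def mindesc(tax, name):
    """(parents, local features): MSGS(c) plus the difference concept c_D."""
    c = tax[name]
    gen = {n for n, d in tax.items() if n != name and subs(d, c)}
    parents = {a for a in gen if not any(b != a and subs(tax[a], tax[b]) for b in gen)}
    inherited = frozenset().union(*(tax[p] for p in parents))
    return sorted(parents), sorted(c - inherited)


def classify(tax, minimal, name, features):
    """One CLASSIFY step on the toy representation; returns new (tax, minimal)."""
    if contradictory(features):                                        # step 6.1
        raise ValueError(f"{name} is contradictory")
    tax = {**tax, name: frozenset(features)}
    minimal = {**minimal, name: mindesc(tax, name)}                    # step 6.2
    if not any(n != name and d == tax[name] for n, d in tax.items()):  # step 6.3
        for n, d in tax.items():
            if n != name and subs(tax[name], d):   # recompute concepts subsumed by c
                minimal[n] = mindesc(tax, n)
    return tax, minimal
```

    Classifying person, teaching, student, and grad-student in this order yields student with parent person and local feature enrolled-in, and grad-student with parent student and local feature gs, the toy counterpart of the minimal form (AND gs student).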


    Fig. 4. E/R schema after classification (person, student, grad-student, enrollment).

    It is worth noting that it has been possible to classify the primitive entity grad-student as a specialization of the defined entity student. Furthermore, if grad-student had been introduced as a defined entity, it would have been recognized as equivalent to student (the atom gs would not be in the description) and taken as a synonym (see Algorithm 6, step 6.2).

    Example 5.4 (Contradictory concept). The following example illustrates how inconsistencies in entity descriptions due to wrong role number restrictions are detected.

    Let us now add the description of a last-year-student, which is defined as a student who passed at least 25 and at most 30 units:

    last-year-student = (AND student (NR passed-units 25, 30))

    The addition to the schema of the following description of a last-year-bad-student, who passed not more than 20 units, introduces a number restriction inconsistency:

    last-year-bad-student = (AND last-year-student (NR passed-units 1, 20))

    In this case, algorithm CANFORM produces the following canonical form:

    last-year-bad-student = (AND p (NOT t) (ALL_a name string) (NR_a name 1,1) (ALL enrolled-in enrollment) (NR enrolled-in 1,1) (NR passed-units 25, 20))

    The number restriction of role passed-units, computed by rule 2j of Algorithm 2, makes the above definition contradictory on the basis of rules 2l and 2f. Therefore the entity is rejected by step 6.1 of Algorithm 6.

    6. INVERSE ROLES AND SUBSUMPTION COMPUTATION

    The language with inverse roles extends the language of Section 3.1 with the role-forming construct INV, which allows the representation of the notion of inverse role. INV has been introduced to deal with the inverse function of functional models such as DAPLEX and IFO. The syntax presented in Section 3.1 is extended as follows:

    ⟨role⟩ ::= ⟨role-atom⟩ | (INV ⟨role-atom⟩)

    (INV r) introduces a role whose extension is the mirror image of the role r. The semantics of inverse roles is defined by completing the extension function of Section 3.2 as follows:

    $\mathcal{E}[(\text{INV } r)] = \{ (x, y) \mid (y, x) \in \mathcal{E}[r] \}$   (11)

    Fig. 5. Schema example (the role enroll replaced by the inverse of enrolled-in).
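    Eq. (11) can be read operationally over a finite model in which a role's extension is a set of pairs. The Python sketch below (function names and sample data are hypothetical) computes the inverse extension and the extension of a value restriction (ALL r C) as the set of individuals whose r-fillers all belong to C.

```python
def inverse(role_ext):
    """E[(INV r)] = {(x, y) | (y, x) in E[r]}, as in Eq. (11)."""
    return {(y, x) for (x, y) in role_ext}


def fillers(role_ext, x):
    """The role fillers of individual x."""
    return {y for (a, y) in role_ext if a == x}


def all_ext(domain, role_ext, concept_ext):
    """E[(ALL r C)]: individuals all of whose r-fillers belong to C."""
    return {x for x in domain if fillers(role_ext, x) <= concept_ext}
```

    With enrolled-in = {(ann, db), (bob, db), (ann, ai)}, the inverse role gives db the fillers ann and bob. Note that all_ext also admits individuals that have no fillers at all (vacuous satisfaction), which is why a description such as the course concept of Figure 5 also needs a number restriction like (NR (INV enrolled-in) 1,n).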

    The example of Figure 1 can be refined by substituting the arbitrary role enroll with the inverse of enrolled-in, as shown in Figure 5. Therefore, course can be described by

    course = (AND teaching (NR (INV enrolled-in) 1,n) (ALL (INV enrolled-in) student))

    It is worth noting that inverse roles make available a new mechanism for the definition of concepts: a specialization of a concept (course) can be characterized on the basis of being the role filler of a given role (enrolled-in) for a given concept (student).

    The presence in the same concept description of a role together with its inverse gives rise to new powerful subsumption inferences. Let us consider the following description:

    (AND (NR enrolled-in 1,n) (ALL enrolled-in (AND (NR (INV enrolled-in) 1,1) (ALL (INV enrolled-in) student))))


    The semantics of inverse roles implies that the anonymous concept above is subsumed by student. In fact, the above description identifies all the items of the domain which are enrolled in at least one item, and only in items that enroll exactly one individual, who is a student; since an item enrolled in by x has x among its (INV enrolled-in) fillers, that student can only be x itself.

    The new semantics of inverse roles makes it necessary to extend the SUBS, CANFORM, and COMPARE algorithms, giving rise to the SUBSINV, CANFORMINV, and COMPAREINV algorithms, respectively, for the subsumption computation in a terminology T_inv, which includes inverse roles.
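    The claimed subsumption can be checked exhaustively on small finite models. The Python sketch below enumerates every interpretation of enrolled-in (written r) and of student over a domain of n individuals and verifies that the extension of the anonymous concept is always contained in that of student. It is a bounded sanity check under an assumed encoding, not a proof and not the SUBSINV algorithm.

```python
from itertools import chain, combinations, product


def subsets(xs):
    """All subsets of xs, as tuples."""
    xs = list(xs)
    return chain.from_iterable(combinations(xs, k) for k in range(len(xs) + 1))


def fillers(rel, x):
    return {y for (a, y) in rel if a == x}


def anon_ext(domain, r, student):
    """E of (AND (NR r 1,n) (ALL r (AND (NR (INV r) 1,1) (ALL (INV r) student))))."""
    inv = {(y, x) for (x, y) in r}
    inner = {y for y in domain
             if len(fillers(inv, y)) == 1 and fillers(inv, y) <= student}
    return {x for x in domain if len(fillers(r, x)) >= 1 and fillers(r, x) <= inner}


def subsumed_in_all_models(n):
    """True if E[anon] is contained in E[student] in every model with n individuals."""
    domain = range(n)
    pairs = list(product(domain, repeat=2))
    for r in subsets(pairs):
        r = set(r)
        for st in subsets(domain):
            if not anon_ext(domain, r, set(st)) <= set(st):
                return False
    return True
```

    For n = 3 the check visits 512 role extensions times 8 student sets; the exponential growth is why such enumeration only illustrates the inference, whereas rule 8m derives it syntactically.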

    Algorithm 7 (SUBSINV). SUBSINV is a function defined as

    SUBSINV: T_inv × T_inv → {true, false}
    SUBSINV(t, u) ⟺ COMPAREINV(CANFORMINV(t), CANFORMINV(u))

    with COMPAREINV and CANFORMINV defined as below.

    Algorithm 8 (CANFORMINV). CANFORMINV is defined as

    CANFORMINV: T_inv → T_inv
    8a ... 8l rules from 2a to 2l of Algorithm CANFORM¹⁴
    8m When the role filler of a role r is defined on the basis of the inverse of the role itself, say (INV r), and there is at least one role filler value, it is possible to simplify the concept description in accordance with the following rule:

    (AND ... (ALL r (AND ... (ALL (INV r) c) (NR (INV r) 1,1) ...)) (NR r m, n) ...) with m ≥ 1
    → (AND ... c (ALL r (AND ... (NR (INV r) 1,1) ...)) (NR r m, n) ...)¹⁵

    8n If an AND expression (AND c_1 ... c_p) contains a term in which the role filler of a role r is defined on the basis of the inverse of the role itself, such as c_p = (ALL r (AND ... (ALL (INV r) x))), with at least one role filler value, such as c_{p−1} = (NR r m, n) with m ≥ 1, and the following condition holds: (AND c_1 ... c_{p−2} x) = NOTHING, then

    (AND c_1 ... c_p) → NOTHING

    14 All rules applicable to a role r are applicable to a role (INV r).
    15 The same rule holds if r is the inverse of a role s (r = (INV s) and (INV r) = s). The transformation could have been applied equally well with (NR (INV r) 1, ...); this was not done because this generalization is useless for algorithm CANFORMINV.


    8o When the role filler of a role is defined on the basis of a number restriction on the inverse of the role itself with min = max = 0, the ALL term is replaced by an NR term with min = max = 0:

    (ALL r (NR (INV r) 0,0)) → (NR r 0,0)

    Algorithm COMPAREINV extends COMPARE by adding the specialized inferences for inverse roles:

    Algorithm 9 (COMPAREINV). Given a couple of concepts in canonical form, COMPAREINV(c, c') = true iff:
    9a (c' = NOTHING) ∨
    c = (AND c_1 ... c_n), c' = (AND c'_1 ... c'_m), [∀c_i, i = 1, ..., n:
    9b ... 9f logical expressions from 4b to 4f of Algorithm COMPARE
    ∨ 9g ∃c'_j = (ALL r (AND ... (ALL (INV r) x))) ∧ ∃c'_k = (NR r m, n) ∧ m ≥ 1 ∧ COMPAREINV(c_i, x) = true]

    7. FORMALIZATION OF DAPLEX

    DAPLEX [50] is a data definition and manipulation language for database systems based on the functional data model, which was first introduced by Sibley and Kerschberg [51]. The two main constructs of DAPLEX are the entity and the function, which model conceptual objects and their properties. The functional data model is, on the surface, very similar to the KL-ONE model if we compare entities to concepts and functions to roles. Therefore, a DAPLEX schema can easily be drawn with ellipses and arrows as in Figure 1. On the other hand, in DAPLEX the functions are not used as rules to fill entity classes, but constitute only integrity constraints. Therefore, with DAPLEX it is possible to model the schema of Figure 2 as follows:

    DECLARE person() =>> ENTITY
    DECLARE name(person) => STRING
    DECLARE student() =>> person
    DECLARE teaching() =>> ENTITY
    DECLARE course() =>> teaching
    DECLARE name(teaching) => STRING
    DECLARE enrolled-in(student) =>> course
    DEFINE enroll(course) =>> INVERSE-OF enrolled-in(student)

    Note that an immediate translation from entities and functions of the above schema to concepts and roles of the language with inverse roles is not possible, as it is necessary to prevent the definitional cycle between student and course. The best possible solution in this language is the one already shown in Figure 5.


    A DAPLEX schema can be described by a schema in the language with inverse roles, constrained as follows:

    ⟨schema⟩ ::= ⟨entity-declaration_1⟩ ... ⟨entity-declaration_n⟩
    ⟨entity-declaration⟩ ::= ⟨prim-entity-declaration⟩ | ⟨def-entity-declaration⟩
    ⟨prim-entity-declaration⟩ ::= ⟨entity-name⟩ = (AND ⟨entity-atom⟩ ⟨function-declaration⟩)
    ⟨def-entity-declaration⟩ ::= ⟨entity-name⟩ = ⟨function-declaration⟩
    ⟨function-declaration⟩ ::= ⟨function-decl⟩ | ⟨function-defn⟩ | ⟨entity-name⟩ | ⟨exclusion⟩ | ⟨entity-tuple⟩ | (AND ⟨function-declaration_1⟩ ... ⟨function-declaration_n⟩)
    ⟨function-decl⟩ ::= ⟨attr-function⟩ | ⟨attr-function⟩ ⟨attr-predicate⟩ | ⟨role-function⟩ | ⟨role-function⟩ ⟨role-predicate⟩
    ⟨attr-function⟩ ::= (ALL_a ⟨attr-name⟩ ⟨value-set⟩)
    ⟨role-function⟩ ::= (ALL ⟨role⟩ ⟨function-declaration⟩)
    ⟨entity-tuple⟩ ::= (AND (ALL ⟨side_1⟩ ⟨entity-name_1⟩) (NR ⟨side_1⟩ 1,1) ... (ALL ⟨side_n⟩ ⟨entity-name_n⟩) (NR ⟨side_n⟩ 1,1))
    ⟨attr-predicate⟩ ::= (NR_a ⟨attr-name⟩ ⟨min⟩, ⟨max⟩)
    ⟨role-predicate⟩ ::= (NR ⟨role⟩ ⟨min⟩, ⟨max⟩)
    ⟨function-defn⟩ ::= (AND ⟨entity-name_1⟩ (ALL (INV ⟨role-name⟩) ⟨entity-name_2⟩))
    ⟨exclusion⟩ ::= (NOT ⟨entity-atom⟩)
    ⟨role⟩ ::= ⟨role-atom⟩ | (INV ⟨role-atom⟩)

    The formalization of DAPLEX in the language with inverse roles presents two kinds of problems: the first is the need to allow definitional semantics for entity descriptions; the second is more practical, since in DAPLEX the description of an entity is


    spread over many declarations and definitions, while the language requires the structure of an entity to be expressed by a single statement.¹⁶ To solve this last problem, the function descriptions for a given entity are gathered by an AND constructor. Note that the syntactic category ⟨role-function⟩ allows the description of nested functions, which represents an extension with respect to DAPLEX, but is also an important feature supported by the more recent IFO and easily expressed by the language.

    The description of an entity is taken as defined. If a primitive description is required, the distinction is obtained by adding an entity atom to the AND combination of the components.

    Example 7.1 (DAPLEX schema description). The DAPLEX schema of Figure 5 can be expressed in the language with inverse roles as follows:

    person = (AND p (NOT t) (ALL_a name string) (NR_a name 1,1))
    teaching = (AND t (NOT p) (ALL_a name string) (NR_a name 1,1))
    student = (AND person (ALL enrolled-in teaching) (NR enrolled-in 1,n))
    course = (AND teaching (ALL (INV enrolled-in) student))¹⁷

    Example 7.2 (DAPLEX schema acquisition example). This example shows how local inconsistencies introduced by a contradictory role filler are detected and removed, in accordance with the CANFORMINV algorithm. Consider the schema description introduced above, and add the three entities undergrad-student, grad-student (disjoint from undergrad-student), and advanced-course, described as follows:

    undergrad-student = (AND us student (NOT gs))
    grad-student = (AND gs student (NOT us))
    advanced-course = (AND course (ALL (INV enrolled-in) undergrad-student))

    The new entity specialized-course, defined as follows:

    specialized-course = (AND advanced-course (ALL (INV enrolled-in) grad-student))

    is modified by algorithm CANFORMINV. Indeed, the role (INV enrolled-in) is filled with a contradictory concept (no individual can be both an undergrad-student and a grad-student), so rules 2i and 2e generate an NR component with min = max = 0, and the minimal description becomes

    specialized-course = (AND teaching (NR (INV enrolled-in) 0,0))

    Although this description of a teaching with no enrolled individuals is not contradictory, its rewriting points out a possible mistake, since it is probably a useless concept.
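The clash behind Example 7.2 can be reproduced with a deliberately small check. The sketch below is not CANFORMINV and does not implement rules 2i and 2e: it only collects, for each role, the union of the conjuncts of all its ALL fillers (after unfolding named entities) and reports the roles whose combined filler contains an atom together with its negation, which are the roles that must receive an (NR role 0,0) component. Attributes and number restrictions are omitted from the schema for brevity, and the function names are hypothetical.

```python
# Example 7.2 schema, simplified: attribute and NR components are omitted.
SCHEMA = {
    "person": ("AND", "p", ("NOT", "t")),
    "teaching": ("AND", "t", ("NOT", "p")),
    "student": ("AND", "person", ("ALL", "enrolled-in", "teaching")),
    "course": ("AND", "teaching", ("ALL", ("INV", "enrolled-in"), "student")),
    "undergrad-student": ("AND", "us", "student", ("NOT", "gs")),
    "grad-student": ("AND", "gs", "student", ("NOT", "us")),
    "advanced-course": ("AND", "course", ("ALL", ("INV", "enrolled-in"), "undergrad-student")),
    "specialized-course": ("AND", "advanced-course", ("ALL", ("INV", "enrolled-in"), "grad-student")),
}


def unfold(term, schema):
    """Flat set of conjuncts of `term`, expanding entity names used as conjuncts."""
    if isinstance(term, str):
        return unfold(schema[term], schema) if term in schema else {term}
    if term[0] == "AND":
        return set().union(*(unfold(t, schema) for t in term[1:]))
    return {term}


def clashes(conjuncts):
    """True if some primitive atom occurs together with its negation."""
    return any(isinstance(c, str) and ("NOT", c) in conjuncts for c in conjuncts)


def forced_empty_roles(name, schema):
    """Roles whose combined ALL fillers clash, so the role must be (NR role 0,0).

    Only atom-level clashes in the fillers are detected; role
    restrictions nested inside the fillers are not examined.
    """
    fillers = {}
    for part in unfold(name, schema):
        if isinstance(part, tuple) and part[0] == "ALL":
            fillers.setdefault(part[1], set()).update(unfold(part[2], schema))
    return {role for role, conj in fillers.items() if clashes(conj)}


print(forced_empty_roles("advanced-course", SCHEMA))     # set()
print(forced_empty_roles("specialized-course", SCHEMA))  # {('INV', 'enrolled-in')}
```

Merging all ALL restrictions on the same role before testing is what exposes the clash: neither `undergrad-student` nor `grad-student` is contradictory alone, but their conjunction contains both `us` and `(NOT us)`.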

    16 The procedural features of DAPLEX are disregarded, since they are beyond the expressive capabilities of Y~~, which is designed only for data structure descriptions.
    17 Note that, for simplicity, the ~ ~U syntax does not allow naming of inverse roles.


    Example 7.4 (Aggregation/multiargument functions). This example shows how multiargument functions (or aggregation abstraction, in conceptual model terminology) can be represented in YZ%U. The entity enrollment, shown in Figure 6, is an aggregation of the entities student and course and has the property enroll-year:

    enrollment = (AND (ALL r1 student) (NR r1 1,1) (ALL r2 course) (NR r2 1,1) (ALLa enroll-year integer) (NRa enroll-year 1,1))
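The enrollment description follows the ⟨entity-tuple⟩ production: each aggregated entity contributes an (ALL side entity) component and an (NR side 1,1) component, and the attributes are added to the same conjunction. The following sketch builds it mechanically; the tuple encoding and the helper names `entity_tuple` and `render` are our own assumptions, not part of the paper.

```python
def entity_tuple(*sides):
    """Build (AND (ALL side1 e1) (NR side1 1,1) ... (ALL siden en) (NR siden 1,1))."""
    parts = ["AND"]
    for side, entity in sides:
        parts.append(("ALL", side, entity))
        parts.append(("NR", side, 1, 1))
    return tuple(parts)


def render(term):
    """Render a tuple-encoded description in the parenthesized notation used above."""
    if isinstance(term, (str, int)):
        return str(term)
    head, *args = term
    if head in ("NR", "NRa"):
        role, lo, hi = args
        return f"({head} {render(role)} {lo},{hi})"
    return "(" + " ".join([head] + [render(a) for a in args]) + ")"


# Example 7.4: aggregation of student and course, plus the enroll-year attribute.
enrollment = entity_tuple(("r1", "student"), ("r2", "course")) + (
    ("ALLa", "enroll-year", "integer"),
    ("NRa", "enroll-year", 1, 1),
)
print(render(enrollment))
```

Printing `render(enrollment)` reproduces the enrollment description given above; the side roles `r1` and `r2` play the part of the argument positions of the multiargument function.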

    8. RELATED WORKS
    The first category of work to be considered was developed in the area of database conceptual schema validation. Most of this work differs substantially from our approach, since it does not consider defined concepts [2, 5, 6, 28, 36, 38]. From this perspective, the main activity is to check the consistency of concept descriptions with respect to a given, explicit specialization ordering. Our approach, like those of Finin and Silverman [32], Bergamaschi et al. [10], Delcambre and Davis [27], and Ait-Kaci [3], plays a more active role, computing the specialization ordering on the basis of concept descriptions.

    Atzeni and Parker [5] formally introduce consistency and redundancy of a conceptual schema, and reduce the problem of checking a schema for these properties to a graph analysis problem. The framework of the solution is limited with respect to the expressivity of a schema definition: only explicit isa and disjointness statements are considered. Abiteboul and Hull [2] present the IFO model and a set of rules on isa relationships that guarantee the consistency of an IFO schema (without disjointness constraints) by proving that no type of the schema is contradictory. Atzeni and Parker [6] introduce a polynomial algorithm for computing set containment inferences for a type system. The notion of set containment is similar to that of subsumption, but the type descriptors allow only explicit set containment declarations, in either positive or negative form.

    Di Battista and Lenzerini [28] present a deductive method for E/R modeling whose purpose is to provide a tool for consistency and minimality checking. The representational mechanisms are fairly powerful, including isa specifications, disjointness, aggregation, mandatory participation of an entity in a relationship, and negation of all of these specifications. The checking algorithms are claimed to be tractable and complete; the major limitation is the lack of defined class semantics, which prevents the inference of nonexplicit subsumptions.

    The problem of schema consistency is also considered by Lecluse et al. [38] and Lecluse and Richard [36] with respect to the O2 object-oriented data model. Building on ideas from Cardelli's work [20], these papers present type-checking algorithms that guarantee that descendant classes are consistent with their parent classes (inherited properties can only restrict value domains). Lecluse and Richard [36] introduce an algorithm that computes all the possible isa links between class descriptions on the basis of their types. This algorithm reflects exactly what a subsumption algorithm can do with defined classes, where the valid implicit isa links are computed. Unfortunately, the tractable algorithm presented is not complete, as the refinement rules on disjunctive types eliminate one case of subsumption [11].

    The works of Finin and Silverman [32], Bergamaschi et al. [10], and Delcambre and Davis [27] have aims and methods similar to those of the present work. Finin and Silverman [32] present an interactive tool for knowledge acquisition. It is based on classification and focuses more on user interface strategies and heuristics for classification than on formalizations and proofs; moreover, its data model is very simple and does not allow multiple inheritance. Delcambre and Davis [27] present a classifier for object-oriented schemas. Their purpose is to discover new structural relationships and/or inconsistencies in order to refine a schema. They use the idea of a defined class, and their semantics include class properties, number restrictions, and disjointness statements. Their purpose and methods are in many respects similar to those presented by Bergamaschi et al. [10], but neither work considers inverse roles or the generation of a minimal schema.

    Finally, subsumption is used as a model of computation for knowledge representation languages by Ait-Kaci [3], who proposes a programming language based on a calculus of type subsumption. Type structures are first-class objects of the language, and, in the given semantics, type symbols denote sets of objects and label symbols denote the intension of functions. This semantics maps the partial ordering on type symbols into set inclusion, and defines subsumption between type structures in a way similar to the subsumption algorithms developed in hybrid systems and in the present work. The two operators meet (greatest lower bound) and join (least upper bound) are defined similarly to unification and generalization, respectively, in logic programming. The difference with respect to our proposal and to FDLs is, again, that type structures are primitive.

    9. SUBSUMPTION FOR DATABASE QUERIES AND INSTANCES
    The application of subsumption computation to other outstanding database topics, such as instance validation and recognition and query processing, has recently been investigated by Borgida et al. [13] and Beck et al. [8]. These papers explore how the semantics of defined concepts, classification, and subsumption can be exploited to process database queries and recognize new instances. The novel feature of the proposed models is that the DDL and the DML are identical, thus providing uniform treatment of data objects, query objects, and view objects. The classification algorithm finds the correct placement for a query object in a given object taxonomy, and the fundamental criterion for this placement is the subsumption relationship between two object classes (the union of the instances of the descendant object classes satisfies the query). Beck et al. [8] present the CANDIDE model in a notation more familiar to database researchers; it is an extension of the FDL KANDOR [44, 45] that represents the standard data types of database environments (range, set, composition). Unfortunately, it is now known that the proposed algorithm for computing this subsumption relationship, which the authors believed to be complete, sound, and tractable, actually contains a pitfall (due to the role hierarchy) where complexity becomes intractable [41]. Borgida et al. [13] present a model, CLASSIC, whose notation is more in the FDL tradition: query processing is tractable, as the subsumption algorithm is polynomial and complete. Interesting new features of the model with respect to FDLs are coreference constraints,¹⁸ which specify simple equalities between single-valued roles, and the ability to give the instances of a concept by enumeration. Furthermore, the paper shows how, thanks to the semantics of defined concepts, an object of an application domain can be automatically recognized as an instance of a class. Finally, the effectiveness of classification for intensional query answering is shown.

    To summarize, CANDIDE is proposed as a new conceptual model, while CLASSIC is a system based on an FDL whose feasibility for database objectives is shown. Compared with them, w=* and the theoretical framework proposed in this paper are placed at a different level: they form a general formal tool, applicable to existing conceptual models, that can be used to address various database topics. In fact, the SUBS and CLASSIFY algorithms can be the basis of a system performing instance validation and recognition and query processing optimization. A formal treatment of instance validation and recognition, known as hybrid inference, has been investigated by many AI researchers, as surveyed by Nebel [42].

    We now give a hint on the use of subsumption for database instantiation. The core of the problem is to decide whether a given object o belongs to the extension of a concept c. This is equivalent to checking that o* does not violate the integrity constraints embedded in the description of c. The first step is taken by a conceptualize algorithm, which derives a corresponding concept description c* from the description of o*. Roughly speaking, c* is obtained by determining, for each instantiated role, the most specialized concept that abstracts it and, for each attribute, the most specialized value domain. Then SUBS validates o*, as usual in a database environment, as follows:

        c* = conceptualize(o*)
        compute SUBS(c, c*)

    Moreover, o* can be recognized as an instance of one or more concepts of the taxonomy that specialize the concept c. To this end, two strategies are possible:
    (1) the taxonomy can be modified with CLASSIFY, and o* is then stored as an instance of the newly added concept instead of as an instance of c;
    (2) MINDESC(c*) is computed and o* is stored as an instance of every concept in MSGS(c*), without any modification of the taxonomy.
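The validation step c* = conceptualize(o*), SUBS(c, c*) and strategy (2) can be sketched for a deliberately restricted fragment. The following is a minimal sketch under our own assumptions, not the paper's SUBS or MINDESC: concepts carry only primitive atoms and attributes, each attribute having a value domain (a Python type or an enumerated set of values) and a number restriction (min, max), with max `None` standing for the unbounded n; roles are ignored. `Concept`, `conceptualize`, `subs`, and `most_specific_subsumers` are hypothetical names, and the description c* of an object is marked `closed`, meaning that attributes it does not list have no values.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical restricted concept: primitive atoms plus attribute constraints."""
    atoms: frozenset = frozenset()
    # attribute -> (domain, min, max); domain is a Python type or a frozenset of values
    attrs: dict = field(default_factory=dict)
    closed: bool = False  # True for c* of an object: unlisted attributes have no values


def conceptualize(atoms, values):
    """Most specific concept c* for an object o*: enumerated domains, exact counts."""
    return Concept(frozenset(atoms),
                   {a: (frozenset(vs), len(vs), len(vs)) for a, vs in values.items()},
                   closed=True)


def domain_included(inner, outer):
    if isinstance(inner, frozenset):
        if isinstance(outer, frozenset):
            return inner <= outer
        return all(isinstance(v, outer) for v in inner)
    if isinstance(outer, frozenset):
        return False  # a type is never treated as included in a finite enumeration
    return issubclass(inner, outer)


def range_included(lo, hi, outer_lo, outer_hi):
    if lo < outer_lo:
        return False
    return outer_hi is None or (hi is not None and hi <= outer_hi)


def subs(c, d):
    """True if c subsumes d: every constraint of c also holds for d."""
    if not c.atoms <= d.atoms:
        return False
    for attr, (dom, lo, hi) in c.attrs.items():
        default = (frozenset(), 0, 0) if d.closed else (object, 0, None)
        d_dom, d_lo, d_hi = d.attrs.get(attr, default)
        if not range_included(d_lo, d_hi, lo, hi) or not domain_included(d_dom, dom):
            return False
    return True


def most_specific_subsumers(taxonomy, d):
    """Brute-force stand-in for MSGS(d): subsumers with no strictly more specific subsumer."""
    ups = {n: c for n, c in taxonomy.items() if subs(c, d)}
    return {n for n, c in ups.items()
            if not any(m != n and subs(c, cm) and not subs(cm, c) for m, cm in ups.items())}


# Usage: a two-concept taxonomy loosely based on Example 7.1 (attributes only).
thing = Concept()
person = Concept(frozenset({"p"}), {"name": (str, 1, 1)})
taxonomy = {"thing": thing, "person": person}

ada = conceptualize({"p"}, {"name": ["Ada"]})
print(subs(person, ada))                       # True
print(subs(person, conceptualize({"p"}, {})))  # False: name is mandatory
print(most_specific_subsumers(taxonomy, ada))  # {'person'}
```

Here `most_specific_subsumers` tests every concept of the taxonomy; a real classifier would exploit the taxonomy ordering instead of this brute-force scan.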

    18 This constraint is also represented by Ait-Kaci [3].

    Analogous considerations hold for query processing. In fact, a wide class of queries can be expressed as an object description.

    10. CONCLUSIONS
    In this paper a theoretical framework for schema acquisition that preserves consistency and minimality is presented. The aim is to allow the database designer to build a conceptual schema, organized in a strict inheritance taxonomy, by freely supplying new concept descriptions in any order.

    To this end, two distinct compositional formalisms, @Y* and /91~U, have been introduced, which extend FDL languages developed in an AI environment and represent, respectively, the data semantics of conceptual models giving prominence to type structure and to attributes. These formalisms include defined concept semantics and constitute a general formal tool to support the consistency and minimality of a conceptual schema, applicable to any conceptual model: the 52?* and $7$7: ~ representations of the data semantics of the E/R and DAPLEX models are intended to show this capability.

    Schema consistency is formally defined by means of the notion of a contradictory concept, and is ensured by the algorithms CANFORM and CANFORMINV, which are able to detect contradictory concepts with respect to a given taxonomy.

    Furthermore, to go beyond passive consistency checks, the defined concept semantics, the subsumption relationship, and the minimal description of a concept are formally defined. The classification algorithm CLASSIFY, based on the complete and tractable subsumption algorithm SUBS, takes the more active role of determining the minimal description of a new concept, thus classifying it in the right place of a given taxonomy.

    The results of this paper, as sketched in Section 9, can be the foundation for relevant contributions on database querying and instantiation driven by subsumption.

    APPENDIX A. SYNTAX FOR E/R SCHEMA DECLARATION
    This appendix recalls the syntax for E/R schema declaration presented by Batini et al. [7].

    schema → SCHEMA: schema-name
             entity-section
             generalization-section
             relationship-section
    entity-section → {entity-decl}
    entity-decl → ENTITY: entity-name
                  [attribute-section]
                  [compound-attribute-section]
                  [identifier-section]
    attribute-section → ATTRIBUTES: attribute-decl
    attribute-decl → [(min-card:max-card)] attribute-name type-decl
    min-card → 0 | 1 | integer
    max-card → 1 | N | integer
    type-decl → INTEGER | REAL | BOOLEAN | TEXT(integer) | ENUMERATION: (value-list)
    compound-attribute-section → COMPOUND ATTRIBUTES: {comp-attr-decl}
    comp-attr-decl → [(min-card:max-card)] comp-attr-name OF {attribute-decl}
    identifier-section → {identifier-decl}
    identifier-decl → attribute-list
    generalization-section → [gen-hier-section] [subset-section]
    gen-hier-section → {gen-hier-decl}
    gen-hier-decl → GENERALIZATION: [(coverage1, coverage2)] gen-name
                    FATHER: entity-name
                    SONS: entity-name-list¹⁹
    coverage1 → P | T
    coverage2 → E | O
    subset-section → {subset-decl}
    subset-decl → SUBSET entity-name OF entity-name
    relationship-section → {relationship-decl}
    relationship-decl → RELATIONSHIP: relationship-name
                        CONNECTED ENTITIES: {conn-entity-decl}
                        ATTRIBUTES: {attribute-decl}
    conn-entity-decl → [(min-card:max-card)] entity-name

    APPENDIX B. SYNTAX FOR DAPLEX SCHEMA DECLARATION
    This appendix recalls the subset of DAPLEX syntax for schema declaration presented by Shipman [50] that can be expressed in 3ZY*.

    schema → {declarative}
    declarative → entity-declarative | function-declarative
    entity-declarative → DECLARE entity-name() =>> ENTITY
    function-declarative → DECLARE function-decl | DEFINE definition
    function-decl → function | function pred
    function → role-function | attr-function
    attr-function → attr-name(entity-tuple) multiplicity value-set
    role-function → role-name(entity-tuple) multiplicity entity-name
    multiplicity → => | =>>
    definition → function-defn | entity-defn
    function-defn → role-name-1(entity-set-1) =>> INVERSE OF role-name-2(entity-set-2)

    19 coverage1 denotes partial (P) or total (T) partition of the father entity into the sons entities; coverage2 denotes mutual exclusion (E) or overlapping (O) of the sons.


    entity-defn → entity-name INTERSECTION OF entity-tuple
    entity-tuple → entity-name | entity-name, entity-tuple
    pred → quant EXIST
    quant → AT (LEAST | MOST) integer
    value-set → integer | real | string | boolean | {atom1, ..., atomn}

    ACKNOWLEDGMENTS
    We are grateful to Paolo Tiberio for his discussion and suggestions. Bernhard Nebel contributed valuable comments on the subsumption algorithm. We are also grateful to the anonymous referees for their helpful suggestions, which considerably improved the quality of the paper.

    REFERENCES

    1. ABITEBOUL, S., AND GRUMBACH, S. COL: A logic-based language for complex objects. In EDBT '88, Lecture Notes in Computer Science 303, S. Ceri, J. W. Schmidt, and M. Missikoff, Eds., Springer-Verlag, New York, 1988, pp. 271-293.

    2. ABITEBOUL, S., AND HULL, R. IFO: A formal semantic database model. ACM Trans. Database Syst. 12, 4 (1987), 525-565.

    3. AIT-KACI, H. Type subsumption as a model of computation. In Proceedings of the 1st International Workshop on Expert Database Systems, Benjamin/Cummings, Menlo Park, Calif., 1986, pp. 115-140.

    4. ALBANO, A., CARDELLI, L., AND ORSINI, R. Galileo: A strongly typed, interactive conceptual language. ACM Trans. Database Syst. 10, 2 (1985), 230-260.

    5. ATZENI, P., AND PARKER, D. S. Formal properties of net-based knowledge representation schemes. Data Knowl. Eng. 3 (1988), 137-147.

    6. ATZENI, P., AND PARKER, D. S. Set containment inference and syllogisms. Theor. Comput. Sci. 62 (1988), 39-65.

    7. BATINI, C., CERI, S., AND NAVATHE, S. B. Conceptual and Logical Database Design: The Entity-Relationship Approach. Benjamin/Cummings, Menlo Park, Calif., 1992.

    8. BECK, H. W., GALA, S. K., AND NAVATHE, S. B. Classification as a query processing technique in the CANDIDE data model. In Proceedings of the 5th International Conference on Data Engineering (Los Angeles, Feb. 1989), pp. 572-581.

    9. BERGAMASCHI, S., BONFATTI, F., CAVAZZA, L., SARTORI, C., AND TIBERIO, P. Relational database design for the intensional aspects of a knowledge base. Inf. Syst. 13, 3 (1988), 245-256.

    10. BERGAMASCHI, S., CAVEDONI, L., SARTORI, C., AND TIBERIO, P. On taxonomical reasoning in E/R environment. In Proceedings of the 7th International Conference on the Entity Relationship Approach (Rome, Italy, Oct. 1988), Elsevier Science, North-Holland, Amsterdam, 1989, pp. 443-454.

    11. BERGAMASCHI, S., AND NEBEL, B. The complexity of multiple inheritance in complex object data models. In Workshop on AI and Objects, IJCAI '91 (Sydney, Australia, Aug. 1991).

    12. BERGAMASCHI, S., AND SARTORI, C. On taxonomic reasoning in conceptual design. Tech. Rep. 78, CIOC-CNR, Bologna, Italy, 1991.

    13. BORGIDA, A., BRACHMAN, R. J., MCGUINNESS, D. L., AND RESNICK, L. A. CLASSIC: A structural data model for objects. In SIGMOD (Portland, Ore., 1989), ACM, New York, 1989, pp. 58-67.

    14. BOUZEGHOUB, M., GARDARIN, G., AND METAIS, E. Database design tools: An expert system approach. In Proceedings of the International Conference on Very Large Databases (Stockholm, Aug. 1985), pp. 82-95.

    15. BRACHMAN, R. J., GILBERT, V. P., AND LEVESQUE, H. J. An essential hybrid reasoning system: Knowledge and symbol level accounts of KRYPTON. In IJCAI (Los Angeles, Aug. 1985), pp. 532-539.


    16. BRACHMAN, R. J., AND LEVESQUE, H. J. The tractability of subsumption in frame-based description languages. In AAAI (Austin, Tex., 1984), pp. 34-37.

    17. BRACHMAN, R. J., AND SCHMOLZE, J. G. An overview of the KL-ONE knowledge representation system. Cognitive Sci. 9, 2 (1985), 171-216.

    18. BRODIE, M. L., AND MYLOPOULOS, J., EDS. On Knowledge Base Management Systems. Springer, New York, 1986.

    19. BUNEMAN, P., AND FRANKEL, R. E. FQL: A functional query language. In SIGMOD (Boston, 1979), ACM, New York, 1979, pp. 52-58.

    20. CARDELLI, L. A semantics of multiple inheritance. In Semantics of Data Types, Lecture Notes in Computer Science 173, Springer, New York, 1984, pp. 51-67.

    21. CERI, S., ED. Methodology and Tools for Database Design. North-Holland, Amsterdam, 1983.

    22. CHEN, P. The entity-relationship model: Toward a unified view of data. ACM Trans. Database Syst. 1, 1 (1976), 9-36.

    23. CHEN, P. ER-Designer. Chen, Baton Rouge, La., 1987.

    24. CHOOBINEH, J., MANNINO, M. V., NUNAMAKER, J. F., AND KONSYNSKI, B. R. An expert database design system based on forms. IEEE Trans. Softw. Eng. 14, 2 (1988), 242-253.

    25. DAYAL, U., ET AL. PROBE: A research project in knowledge-oriented database systems: Preliminary analysis. Tech. Rep. 85-03, Computer Corporation of America, 1985.

    26. DE TROYER, O. RIDL: A tool for the computer-assisted engineering of large databases in the presence of integrity constraints. SIGMOD Rec. 18, 2 (June 1989), 418-429.

    27. DELCAMBRE, L. M. L., AND DAVIS, K. C. Automatic validation of object-oriented database structures. In Proceedings of the 5th International Conference on Data Engineering (Los Angeles, 1989), pp. 2-9.

    28. DI BATTISTA, G., AND LENZERINI, M. A deductive method for Entity-Relationship modelling. In Proceedings of the 15th International Conference on Very Large Databases (Amsterdam, Aug. 1989), pp. 13-21.

    29. DONINI, F. M., LENZERINI, M., NARDI, D., AND NUTT, W. The complexity of concept languages. In KR '91: Proceedings of the 2nd International Conference on Principles of Knowledge Representation and Reasoning (Cambridge, Apr. 1991), J. Allen, R. Fikes, and E. Sandewall, Eds., Morgan Kaufmann, Palo Alto, Calif., 1991, pp. 151-162.

    30. DONINI, F. M., LENZERINI, M., NARDI, D., AND NUTT, W. Tractable concept languages. In IJCAI 91 (Australia, Aug. 1991), pp. 458-463.

    31. FERRARA, F. EASY-ER, an integrated system for the design and documentation of database applications. In Proceedings of the 4th International Conference on Entity-Relationship Approach (Chicago, 1985).

    32. FININ, T., AND SILVERMAN, D. Interactive classification as a knowledge acquisition tool. In Expert Database Systems, L. Kerschberg, Ed., Benjamin/Cummings, Menlo Park, Calif., 1986, pp. 79-90.

    33. HAMMER, M. M., AND MCLEOD, D. Database description with SDM: A semantic data model. ACM Trans. Database Syst. 6, 3 (1981), 351-386.

    34. HULL, R., AND KING, R. Semantic database modeling: Survey, applications, and research issues. ACM Comput. Surv. 19, 3 (1987), 201-252.

    35. HULL, R. B., AND YAP, C. K. The format model: A theory of database organization. J. ACM 31, 3 (1984), 518-537.

    36. LECLUSE, C., AND RICHARD, P. Modelling complex structures in object-oriented databases. In Symposium on Principles of Database Systems (Philadelphia, Pa., Mar. 1989), ACM SIGACT-SIGMOD-SIGART, pp. 362-369.

    37. LECLUSE, C., AND RICHARD, P. The O2 database programming language. In Proceedings of the 15th International Conference on Very Large Databases (Amsterdam, Feb. 1989), pp. 411-422.

    38. LECLUSE, C., RICHARD, P., AND VELEZ, F. O2, an object-oriented data model. In SIGMOD (Chicago, June 1988), ACM, New York, pp. 424-433.

    39. VON LUCK, K., NEBEL, B., PELTASON, C., AND SCHMIEDEL, A. The BACK System. KIT 41, Tech. Univ. Berlin, 1987.

    40. MYLOPOULOS, J., BERNSTEIN, P. A., AND WONG, H. K. T. A language facility for designing database-intensive applications. ACM Trans. Database Syst. 5, 2 (1980), 185-207.

    41. NEBEL, B. Computational complexity of terminological reasoning in BACK. Artif. Intell. 34, 3 (1988), 371-383.

    42. NEBEL, B. Reasoning and Revision in Hybrid Representation Systems. Lecture Notes in Artificial Intelligence 422, Springer, New York, 1990.

    43. NEBEL, B. Terminological reasoning is inherently intractable. Artif. Intell. 43, 2 (1990), 235-249. Research Note.

    44. PATEL-SCHNEIDER, P. F. Small can be beautiful in knowledge representation. In Proceedings of the Workshop on Principles of Knowledge-Based Systems (Denver, Colo., Dec. 1984), IEEE, New York, 1984, pp. 11-16.

    45. PATEL-SCHNEIDER, P. F. A four-valued semantics for frame-based description languages. In Proceedings AAAI-86 (Philadelphia, Pa., 1986), pp. 344-348.

    46. REITER, R. Towards a logical reconstruction of relational database theory. In On Conceptual Modelling, M. L. Brodie, J. Mylopoulos, and J. W. Schmidt, Eds., Springer, New York, 1984, pp. 191-233.

    47. SCHMIDT-SCHAUSS, M. Subsumption in KL-ONE is undecidable. In KR '89: 1st International Conference on Principles of Knowledge Representation and Reasoning (Toronto, May 1989), R. J. Brachman, H. J. Levesque, and R. Reiter, Eds., Morgan Kaufmann, Menlo Park, Calif., pp. 421-431.

    48. SCHMIDT-SCHAUSS, M., AND SMOLKA, G. Attributive concept descriptions with unions and complements. Artif. Intell. 48, 1 (1991), 1-26.

    49. SCHMO