[ieee 2009 third international conference on research challenges in information science (rcis) -...


Data Base Reuse Methodology - ReTARI

Rosa Gonzales Martínez

Department of Informatics

Carlos III University of Madrid, UC3M

Madrid, Spain

[email protected]

Jorge Morato Lara

Department of Informatics

Carlos III University of Madrid, UC3M

Madrid, Spain

[email protected]

Omar Hurtado Jara

Department of Informatics

Carlos III University of Madrid, UC3M

Madrid, Spain

[email protected]

Anabel Fraga Vásquez

Department of Informatics

Carlos III University of Madrid, UC3M

Madrid, Spain

[email protected]

Abstract— Organizations are in continuous transformation, and one of the main changes is technological: new systems must be developed to help old systems evolve. This change brings with it the study and reuse of existing databases. Software developers are responsible for database design, and they need to study old database designs because these are focal points for the construction and design of new systems. This paper proposes a semi-automatic database design methodology for reuse. The methodology, named ReTARI, is structured in five steps: exploration of existing database structures, transformation, storing, retrieval and integration. ReTARI applies reuse techniques to database structures so that they can be reused in the construction of new software systems. Consequently, the final goal of ReTARI is to improve the productivity and quality of new database designs and, as a result, the whole software development process.

Keywords: Software Reuse, Software Repositories, Indexing, Retrieval, Databases.

I. INTRODUCTION

Nowadays, organizations undergo continuous change in order to survive. The main change is technological: software must be developed that extends and modifies older functionality. These changes may also affect database designs, so reusing data structures is strongly recommended, but it involves tedious and complicated activities. The main problem for software developers lies in accessing, extracting, retrieving and reusing database structures efficiently.

Moreover, software reuse is well known as a best practice that produces high benefits for the companies that apply it [17]. It is also known that truly useful reuse practices affect the stages that precede coding itself [9], and recent studies show that reuse is not only software reuse but also knowledge reuse [21]. Therefore, applying reuse in the database design stage will improve the productivity and quality of that stage, and it will benefit the complete software lifecycle.

The research goal is to develop a methodology for reusing database structures, not just data. It will simplify developers' access to the available knowledge. The methodology proposes five steps, together with techniques and tools for reusing database structures. The steps are: gathering, transformation, storage, retrieval and integration (ReTARI). Among the reuse techniques, we highlight the use of a storage repository, representation models, and the search and retrieval of reusable assets.

In this research we consider the following hypothesis: if database structures are represented with an appropriate representation model, they can be searched and retrieved efficiently, improving reuse in the creation of new database structures.

The remainder of this document is organized as follows: section 2 presents the state of the art; section 3 describes the methodology, showing its stages, methods and techniques; section 4 presents the validation; section 5 presents the conclusions of the work; and section 6 presents future work.

II. STATE OF THE ART

Database structures are mainly based on the relational model, which is why this study focuses on relational database structures. The structure is a description of a database using a relational data model. The structure, also called schema, is a set of tables, relationships, attributes, keys, constraints and domains [5].

9781-4244-2865-6/09/$25.00 ©2009 IEEE

A. Relational database structures

A relational database structure can be represented in diverse formats; the most widely used are listed below:

• Catalogue

• XML

• Graphs

• Natural Language

• RDFS

1) Database structure Catalogue representation

A relational database structure represented as a catalogue uses SQL (Structured Query Language) as the formal language, in the form of an SQL script. SQL is a declarative structured language that identifies the elements of a relational model. Each element is identified and extracted from the SQL script. “Fig. 1” shows an example of a relational model expressed as SQL code.

Figure 1. Relational Schema in SQL
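As a rough illustration of the catalogue representation, the sketch below runs a small SQL script and reads the resulting catalogue back. It is only a sketch under assumptions: the two-table schema and the `load_catalogue` helper are hypothetical, and Python's built-in `sqlite3` module stands in for whichever database manager holds the catalogue.

```python
import sqlite3

# Hypothetical two-table schema, written as the kind of SQL script the text describes.
SCHEMA_SQL = """
CREATE TABLE customer (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id)
);
"""


def load_catalogue(script: str) -> dict[str, str]:
    """Run the script in an in-memory database and return {table name: CREATE statement}."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(script)
        # sqlite_master is SQLite's catalogue table; it keeps the SQL text of each object.
        rows = conn.execute(
            "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)


catalogue = load_catalogue(SCHEMA_SQL)
for table_name in catalogue:
    print(table_name)
```

Each catalogue entry keeps the original statement text, so the elements of the relational model (tables, attributes, keys) can be identified from it later.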

2) Database structure XML representation

A database structure XML (eXtensible Markup Language) representation specifies the structure and the elements contained in the XML document using DOM (Document Object Model). DOM is the standardized object model for XML and HTML; it presents a document as a logical structure similar to a tree. “Fig. 2” shows a relational schema in XML.

Figure 2. Relational Schema in XML

XML is emerging as a data format in the Internet era. There is an increasing need for efficient data storage and efficient data search, while the amount of data to be transformed into XML formats keeps growing. One way to meet these needs is to transform XML data into a relational format and vice versa, in order to use the mature technology of relational databases [14].
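To make the tree view of a schema concrete, the following sketch walks an XML schema description with the DOM API. The element and attribute names (`schema`, `table`, `column`, `name`) are assumptions chosen for illustration, not the format used in “Fig. 2”.

```python
from xml.dom import minidom

# Hypothetical XML description of a relational schema (not the paper's format).
SCHEMA_XML = """<schema name="shop">
  <table name="customer">
    <column name="id" type="INTEGER" key="primary"/>
    <column name="name" type="TEXT"/>
  </table>
  <table name="orders">
    <column name="id" type="INTEGER" key="primary"/>
    <column name="customer_id" type="INTEGER"/>
  </table>
</schema>"""


def read_tables(xml_text: str) -> dict[str, list[str]]:
    """Parse the document into a DOM tree and return {table name: [column names]}."""
    dom = minidom.parseString(xml_text)
    tables = {}
    for table in dom.getElementsByTagName("table"):
        columns = [
            column.getAttribute("name")
            for column in table.getElementsByTagName("column")
        ]
        tables[table.getAttribute("name")] = columns
    return tables


tables = read_tables(SCHEMA_XML)
print(tables)
```

The same traversal could feed a relational catalogue, which is the XML-to-relational direction mentioned above.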

3) Database structure Graph representation

A relational database structure Graph representation [6] includes:

• Relationships visualization: it shows a detailed view of the foreign keys contained.

• SQL code editor: an SQL code (text) editor. It allows modifying and saving the file that contains the SQL statements.

• Relational Graph generator: it generates the graph from the created relationships. The graph is displayed, and it can also be printed or stored in a file using its own format.

“Fig. 3” shows a relational schema represented as a graph.

Figure 3. Relational Schema as Graph
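The graph generator described above can be pictured as building an adjacency structure from foreign keys. The sketch below is one minimal way to do that; the foreign-key list and the `build_graph` helper are hypothetical and do not reproduce the tool cited in [6].

```python
# Hypothetical foreign keys: (referencing table, column, referenced table).
FOREIGN_KEYS = [
    ("orders", "customer_id", "customer"),
    ("order_item", "order_id", "orders"),
    ("order_item", "product_id", "product"),
]


def build_graph(foreign_keys):
    """Return {table: sorted list of tables it references}; every table becomes a node."""
    graph: dict[str, set[str]] = {}
    for source, _column, target in foreign_keys:
        graph.setdefault(source, set()).add(target)
        # Referenced tables are nodes too, even if they reference nothing themselves.
        graph.setdefault(target, set())
    return {node: sorted(targets) for node, targets in graph.items()}


graph = build_graph(FOREIGN_KEYS)
for table, referenced in graph.items():
    print(table, "->", referenced)
```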

4) Database structure Natural Language representation

Extracting structured data from natural language is a major open problem; it ranges from understanding to translation. Extracting relevant knowledge for a conceptual model poses two problems: sorting assertions into relevant and non-relevant ones, and establishing the correspondence between natural language concepts and the concepts of the conceptual model, in this case the relational model [10].

Deriving the relational model from a natural language interface involves several aspects: natural language parsing (scanning and search process), knowledge ordering and knowledge retrieval. To reduce the complexity of scanning and searching, a restricted grammar may be used, which eases the designer's task. The KASPER project [11] uses a very restricted language called “normalized language”. This language uses a standard grammar and standard terms for the specification, so CASE tools can transform the specification into conceptual structures more easily. However, some experts state that this simplification provides only the appearance of natural language. It is not usable, because it does not deal with polysemy (homonymy, homotaxia), phrasing (synonym, definition) and context relations (anaphora, metaphor). Some research projects related to CASE tools, such as DMG and NIBA, extended their languages to complex sentences.

Translating a natural language specification is not only a syntactic process; it requires high-level semantic processing based on expertise from diverse areas, such as natural language processing and database modelling.

5) Database structure RDFS representation

Some proposals for integrating relational database models and RDFS models are available [15]. The tools use RDFS as a mechanism for specifying and executing links between relational databases and domain ontologies. The original relational model is preserved, ensuring that data remains accessible and usable consistently with the original data model.

The RDF and RDFS languages have been developed as semantic extensions that can be understood by machines, which facilitates more intelligent information processing. The RDF/S language provides a unified syntax with well-defined semantics, and it is capable of separating data (RDF) from metadata (RDFS).

In a relational database model, the columns represent attributes, the rows represent entities, and tables contain a set of attributes identified by each entity.

The main goal of linking the relational model with RDF/S models is to allow ontology-based queries on relational databases. Relational data must be available to RDFS reasoners using a controlled vocabulary from the domain ontology. Moreover, it must be possible to reconstruct the relevant data from the RDF/S serialization. This preserves the information completely, so the user can go back to the original data and change or reuse it if needed.

“Fig. 4” shows a relational schema in RDF.

Figure 4. Relational Schema in RDF

Finally, other techniques for data organization are available, such as the intuitive tabular form supported by spreadsheet applications, or any GUI (Graphical User Interface).

B. Relational Databases Reuse Tools

The main problem of reusing database structures stems from diverse factors, chiefly the lack of appropriate tools. Two sets of tools related to the objective of this work were identified:

1) Database modeling tools

The first group is composed of database modelling tools: graphic tools used for database design. Tools of this kind are available for diverse platforms and include Entity-Relationship diagrams or UML (Unified Modelling Language) models.

In this set, the most relevant are Platinum ERwin, EasyCase, Oracle Designer, System Architect and Rational Rose, all of them compatible with relational databases. These tools support reverse engineering and generate relational models.

For UML modelling, some commercial tools are available (Microsoft Visio, Borland Together, Rational Rose), as well as free software: BOUML, Eclipse, ArgoUML, MonoUML and StarUML.

2) Comparing schemas and database content tools

The second set covers schema and data comparison tools that use SQL scripts. These tools connect to database managers, search for tables or procedures, and then compare and synchronize the desired elements.

Some of the available tools in this set are SQL Comparer, Data Comparer and DTM Schema Comparer [19]. They provide different processes for developing and maintaining database schemas. One of the more efficient models offers source control at object level; that is, the comparison is made between database schemas and script files, using an SQL comparer.

Therefore, even though some tools provide useful utilities for developing and updating a database, they are insufficient to achieve adequate reuse of database designs. The tools lack the following:

• An automatic indexing and classification process that stores information in a repository as a reusable asset.

• An efficient retrieval module for obtaining the stored database schemas.

• A search method for finding relevant assets.

One possible solution is to provide a tool with the characteristics that are currently missing. Therefore, a methodology is proposed, supported by its own tool, that allows database structures to be reused efficiently.

III. RETARI METHODOLOGY

This research proposes a methodology for reusing relational database structures, called ReTARI. As shown in “Fig. 5”, ReTARI is a set of steps together with the application of techniques and tools for reuse. The process runs in two directions. First, assets must be introduced into the repository: database structures must be collected, and then the relational model¹ must be transformed to RSHP⁵. Second, assets are retrieved from the repository, transformed from RSHP to relational models, and then integrated to create the new database design.

Among the reuse techniques, the representation model, the repository and the search/retrieval methods stand out [7].

Regarding tools, the tool developed for this research is called ReTARI-Tool. It supports all the steps of the methodology. In its first version, ReTARI-Tool works with database structures based on relational models¹, because of their widespread use at the moment.

ReTARI methodology includes the following steps:

• Gathering database structures.

• Transformation.

• Storing.

• Retrieval.

• Integration

1 The relational model is a database model based on predicate logic and set theory. Some of its elements are relations, tuples, attributes, cardinality, degree and domain [12].

Figure 5. ReTARI methodology schema

A. Gathering database structures

The gathering of relational database structures, or schemas², comprises two sub-stages: finding the databases and extracting the structures.

1) Find databases

In this step, databases from companies and other sources relevant to developers are located. A list of paths, names, credentials and so on must be prepared in order to access the databases.

2) Structure extraction

In this step, the selected database structures are extracted for future reuse in the design of new databases. This process is supported by ReTARI-Tool through a window form that allows selecting a database and extracts the elements of its structure using an extraction engine.

The extraction engine uses an API (Application Programming Interface) to connect to the database. The component APIs used are OLE-DB (Object Linking and Embedding)³ and ODBC (Open Database Connectivity)⁴. Using these components, ReTARI-Tool easily extracts elements of the database structure such as tables, columns and indexes.

2 The schema is a description of the data sets corresponding to an organization model or part of it, expressed in a particular description language [2].

[Figure 5 diagram labels: Data Bases, Databases Structures, Gathering, Transformation (Relational Model To RSHP Model, RSHP Model To Relational Model), Storing, Retrieval, Integration, RSHP Repository, GUI, API, Extraction Engine, Indexing Engine, Retrieval Engine, Integration Engine]

This procedure must be carried out continuously, in order to keep the repository up to date and guarantee that database structures are available to the development team.
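A minimal sketch of what such an extraction engine does is given below. It swaps in Python's `sqlite3` module and SQLite's `PRAGMA` statements for the OLE-DB and ODBC components named in the text, so it is an illustration of the idea rather than ReTARI-Tool's implementation; the `extract_structure` helper and the `student` table are hypothetical.

```python
import sqlite3


def extract_structure(conn: sqlite3.Connection) -> dict:
    """Return {table: {"columns": [(name, type), ...], "indexes": [index names]}}."""
    table_names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    structure = {}
    for table in table_names:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [
            (row[1], row[2]) for row in conn.execute(f'PRAGMA table_info("{table}")')
        ]
        # PRAGMA index_list rows: (seq, name, unique, origin, partial)
        indexes = [row[1] for row in conn.execute(f'PRAGMA index_list("{table}")')]
        structure[table] = {"columns": columns, "indexes": indexes}
    return structure


# Hypothetical source database, created in memory for the example.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE INDEX idx_student_name ON student(name);
    """
)
structure = extract_structure(conn)
conn.close()
print(structure)
```

Running the extraction on a schedule against the list of databases prepared in the previous sub-stage would be one way to keep the repository up to date.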

B. Transformation

This step consists of translating the extracted elements of database structures (based on the relational model) to the proposed information schema (the RSHP model)⁵, and performing the opposite process at reuse time.

The process occurs in two contexts. First, when database structures are to be stored, the elements are translated from the relational model to the RSHP model. Second, in the retrieval process, the elements stored in the RSHP model are translated back into relational models. The transformation process defines the rules for expressing the elements of the relational model in terms of RSHP elements, as shown in TABLE I.

TABLE I. MAPPING BETWEEN RELATIONAL MODEL AND RSHP MODEL

Relational Model | RSHP Model
Database Structure | Artifact
Relation (Table) | KE
Attribute | KE
Key | KE
Constraint | KE
Multiplicity | KE

As shown in TABLE I, the mapping between both models generates the following products:

3 Object Linking and Embedding (OLE-DB): an API for accessing databases. OLE-DB allows access to data stored in any format (databases, spreadsheets, text files, and so on) for which an OLE-DB provider is available [4].

4 Open Database Connectivity (ODBC): an API for connecting to any data source for which an ODBC driver is available (MS SQL, PostgreSQL, MySQL) [3].

5 The RelationSHiP (RSHP) model is a representation schema based on relationships between elements such as artifacts, concepts (KE), relationships (RSHP) and relationship types (Type-RSHP) [9], [16].

• Database structure to artifact: each database structure in the relational model is translated or mapped as an Artifact⁶ with the same name in the RSHP model.

• Relational element to Knowledge Element (KE): an element in the relational model is mapped as a KE⁷.

An element coming from the relational model can be a table or relation, an attribute, a key, a constraint, or a multiplicity.

There is one more element in the RSHP model used for creating relationships: RSHPs. Each RSHP represents a relationship between KEs.

ReTARI-Tool supports the transformation process with an indexing engine. The goal of the indexer is to automatically organize, classify and identify the elements in the RSHP model, and to create relationships between them, taking the relational elements as the basis.
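The mapping of TABLE I can be sketched in a few lines. The classes below are illustrative assumptions, not the RSHP repository's actual types, and the relationship type name `has-attribute` is hypothetical; only tables and attributes are mapped to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class KE:
    """Knowledge Element: one relational element (table, attribute, key, ...)."""
    term: str
    kind: str


@dataclass
class RSHP:
    """Relationship between two KEs."""
    source: KE
    target: KE
    rshp_type: str


@dataclass
class Artifact:
    """Container for one database structure, carrying the same name."""
    name: str
    kes: list[KE] = field(default_factory=list)
    rshps: list[RSHP] = field(default_factory=list)


def to_rshp(db_name: str, tables: dict[str, list[str]]) -> Artifact:
    """Map {table: [attributes]} to an Artifact with KEs and RSHPs."""
    artifact = Artifact(name=db_name)
    for table_name, columns in tables.items():
        table_ke = KE(table_name, "table")
        artifact.kes.append(table_ke)
        for column in columns:
            attribute_ke = KE(column, "attribute")
            artifact.kes.append(attribute_ke)
            # "has-attribute" is a hypothetical relationship type name.
            artifact.rshps.append(RSHP(table_ke, attribute_ke, "has-attribute"))
    return artifact


artifact = to_rshp("shop", {"customer": ["id", "name"]})
print(artifact.name, len(artifact.kes), len(artifact.rshps))
```

The reverse transformation used at retrieval time would read the RSHPs of an artifact and regroup attribute KEs under their table KE.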

C. Storing

This step consists of storing the database structure, expressed in the RSHP schema, in the repository. The RSHP repository has a structure for storing the reusable assets. The structure is shown in “Fig. 6”, and it consists of seven entities [13], as follows:

• Artifacts: keeps the artifacts, for instance a database structure.

• Artifacts_Type: contains the artifact types, for instance database structure type, spreadsheet type, and so on.

• RSHP: keeps all the relationships between KE elements, for instance the relationship between a table KE and an attribute KE.

• Grammatical: contains the RSHP relationship types, for instance related type, equivalent type, and so on.

• Knowledge_Elements: keeps the descriptors, for example the descriptor of a table, the descriptor of an attribute, and so on.

• Vocabulary: keeps the concepts, the final representation of a KE.

• Rules_Families: keeps the types of the concepts stored in Vocabulary, for example table type, attribute type, etc.

6 Artifacts represent information elements that act as containers; for instance, an artifact could be a database structure [13].

7 A KE is an information descriptor or concept, for instance a relation (table), an attribute, a key, and so on [13].

Figure 6. Class diagram of the repository

D. Database structures retrieval

The retrieval process [8] extracts, in response to a user query, the relevant database structures available in the repository. This process consists of two sub-processes: the query issued by the user to retrieve database structures, and the extraction. The extraction is supported by an engine in ReTARI-Tool. Depending on the query, this component extracts relevant documents using the following retrieval techniques:

• Classic search: it finds assets containing terms identical to the search patterns.

• Semantic search: based on thesauri⁸; in this work the WordNet semantic database has been used. It expands the search in the repository using equivalence, hierarchy and whole-part relationships.

Finally, after the relevant documents are found, the integration engine orders the relevant results and shows them to the end user.
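The difference between the two techniques can be sketched as follows. The synonym table is a hand-written, hypothetical stand-in for a WordNet lookup, and the repository contents are invented for the example; only equivalence relationships are expanded here, not hierarchy or whole-part relationships.

```python
# Hypothetical stand-in for a thesaurus lookup; the paper uses WordNet.
SYNONYMS = {
    "client": {"customer"},
    "customer": {"client"},
    "staff": {"employee", "personnel"},
}

# Hypothetical repository: asset name -> terms indexed for that asset.
REPOSITORY = {
    "shop_db": {"customer", "order", "product"},
    "crm_db": {"client", "contact"},
    "hr_db": {"employee", "role"},
}


def classic_search(keyword, repository):
    """Exact-term match: assets whose indexed terms contain the keyword."""
    return sorted(name for name, terms in repository.items() if keyword in terms)


def semantic_search(keyword, repository, synonyms):
    """Expand the keyword with equivalent terms, then match any of them."""
    expanded = {keyword} | synonyms.get(keyword, set())
    return sorted(name for name, terms in repository.items() if terms & expanded)


classic_hits = classic_search("client", REPOSITORY)
semantic_hits = semantic_search("client", REPOSITORY, SYNONYMS)
print(classic_hits)
print(semantic_hits)
```

In this toy repository the expanded query also reaches `shop_db`, which indexes "customer" but not "client"; this is the recall gain that the semantic search aims for.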

a) WordNet lexical net

WordNet is a lexical database with a large vocabulary of the English language. EuroWordNet is also a large database based on WordNet, available in several other languages. WordNet is an on-line reference system. It consists of lists of equivalent terms for a given context. Two kinds of relationships are supported: lexical and semantic. Lexical relationships hold between word forms, and semantic relationships are based on the meaning of the words [20].

E. Integration

In this step, the database design is optimized for new projects. Once the relevant structures have been shown in the previous step, the user uses ReTARI-Tool to compare and select the database structures considered necessary. After that, the tool provides an option for designing new databases using a creation engine, which captures the selected elements and translates them into SQL statements. Finally, the user can integrate or export the resulting designs into a new project.

8 A thesaurus is a controlled and dynamic vocabulary, composed of terms with semantic and generic relationships, in a particular domain of knowledge. It contains an alphabetical list of preferred terms, a list of synonyms, hierarchies or associations between terms, scope notes, and a set of rules for using it [18].
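The creation engine of the integration step translates selected elements into SQL statements. A minimal sketch of that translation is shown below; the `create_table_sql` helper and the selected elements are hypothetical, and constraints other than those written into the type strings are left out.

```python
def create_table_sql(table, columns):
    """Build one CREATE TABLE statement from a list of (name, sql_type) pairs."""
    body = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns)
    return f"CREATE TABLE {table} (\n{body}\n);"


# Hypothetical elements selected by the user from retrieved structures.
selected = {
    "customer": [("id", "INTEGER PRIMARY KEY"), ("name", "TEXT")],
    "orders": [("id", "INTEGER PRIMARY KEY"), ("customer_id", "INTEGER")],
}

script = "\n\n".join(
    create_table_sql(table, columns) for table, columns in selected.items()
)
print(script)
```

The resulting script is plain SQL text, so it can be exported to a file or executed against the target database of the new project.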

IV. HYPOTHESIS VALIDATIONS

As suggested by the hypothesis, the goal of this work is to retrieve database schemas and to aid in their reuse. Accordingly, the retrieval engine of ReTARI-Tool must be measured using the recall and precision metrics [1].

A. Validation process

This stage begins with gathering databases from the web. The selection criterion is that the databases are created with SQL scripts, so that catalogues can be generated using a database administration tool.

The databases for the test corpus (87 database structures) were selected from the most popular schemas on the web. Using the keywords “create table lang:sql” in the search engine, a number of SQL scripts were collected, mainly with the Google search engine, as shown in “Fig. 7”. TABLE II lists the URLs used for the test corpus.

Figure 7. Google Code Search


TABLE II. TEST CORPUS DATABASE URLS

Script SQL | Web Link
create-sample.sql | db.mysql/.../org/netbeans/modules/db/mysql/resources/create-travel.sql
tables.sql | orm-1.0.1/doc/manual/tables.sql - 3 identical
create_mysql_db.sql | HBasic-0.9.9n/erp/src/sql/create_mysql_db.sql - 4 identical
erlydb.sql | trunk/lib/erlyweb-0.6.2/test/erlydb/erlydb.sql
createdb.sql | dcl/scripts/install/mssql/createdb.sql
db.sql | trunk/scripts/db.sql
create_tables.sql | trunk/utility/create_tables.sql
schema4.sql | jdo2-tck-2.0/src/sql/derby/datastoreidentity/schema4.sql
schema9.sql | trunk/tck2-legacy/src/sql/derby/datastoreidentity/schema9.sql
create_biblio.sql | guide/esempi/sql/create_biblio.sql
Microsoft database | http://msdn.microsoft.com/es-es/express/bb403186(en-us).aspx
Northwind and pubs | http://technet.microsoft.com/es-es/library/ms143221.aspx
Popular database | http://www.databaseanswers.org/database_downloads.htm

Once the databases have been gathered as SQL scripts, some of them need to be adapted to the syntax of the DBMS (Database Management System) chosen to generate the catalogues. For this test, Microsoft SQL Server 2003 was used as the DBMS.

The gathering process ends with the extraction of the database structures in the test corpus. The next step is transformation, which translates the extracted structures into RSHP structures. Finally, they are stored as assets in the repository.

Retrieval was tested using queries designed in advance. The criterion was to search for the most common concepts in the corpus, using keywords. Two sets of queries were used: classic and semantic. The semantic queries take into account the equivalent terms found in the WordNet lexical database.

1) Calculation of retrieval metrics

The calculation of precision, recall and the unified precision-recall metric uses the set of variables and functions presented in TABLE III.

TABLE III. VARIABLES AND FUNCTIONS

And finally, the following conditions must be considered:

• The test corpus consists of 87 databases, grouped into commercial databases (52), educational databases (22) and other databases (13).

• 14 queries (Qn) are formed for the classic search (exact term) and the semantic search (equivalent term) of keywords (k).

• The keywords were chosen from the most common terms in the databases: client, ship, student, role, item, employee, person, staff, contact, payment, book, product, order and customer.

• The Er values are controlled values; that is, they were determined by a manual test prior to the hypothesis test using ReTARI-Tool.

TABLE IV shows the precision and recall metrics for classic queries. It shows the results obtained from the fourteen queries (Q1…Q14) and their mean, using the keywords (k) in the second row. The fourth row contains the values of Er, the number of relevant database structures in the repository that match the query. Er is calculated beforehand and is the base number for calculating the retrieval metrics. The fifth row contains the values of Ev, the number of matching database structures retrieved using ReTARI-Tool; these may include relevant and non-relevant results. The sixth row contains the values of Evr, the number of relevant database structures retrieved using ReTARI-Tool. Finally, the results for Recall, Precision and F(j) are obtained with the corresponding functions shown in TABLE III.

E = Number of database structures in the repository

Er = Number of relevant database structures in the repository

Enr = Number of non-relevant database structures in the repository

Ev = Number of database structures retrieved by ReTARI-Tool

Evr = Number of relevant database structures retrieved by ReTARI-Tool

Evnr = Number of non-relevant database structures retrieved by ReTARI-Tool

Recall = Evr / Er

Precision = Evr / Ev

k = Keyword

Q1, Q2, …, Q14 = Database structure queries containing a keyword k

F(j) = Unified precision-recall metric of retrieved document j

F(j) = 2 / (1/R(j) + 1/P(j)), where R(j) is the recall and P(j) the precision

P = Mean
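As a worked check of these functions, the sketch below computes the three metrics for query Q1 of the classic search, using the values reported in TABLE IV (Er = 25, Ev = 30, Evr = 15). The function names are illustrative, and returning 0.0 when recall or precision is zero is an assumption made here to avoid a division by zero.

```python
def recall(evr: int, er: int) -> float:
    """Recall = Evr / Er."""
    return evr / er


def precision(evr: int, ev: int) -> float:
    """Precision = Evr / Ev."""
    return evr / ev


def f_measure(r: float, p: float) -> float:
    """F = 2 / (1/R + 1/P); defined as 0.0 here if either input is zero."""
    if r == 0 or p == 0:
        return 0.0
    return 2 / (1 / r + 1 / p)


# Q1, classic search: Er = 25, Ev = 30, Evr = 15 (values from TABLE IV).
r = recall(15, 25)
p = precision(15, 30)
f = f_measure(r, p)
print(f"{r:.0%} {p:.0%} {f:.0%}")  # prints "60% 50% 55%"
```

The printed values agree with the Q1 column of TABLE IV.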

TABLE IV. PRECISION AND RECALL VALUES FOR CLASSIC SEARCHES

E=87 | Classic Search Using ReTARI-Tool | Mean
k | client | ship | student | role | item | employee | person | staff | contact | payment | book | product | order | customer
Q | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Q9 | Q10 | Q11 | Q12 | Q13 | Q14
Er | 25 | 26 | 27 | 28 | 30 | 38 | 40 | 41 | 45 | 56 | 58 | 61 | 68 | 89
Ev | 30 | 30 | 31 | 33 | 34 | 43 | 44 | 47 | 51 | 63 | 64 | 66 | 78 | 106
Evr | 15 | 17 | 18 | 21 | 23 | 30 | 32 | 34 | 38 | 50 | 52 | 56 | 63 | 84
Recall | 60 % | 65 % | 67 % | 75 % | 77 % | 79 % | 80 % | 83 % | 84 % | 89 % | 90 % | 92 % | 93 % | 94 % | 81 %
Precision | 50 % | 57 % | 58 % | 64 % | 68 % | 70 % | 73 % | 72 % | 75 % | 79 % | 81 % | 85 % | 81 % | 79 % | 71 %
F(j) | 55 % | 61 % | 62 % | 69 % | 72 % | 74 % | 76 % | 77 % | 79 % | 84 % | 85 % | 88 % | 86 % | 86 % | 75 %

TABLE V shows precision and recall for semantic queries. In the semantic search, the Er values are greater than in classic retrieval, because of the additional techniques applied, such as synonyms and hierarchy. Evr follows a similar pattern in both cases, and precision does not change greatly. The mean values in TABLES IV and V show that recall improves with the semantic search (90 % versus 81 %), while mean precision is somewhat lower (65 % versus 71 %) and the mean F(j) stays at 75 %.

TABLE V. PRECISION AND RECALL VALUES FOR SEMANTIC SEARCHES

E=87 | Semantic Search Using ReTARI-Tool | Mean
k | client | ship | student | role | item | employee | person | staff | contact | payment | book | product | order | customer
Q | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Q9 | Q10 | Q11 | Q12 | Q13 | Q14
Er | 27 | 30 | 30 | 32 | 37 | 42 | 45 | 43 | 50 | 60 | 62 | 69 | 70 | 116
Ev | 45 | 45 | 45 | 48 | 54 | 57 | 58 | 57 | 67 | 74 | 82 | 102 | 91 | 146
Evr | 21 | 25 | 25 | 28 | 31 | 39 | 40 | 38 | 47 | 57 | 58 | 64 | 69 | 110
Recall | 78 % | 83 % | 83 % | 88 % | 84 % | 93 % | 89 % | 88 % | 94 % | 95 % | 94 % | 93 % | 99 % | 95 % | 90 %
Precision | 47 % | 56 % | 56 % | 58 % | 57 % | 68 % | 69 % | 67 % | 70 % | 77 % | 71 % | 63 % | 76 % | 75 % | 65 %
F(j) | 58 % | 67 % | 67 % | 70 % | 68 % | 79 % | 78 % | 76 % | 80 % | 85 % | 81 % | 75 % | 86 % | 84 % | 75 %

2) Precision–Recall Graphics

“Fig. 8” shows the precision and recall trends obtained from classic queries. Recall and precision increase because the number of retrieved terms increases across the queries.

[Line chart “Classic Search”: retrieval metrics (Recall, Precision, Precision-Recall Unified) on a 0% to 100% axis for queries 1 to 14]

Figure 8. Precision Recall tendencies for classic search

Fig. 9 shows the precision and recall trends obtained from semantic queries. Recall is higher than in the curve of Fig. 8 because semantic retrieval techniques, such as synonyms, have been applied. These techniques help to retrieve more terms, but the precision results remain roughly the same.

Figure 9. Precision-recall tendencies for semantic search. [Line chart: x-axis, queries 1 to 14; y-axis, retrieval metrics from 0% to 100%; series: Recall, Precision, Precision-Recall Unified.]

a) Analysis of results

As shown in Figure 4, the recall achieved using only classic search (keyword or exact term search in the asset) is lower than that of semantic search (exact term and equivalent term). Semantic search increases recall by widening the scope of the query: as the search criteria are broadened, more database structures are retrieved.
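The difference between the two search modes can be illustrated with a minimal Python sketch. It assumes a tiny hand-written thesaurus (`SYNONYMS`, `NARROWER`) and a toy repository of table names; all of these, as well as the function names, are hypothetical and do not reproduce the ReTARI-Tool implementation. Classic search matches only the exact query term, whereas semantic search first expands the query with equivalent and hierarchically related terms.

```python
# Hypothetical thesaurus: equivalent terms and narrower (hierarchy) terms.
SYNONYMS = {"client": {"customer"}, "customer": {"client"}}
NARROWER = {"person": {"employee", "student"}}

# Toy repository: database name -> table names found in its catalogue.
REPOSITORY = {
    "sales_db": {"customer", "order", "product"},
    "crm_db": {"client", "contact"},
    "hr_db": {"employee", "payment"},
    "school_db": {"student", "book"},
}


def classic_search(term):
    """Return databases containing a table named exactly `term`."""
    return {db for db, tables in REPOSITORY.items() if term in tables}


def expand(term):
    """Return the term plus its synonyms and narrower terms."""
    return {term} | SYNONYMS.get(term, set()) | NARROWER.get(term, set())


def semantic_search(term):
    """Return databases containing any table in the expanded term set."""
    terms = expand(term)
    return {db for db, tables in REPOSITORY.items() if terms & tables}


print(sorted(classic_search("client")))
print(sorted(semantic_search("client")))
print(sorted(classic_search("person")))
print(sorted(semantic_search("person")))
```

In this toy setting the expanded query always returns a superset of the classic result, which mirrors the higher recall reported for semantic search. Whether precision holds depends on how well the added terms match the intent of the query.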

For the test, the database and the query terms were controlled so that the search engine worked on a well-defined set. Under these conditions the precision-recall results are good (close to one), so it is reasonable to assume that the results for uncontrolled databases will also be acceptable. Consequently, the ReTARI-Tool retrieves database structures efficiently and thereby supports the database reuse process. In turn, this improves the database design process and, ultimately, the whole software development process.

V. CONCLUSIONS

First of all, the objective of the research has been accomplished: it has been possible to develop and test a methodology for reusing database structures. The ReTARI methodology helps software developers to overcome the information reuse problems of database structures in the design stage of the software development process. Thus, a foundation for effective database reuse is established.

One of the main difficulties in the reuse process is addressed: retrieval. As is well known, the benefits of reusing software assets are huge, but the main problem is finding the asset in the repositories. This research focuses on successfully applying retrieval techniques, with an adequate degree of efficiency, to retrieve the database structures relevant to a given query.

There is a strong need to distribute and reuse information sources based on relational models, and the market does not offer appropriate tools to support this need. The market is not yet mature enough to offer the reuse of database structures as an alternative, perhaps because of the lack of tools supporting this process, or perhaps because of what it means for some companies.

The efficiency of the ReTARI methodology comes from the systematic execution of the steps and techniques that it involves. In that sense, its implementation is independent of the ReTARI-Tool; in fact, the methodology could be applied using any other tool that supports the reuse process. In our case, because of the lack of tools available in the market, we decided to develop our own.

VI. LIMITS AND FUTURE WORKS

At the moment, only database structures of relational models expressed in catalogues (tables, columns, keys, indexes, etc.) can be reused. The next step is therefore to extend the ReTARI-Tool to other formats such as XML, RDFS, spreadsheets, and so on. Along the same path, additional APIs such as JDBC for Oracle could be included as well. In this way, languages for designing database structures could be supported together with the corresponding transformation schemas: XML, RDF, Oracle SQL, and so on. Another line of future work is the reuse of the data contained in the database structures.

After that, the search criteria must be expanded with new relationships, for instance whole-part and antonyms. The current criteria are keyword search, equivalent term, and hierarchy.

REFERENCES

[1] Lancaster, F. W. Information Retrieval Systems: characteristics, testing and evaluation. 2.ed. Ed. Wiley-Interscience. 1979.

[2] Gardarin, G., Valduriez, P. Relational Databases and Knowledge Bases. Ed. Addison-Wesley. United States, pp. 75-81. 1989.

[3] Geiger, Kyle. Inside ODBC. Ed. Microsoft Press. 1995.

[4] Microsoft. Microsoft OLE DB 1.1 Programmer’s Reference and Software Development Kit. Ed. Microsoft Press. 1997.

[5] Castano, S., De Antonellis, V., Fugini, M. G., and Pernici, B. Conceptual schema analysis: techniques and applications. ACM Trans. Database Syst. 23, 3, September 1998.

[6] Fernández, D. Translator between Entity-Relationship and Relational Models. July 1998.

[7] Lim, W. C. Managing Software Reuse. United States, pp. 7-453. 1998.

[8] Baeza-Yates, R. and Ribeiro-Neto, B. Modern Information Retrieval. Ed. ACM Press Books. 1999.

[9] Ezran, M., Morisio, M., Tully, C. Practical Software Reuse: the essential guide. McGraw-Hill, New Delhi (India), 2000.

[10] Bouzeghoub, M., Kedad, Z., Métais, E. CASE Tools: a Computer Support for Conceptual Modeling, in Advanced Database Systems, Techniques and Design, O. Diaz & M. Piattini (eds), Artech House Publishers, March 2000.

[11] Bouzeghoub, M., Kedad, Z., Métais, E. Natural Language Processing and Information Systems. 5th International Conference on Natural Language Applications for Information Systems, June 2000.

[12] C. J. Date, Sergio Luis María Ruiz Faudón. Introduction to the Database Systems. Ed. Prentice Hall. 2001.

[13] Díaz, S. Information Representation Schemas based on relationships. Application to the automatic generation of domains representations. Madrid (Spain), 2001, pp. 3.

[14] Lee, D., Mani, M., Chu, W. W. Effective Schema Conversions between XML and Relational Models. 2002.

[15] Korotkiy, M., Top, J. L. From Relational Data to RDFS Models. July 2004.

[16] Llorens, J., Morato, J., Génova, G. RSHP: an information representation model based on relationships. Soft-Computing in Software Engineering: Theory and Applications. Ed. Damiani, Ernesto; Jain, Lakhmi C.; Madravio, Mauro. Springer Verlag, New York. 2004.

[17] Llorens, J., Fuentes, JM. Software Reuse: Tendencies of the Information Technology in the Public Administration. 2006.

[18] Sánchez, S. Definition of the Methodology for the automatic construction of knowledge organization systems. December 2006.

[19] DTM Schema Comparer, 2008. http://www.sqledit.com/scmp/index.html [last visited 8 July 2008]

[20] WordNet, 2008. http://wordnet.princeton.edu/ [last visited 28 July 2008]

[21] Fraga, A., Llorens, J. Universal Knowledge Reuse: anything, anywhere, and anybody. First International Workshop on Knowledge Reuse (KREUSE 2008) - ICSR2008 (Proceedings). ISBN: 978-84-691-3166-4. Beijing, May 25, 2008.