

Page 1: [IEEE 2009 Sixth International Conference on Information Technology: New Generations - Las Vegas, NV, USA (2009.04.27-2009.04.29)] 2009 Sixth International Conference on Information

XML Based Implementation of a Bibliographic Database and Recursive Queries

Kazem Taghva, Kirankumar Jayakumar

Information Science Research Institute

University of Nevada, Las Vegas

[email protected]

Abstract

The Structured Query Language (SQL) of relational database models does not have the expressive power to implement recursive queries. Consequently, recursive queries are implemented as an application program in the host language. The newly developed XML schema provides a different setting for database design and query implementation. In this paper, we design and implement an XML schema and a set of associated queries for a bibliographic database. We investigate and demonstrate the capabilities of XPath, XQuery, and XSLT as standard query languages for XML-based databases. We also show efficient implementations of recursive queries in XSLT.

Key Words: Datalog, FLWOR, XPath, XSLT.

1.0 Introduction

Typically, a relational query involves retrieving data from one or more tables by applying certain conditions on the retrieved record. Operations such as joins can be performed to relate data in different tables. Logic-based languages such as SQL do not allow recursion or iteration over tables. Consequently, recursive queries similar to graph traversal need to be done outside the query language.

Extensible Markup Language (XML) is a standard for creating custom markup in documents. Its primary purpose is to create a standardized representation of data, which can be accessed across diverse platforms. XML is very flexible and allows the user to create custom structures. It is an open standard and can be used free of cost. An XML document can be queried using technologies such as XPath, XQuery, and XSLT.

XPath is a language for selecting nodes from an XML document based on certain conditions. It can also be used for computing values from the retrieved nodes. The syntax is similar to a file path in UNIX. XPath is very limited in its scope and does not have the expressive power for implementing all relational queries.

XQuery is a query language that can be used to extract data from XML documents. Its semantics are similar to those of SQL. Most queries that can be implemented in SQL for a relational database can be implemented in XQuery for an equivalent XML database. XQuery has so-called FLWOR expressive power. A FLWOR expression is made up of five clauses: For, Let, Where, Order by, and Return. Since the queries are constrained by the FLWOR structure, it is not possible to achieve the full power of recursion in XQuery.

Extensible Stylesheet Language Transformations (XSLT) is a language used for converting XML documents from one form to another. XSLT is a complete language based on the functional programming paradigm. XSLT is considered to be a template processor. Using defined templates, one can express elegant recursive queries in XSLT.

2.0 Bibliographic Database Schema

The purpose of this work is to show a simple implementation of a bibliographic database and a set of associated queries using the XML schema and the above-mentioned query languages. A bibliographic database is a database containing bibliographic records. It is designed with the intent of capturing bibliographic information. It holds information about the material, organized into material type categories such as Articles, Books, Conferences, Proceedings, Journals, etc. All the reference information is also captured. For example, bibliographic information related to a journal is stored in the journal element as shown in Table 2.1.

2009 Sixth International Conference on Information Technology: New Generations

978-0-7695-3596-8/09 $25.00 © 2009 IEEE

DOI 10.1109/ITNG.2009.336

Table 2.1

Element          Type            Description
<JournalID>      ST_PKID         Primary key
<Title>          ST_genstring
<Volume>         ST_genstring
<Number>         ST_gennumber
<Pages>          ST_genstring
<MonthYearID>    ST_PKID         Foreign key (referencing <MonthYear>)
<Note>           ST_genstring
<PublisherID>    ST_PKID         Foreign key (referencing <Publisher>)
<Edition>        ST_genstring

The simple XML types ST_PKID and ST_genstring are defined as a five-digit integer and a string of up to 20 characters, respectively. Similarly, XML objects are constructed to represent other categories such as Books, Proceedings, and Annual Reports.

The bibliographic reference relationship between materials can be tracked using the Relationship element. The reference relationship is a graph and can be represented using a self-referencing foreign key. This relation is useful in determining explicit and implicit references between articles. An XML object Relationships is composed of zero or more Relationship elements, as described in Figure 2.2 and Table 2.3.

Table 2.3

Element        Type          Description
<ID>           ST_PKID       Primary key
<NodeXPath>    xsd:string    XPath of a particular material
<ParentID>     ST_PKID       Self referencing foreign key


Figure 2.2: structure of the Relationships element

When a material X references a material Y, then X explicitly references Y. When X references Y and Y references Z, then X implicitly references Z. This definition is applicable recursively. We would like to implement a query to find all the references, both explicit and implicit, for a given material type. This query is a directed graph traversal problem where each material is a vertex and the relationship is a directed edge. In the next section, we present a set of queries and their associated implementations.

3.0 Implementation

Querying in an XML database environment can range from extracting the content or attribute value of a particular node, to conditional comparison and selection of a set of nodes, to recursive operations on nodes based on a certain condition. In this work, the following approach was adopted.

• XPath will be used for locating a particular node or a set of nodes satisfying a common criterion within the XML tree

• XQuery will be used in conjunction with XPath for relational non-recursive queries

• XSLT will be used in conjunction with XPath for recursive queries

A simple example of an XPath query and its implementation is:

How many pages are there in the journal with the title 'X'?

/BibliographicDB/Journals/Journal[Title='X']/Pages
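The expression can be tried outside an XML database with a few lines of Python. The following is only a sketch: it uses the standard-library xml.etree.ElementTree module, which supports a subset of XPath, and the sample document, its values, and the helper name pages_for_title are assumptions made for illustration. The namespace used elsewhere in the paper is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document: element names follow Table 2.1,
# the values are invented for this example.
SAMPLE = """
<BibliographicDB>
  <Journals>
    <Journal>
      <JournalID>10001</JournalID>
      <Title>X</Title>
      <Pages>100-120</Pages>
    </Journal>
    <Journal>
      <JournalID>10002</JournalID>
      <Title>Y</Title>
      <Pages>5-9</Pages>
    </Journal>
  </Journals>
</BibliographicDB>
"""


def pages_for_title(xml_text, title):
    """Return the <Pages> text of every journal whose <Title> equals title.

    Assumes the title contains no single quote, since it is pasted
    directly into the path expression.
    """
    root = ET.fromstring(xml_text)  # root is the <BibliographicDB> element
    # Relative form of /BibliographicDB/Journals/Journal[Title='X']/Pages;
    # ElementTree paths start at the root element, not the document node.
    path = f"Journals/Journal[Title='{title}']/Pages"
    return [pages.text for pages in root.findall(path)]


print(pages_for_title(SAMPLE, "X"))  # ['100-120']
```

A title that matches no journal simply yields an empty list, which mirrors the empty node set XPath would return.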

In this XPath expression, the path describes the navigation within the structure of the XML database, and the predicate in square brackets states the condition. XQuery is very similar to SQL in its expressive power and is typically used when XPath alone is not sufficient.

Listing for Figure 2.2:

<Relationships>
  <Relationship>
    <ID></ID>
    <NodeXPath></NodeXPath>
    <ParentID></ParentID>
  </Relationship>
  <Relationship>
    <ID></ID>
    <NodeXPath></NodeXPath>
    <ParentID></ParentID>
  </Relationship>
</Relationships>

An example of an XQuery query and its implementation is:

Display the journals with their corresponding publishers.

declare namespace p1 = "http://ISRI.edu/XML/BibliographicDB";

let $src := doc("file:///C:/Paper/code/test/test_instance1.xml")
for $journal in $src/p1:BibliographicDB/Journals/Journal
for $publisher in $src/p1:BibliographicDB/Publishers/Publisher
where $journal/PublisherID = $publisher/PublisherID
return
  <Result>
    <Journal> {$journal/Title/text()} </Journal>
    <Publisher> {$publisher/PublisherName/text()} </Publisher>
  </Result>
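Read procedurally, the two for clauses with the where condition amount to a nested-loop join on PublisherID. The Python sketch below reproduces that reading on a small made-up document. It is an illustration under stated assumptions, not the authors' code: the sample data and the function name journals_with_publishers are hypothetical, and only the root element carries the p1 namespace, as in the query above.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; titles, IDs and publisher names are invented.
SAMPLE = """
<p1:BibliographicDB xmlns:p1="http://ISRI.edu/XML/BibliographicDB">
  <Journals>
    <Journal><Title>Journal A</Title><PublisherID>20001</PublisherID></Journal>
    <Journal><Title>Journal B</Title><PublisherID>20002</PublisherID></Journal>
  </Journals>
  <Publishers>
    <Publisher><PublisherID>20001</PublisherID><PublisherName>Pub One</PublisherName></Publisher>
    <Publisher><PublisherID>20002</PublisherID><PublisherName>Pub Two</PublisherName></Publisher>
  </Publishers>
</p1:BibliographicDB>
"""


def journals_with_publishers(xml_text):
    """Return (journal title, publisher name) pairs joined on PublisherID."""
    root = ET.fromstring(xml_text)
    results = []
    for journal in root.findall("Journals/Journal"):            # for $journal
        for publisher in root.findall("Publishers/Publisher"):  # for $publisher
            # where $journal/PublisherID = $publisher/PublisherID
            if journal.findtext("PublisherID") == publisher.findtext("PublisherID"):
                # return <Result> ... </Result>, here as a plain tuple
                results.append(
                    (journal.findtext("Title"), publisher.findtext("PublisherName"))
                )
    return results


for title, name in journals_with_publishers(SAMPLE):
    print(title, "|", name)
```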

An example of a recursive query would be one that finds all the references, both explicit and implicit, for the Journal type. As mentioned previously, this is equivalent to graph traversal and cannot be implemented in XPath or XQuery. Consider the sample data shown below:

ID    NodeXPath    ParentID
1     A
2     B            1
3     C            1
4     D            2
5     E            4
6     F            2
7     G            6

In this data, node B explicitly references nodes D and F and implicitly references nodes E and G. Pseudocode for the implementation of our recursive query is shown in the following algorithm:

Algorithm
Function: DFS(Node, Path)
Step 1: Display the current node, which is an XPath to the material (books, articles, journals, etc.).
Step 2: Find all the materials referenced by the current node (its children).
Step 3: IF there is at least one such material THEN
Step 4:   FOR each child node
Step 5:     IF the child node is not already in the path, THEN add the child node to the Path variable and call DFS with the child node and the new Path variable
Step 6:   END FOR
Step 7: END IF
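The steps of the algorithm can be traced on the sample data with a short Python sketch. This is an illustration rather than the implementation used in this work: the dictionary layout, the function name dfs, and the returned list of visited labels are assumptions made for the example.

```python
# Sample data from the text: ID -> (NodeXPath, ParentID); None means no parent.
RELATIONSHIPS = {
    1: ("A", None),
    2: ("B", 1),
    3: ("C", 1),
    4: ("D", 2),
    5: ("E", 4),
    6: ("F", 2),
    7: ("G", 6),
}


def dfs(node, relationships, path=None, visited=None):
    """Depth-first traversal that skips nodes already on the current path."""
    if path is None:
        path = [node]      # the path starts with the start node itself
    if visited is None:
        visited = []
    # Step 1: "display" the current node (collected here instead of printed).
    visited.append(relationships[node][0])
    # Step 2: children are the rows whose ParentID is the current node.
    children = [i for i, (_, parent) in relationships.items() if parent == node]
    # Steps 3-6: recurse into each child that is not already on the path.
    for child in children:
        if child not in path:
            dfs(child, relationships, path + [child], visited)
    return visited


print(dfs(2, RELATIONSHIPS))  # ['B', 'D', 'E', 'F', 'G']
```

Starting from B (ID 2), the traversal lists B itself, then D and its descendant E, then F and its descendant G, which matches the explicit and implicit references described above.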


The DFS() function must be invoked with a start node, and no value should be passed to the path parameter, so that its default value, built from the start node, is used. The corresponding XSLT implementation of this code is shown below:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:p1="http://ISRI.edu/XML/BibliographicDB">

  <xsl:template name="search">
    <xsl:param name="node"/>
    <xsl:param name="table"/>
    <!-- add path variable to keep track of cycles -->
    <xsl:param name="path" select="concat('->',concat($node,'->'))"/>
    <!-- Display the current node -->
    <xsl:text> </xsl:text>
    <xsl:value-of select="$table/Relationship[ID=$node]/NodeXPath"/>
    <!-- depth first graph algorithm (eliminates cycles) -->
    <!-- recurse if the current node has children -->
    <xsl:if test="count($table/Relationship[ParentID=$node]) > 0">
      <xsl:for-each select="$table/Relationship[ParentID=$node]/ID">
        <!-- make sure that current text() is not already present in the path
             (eliminate cycles) -->
        <xsl:if test="not(contains($path,concat(concat('->',text()),'->')))">
          <xsl:call-template name="search">
            <xsl:with-param name="node" select="text()"/>
            <xsl:with-param name="table" select="$table"/>
            <xsl:with-param name="path" select="concat(concat($path,text()),'->')"/>
          </xsl:call-template>
        </xsl:if>
      </xsl:for-each>
    </xsl:if>
  </xsl:template>

  <xsl:template match="/">
    <xsl:call-template name="search">
      <xsl:with-param name="node"
          select="p1:BibliographicDB/Relationships/Relationship[NodeXPath='X']/ID"/>
      <xsl:with-param name="table" select="p1:BibliographicDB/Relationships"/>
    </xsl:call-template>
    <xsl:text> Completed</xsl:text>
  </xsl:template>

</xsl:stylesheet>
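The stylesheet keeps the visited IDs in a single string such as '->2->4->' and tests membership with contains(). The '->' delimiter on both sides of each ID matters, because an undelimited substring test would confuse, for instance, ID 2 with ID 12. The Python sketch below mirrors the two string expressions to make this visible; the function names in_path and extend_path are hypothetical and exist only for this illustration.

```python
def in_path(path, node_id):
    """Mirror of contains($path, concat(concat('->', text()), '->'))."""
    return f"->{node_id}->" in path


def extend_path(path, node_id):
    """Mirror of concat(concat($path, text()), '->')."""
    return f"{path}{node_id}->"


path = "->1->"                 # default path value for start node 1
path = extend_path(path, 12)   # path is now "->1->12->"

print(in_path(path, 12))  # True: ID 12 is on the path
print(in_path(path, 2))   # False: "->2->" does not occur in "->1->12->"
print("2" in path)        # True: an undelimited test would wrongly report ID 2
```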


4.0 Conclusion and Future Work

We have designed and implemented a bibliographic database in order to study various queries and their implementations. Of particular interest is the role of XPath, XQuery, and XSLT in our approach. Although XSLT is Turing complete and has the expressive power to implement recursive queries, it is not an easy environment for beginners to implement recursive queries. In the future, it is expected that XQuery will have more capabilities and may be able to implement recursive queries.

