Query Languages for XML

Fall 2001 CSE330 1 Query Languages for XML


Page 1: Query Languages for XML


Query Languages for XML

Page 2: Query Languages for XML

Why a query language? Extracting, Restructuring, Integration, Browsing…

• XML-QL
– http://www.w3.org/TR/NOTE-xml-ql
– http://db.cis.upenn.edu/XML-QL/

• XPath (part of a query language)
– http://www.w3.org/TR/xpath

• XSLT
– http://www.w3.org/TR/xslt
– http://www.mulberrytech.com/quickref/XSLTquickref.pdf

• Quilt
– http://www.almaden.ibm.com/cs/people/chamberlin/quilt.html
– http://db.cis.upenn.edu/Kweelt/

Page 3: Query Languages for XML

XML-QL (XML Query Language)

• W3C proposal, August 1998

• Authors:
– Mary Fernandez, AT&T
– Dana Florescu, INRIA
– Alon Levy, Univ. of Washington
– Dan Suciu, AT&T
– Alin Deutsch, Univ. of Pennsylvania

Page 4: Query Languages for XML

Address Book Revisited

<addrBook>
  <person SSN="111-22-3333">
    <name> Caesar </name>
    <greet> Caesar Imperator </greet>
    <addr> The Capitol </addr>
    <addr> Rome, OH 98765 </addr>
    <tel> (321) 786 2543 </tel>
    <fax> (321) 786 2543 </fax>
    <tel> (321) 786 2543 </tel>
    <email> [email protected] </email>
  </person>
</addrBook>

Page 5: Query Languages for XML

XML-QL: Pattern Matching

Find Caesar’s e-mail address:

where <addrBook>
        <person>
          <name>Caesar</name>
          <email>$e</email>
        </person>
      </addrBook> in "http://db.cis.upenn.edu/~peter/address.xml"
construct $e

Result:

<XML>[email protected]</XML>

Data Extraction
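For comparison, the same extraction can be sketched in Python with the standard library's xml.etree.ElementTree. This is an illustration only, not XML-QL. The inline document is a trimmed stand-in for address.xml, the helper name emails_of is made up, and the example.org addresses are placeholders because the real addresses do not survive in this transcript.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the address book; e-mail values are placeholders.
ADDRESS_BOOK = """
<addrBook>
  <person SSN="111-22-3333">
    <name>Caesar</name>
    <greet>Caesar Imperator</greet>
    <email>caesar@example.org</email>
  </person>
  <person>
    <name>Brutus</name>
    <email>brutus@example.org</email>
  </person>
</addrBook>
"""

def emails_of(root, wanted_name):
    """Bind $e for every person that has a <name> child equal to wanted_name."""
    results = []
    for person in root.findall("person"):            # <addrBook><person> ...
        names = [n.text.strip() for n in person.findall("name") if n.text]
        if wanted_name in names:                      # <name>Caesar</name>
            for email in person.findall("email"):     # <email>$e</email>
                results.append(email.text.strip())
    return results

root = ET.fromstring(ADDRESS_BOOK)
caesar_emails = emails_of(root, "Caesar")
print(caesar_emails)
```

The nested loops mirror the nesting of the pattern: each level of the where clause becomes one level of iteration, and the constant Caesar becomes a filter.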

Page 6: Query Languages for XML

XML-QL: Constructing New XML Data

Whom can we contact electronically?

where <addrBook>
        <person>
          <greet>$g</greet>
          <email>$e</email>
        </person>
      </addrBook> in "http://..."
construct <e-contact>
            <who>$g</who>
            <where>$e</where>
          </e-contact>

Result:

<XML>
  <e-contact>
    <who>Caesar Imperator</who>
    <where>[email protected] </where>
  </e-contact>
  <e-contact>
    <who>Brutus</who>
    <where>[email protected] </where>
  </e-contact>
  ...
</XML>

Data Restructuring

Page 7: Query Languages for XML

XML-QL: Joins

Who of our contacts was involved in a movie?

where <addrBook>
        <person>
          <greet>$g</greet>
          <email>$e</email>
        </person>
      </addrBook> in "http://…address.xml"
      <movie>
        <title>$t</>
        <character>$g</>
      </movie> in "http://www.imdb.com"
construct <cine-contact>
            <who>$g</who>
            <movie>$t</movie>
            <where>$e</where>
          </cine-contact>

Page 8: Query Languages for XML

XML-QL: Joins (cont’d)

<XML>
  <cine-contact>
    <who>Caesar Imperator</who>
    <where>[email protected]</where>
    <movie>Asterix and Cleopatra</movie>
  </cine-contact>
  <cine-contact>
    <who>Dr. Strangelove</who>
    <where>[email protected]</where>
    <movie>Dr. Strangelove or How I Stopped ...</movie>
  </cine-contact>
  ...
</XML>

Data Integration
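The join works because the variable $g occurs in both patterns, so a person and a movie are combined only when the greet text equals the character text. A rough Python equivalent is sketched below. Both inline documents are invented stand-ins for the two sources, the example.org addresses are placeholders, and cine_contacts is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up stand-ins for the two data sources named in the query.
ADDRESSES = ET.fromstring("""
<addrBook>
  <person><greet>Caesar Imperator</greet><email>caesar@example.org</email></person>
  <person><greet>Brutus</greet><email>brutus@example.org</email></person>
</addrBook>""")

MOVIES = ET.fromstring("""
<movies>
  <movie><title>Asterix and Cleopatra</title><character>Caesar Imperator</character></movie>
  <movie><title>Some Other Film</title><character>Nobody</character></movie>
</movies>""")

def cine_contacts(addresses, movies):
    """Join persons and movies on $g (greet text equals character text)."""
    result = ET.Element("XML")
    for person in addresses.iter("person"):
        for greet in person.findall("greet"):
            for email in person.findall("email"):
                for movie in movies.iter("movie"):
                    for title in movie.findall("title"):
                        for character in movie.findall("character"):
                            if character.text == greet.text:   # shared variable $g
                                contact = ET.SubElement(result, "cine-contact")
                                ET.SubElement(contact, "who").text = greet.text
                                ET.SubElement(contact, "movie").text = title.text
                                ET.SubElement(contact, "where").text = email.text
    return result

joined = cine_contacts(ADDRESSES, MOVIES)
print(ET.tostring(joined, encoding="unicode"))
```

This is a nested-loop join, chosen for clarity; a real implementation would more likely index one side by the join value.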

Page 9: Query Languages for XML


XML-QL Data Model

• Directed, labeled graph

• Tags represented as edge labels

• Sets of attribute name-value pairs as node labels

• Two models: ordered and unordered

Page 10: Query Languages for XML

XML-QL Data Model (cont’d)

<person SSN="111-22-3333">
  <name> Caesar </name>
  <greet> Caesar Imperator </greet>
  <addr> The Capitol </addr>
  <addr> Rome, OH 98765 </addr>
  <tel> (321) 786 2543 </tel>
  <fax> (321) 786 2543 </fax>
  <tel> (321) 786 2543 </tel>
  <email> [email protected] </email>
</person>

[Figure: the same data drawn as an edge-labeled tree. An addrBook edge leads to a person edge whose target node carries the label SSN="111-…"; from that node, edges labeled name, greet, addr, addr, tel, fax, tel and email lead to leaves such as Caesar, Caesar Imperator, The Capitol, Rome, OH and (321) 786 2543.]
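One way to make the data model concrete is to convert a parsed document into an explicit list of labeled edges. The Python sketch below is an assumption-laden illustration, not the XML-QL implementation: the function name, the integer node ids and the choice to drop attributes on text-only elements are all simplifications made here.

```python
import itertools
import xml.etree.ElementTree as ET

def to_edge_labeled_graph(root):
    """Return (edges, node_labels) for the document whose top element is root.

    edges: list of (parent_id, tag, child_id) in document order.
    node_labels: node id -> attribute dict (inner nodes) or text (leaves).
    Simplification: mixed content and attributes on text-only elements are ignored.
    """
    ids = itertools.count()
    edges, node_labels = [], {}

    def visit(parent_id, elem):
        node_id = next(ids)
        edges.append((parent_id, elem.tag, node_id))    # the tag labels the edge
        text = (elem.text or "").strip()
        if len(elem) == 0 and text:
            node_labels[node_id] = text                 # leaf: character data
        else:
            node_labels[node_id] = dict(elem.attrib)    # attribute name-value pairs
        for child in elem:                              # ordered model: keep order
            visit(node_id, child)

    source = next(ids)               # unlabeled source node of the graph
    node_labels[source] = {}
    visit(source, root)
    return edges, node_labels

doc = ET.fromstring(
    '<addrBook><person SSN="111-22-3333">'
    "<name>Caesar</name><tel>(321) 786 2543</tel>"
    "</person></addrBook>"
)
edges, labels = to_edge_labeled_graph(doc)
print(edges)
```

Keeping the edges in a list preserves document order, which corresponds to the ordered variant of the model; treating the list as a set would give the unordered one.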

Page 11: Query Languages for XML

XML-QL Semantics: Variable Bindings

[Figure: the addrBook graph with two person subtrees. The first is Caesar's (SSN="111-…", Caesar Imperator, The Capitol, Rome, OH, (321) 786 2543); the second is Strangelove's (SSN="111-…", Dr. Strangelove, The Capitol, Washington, DC).]

where <addrBook> <person> <name>$n</> <email>$e</> </> </>

Resulting bindings:

$n            $e
Caesar        [email protected]
Strangelove   [email protected]

Page 12: Query Languages for XML

XML-QL Semantics: XML Output

$n            $e
Caesar        [email protected]
Strangelove   [email protected]

construct <e-contact>
            <who>$n</who>
            <where>$e</where>
          </e-contact>

[Figure: the output tree. An XML root has two e-contact children, each with a who child and a where child: (Caesar, [email protected]) and (Strangelove, [email protected]).]
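The two-phase semantics (first compute a table of variable bindings, then instantiate the construct template once per row) can be sketched in Python as follows. The inline data, the example.org addresses, the extra person Cassius and the function names are all assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Placeholder data; Cassius has no e-mail and is included to show a non-match.
DOC = ET.fromstring("""
<addrBook>
  <person><name>Caesar</name><email>caesar@example.org</email></person>
  <person><name>Strangelove</name><email>strangelove@example.org</email></person>
  <person><name>Cassius</name></person>
</addrBook>""")

def bindings(root):
    """WHERE phase: one row per match of <person> <name>$n</> <email>$e</> </>."""
    rows = []
    for person in root.findall("person"):
        for name in person.findall("name"):
            for email in person.findall("email"):
                rows.append({"n": name.text, "e": email.text})
    return rows

def construct(rows):
    """CONSTRUCT phase: instantiate the template once per binding row."""
    out = ET.Element("XML")
    for row in rows:
        contact = ET.SubElement(out, "e-contact")
        ET.SubElement(contact, "who").text = row["n"]
        ET.SubElement(contact, "where").text = row["e"]
    return out

rows = bindings(DOC)
output = construct(rows)
print(rows)
print(ET.tostring(output, encoding="unicode"))
```

A person without an email child contributes no row, so it never reaches the construct phase.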

Page 13: Query Languages for XML

Advanced XML-QL

Find tags of person subelements:

where <addrBook.person.$tag></> in "http://db.cis.upenn.edu/~peter/address.xml"
construct <childOfPerson>$tag</>

Find all email addresses and fax numbers:

where <addrBook._*.(email|fax)>$eORf</> in "http://db.cis.upenn.edu/~peter/address.xml"
construct <emailOrFax>$eORf</>

Schema browsing
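Both queries have simple counterparts in Python's xml.etree.ElementTree, sketched below under some assumptions: the inline document is invented, the contact wrapper element is hypothetical (added only so that an email sits at a deeper level), and the example.org address is a placeholder.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring("""
<addrBook>
  <person>
    <name>Caesar</name>
    <tel>(321) 786 2543</tel>
    <fax>(321) 786 2543</fax>
    <contact><email>caesar@example.org</email></contact>
  </person>
</addrBook>""")

# <addrBook.person.$tag>: bind $tag to the tag of every child of a person.
child_tags = [child.tag for person in DOC.findall("person") for child in person]

# <addrBook._*.(email|fax)>: any path below addrBook that ends in email or fax.
email_or_fax = [
    e.text
    for e in DOC.iter()                       # all elements, in document order
    if e is not DOC and e.tag in ("email", "fax")
]

print(child_tags)
print(email_or_fax)
```

The tag variable turns element names into data, which is what makes schema browsing possible; the regular path expression _* corresponds to searching at any depth.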

Page 14: Query Languages for XML

More Advanced XML-QL

Find attributes of person elements:

where <_*.person $attrName=$attrVal></> in "http://db.cis.upenn.edu/~peter/address.xml"
construct <personAttribute>
            <name>$attrName</>
            <value>$attrVal</>
          </>

Schema browsing

Page 15: Query Languages for XML

XPath

• Reasonably widely adopted -- in XML-Schema and query languages.
• Incomparable in expressive power with regular path expressions: neither is more expressive than the other (XPath can’t do (ab)*, for example).
• Primary goal: to select some nodes from a given document.
• XPath main construct: axis navigation.
• An XPath path consists of one or more navigation steps, separated by /
• A navigation step is a triplet: axis + node-test + list of predicates

• Examples
– /descendant::node()/child::author
– /descendant::node()/child::author[parent::node()/attribute::booktitle = "XML"][2]

• XPath also offers some shortcuts
– no axis means child
– // means /descendant-or-self::node()/

Page 16: Query Languages for XML

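XPath: child axis navigation

• author is shorthand for child::author. Examples:
– aaa -- all the child nodes labeled aaa (1,3)
– aaa/bbb -- all the bbb children of the aaa children of the context node (4)
– */bbb -- all the bbb children of any child of the context node (4,6)
– . -- the context node
– / -- the root node

[Figure: a tree rooted at the context node, with nodes labeled aaa, bbb or ccc. The children of the context node are numbered 1-3 and its grandchildren 4-7; the numbers in parentheses above refer to these nodes.]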

Page 17: Query Languages for XML

XPath: child axis navigation (cont)

– /doc -- all the doc children of the root
– ./aaa -- all the aaa children of the context node (equivalent to aaa)
– text() -- all the text children of the context node
– node() -- all the children of the context node (includes text nodes, but not attribute nodes, which are not on the child axis)
– .. -- parent of the context node
– .// -- the context node and all its descendants
– // -- the root node and all its descendants
– //para -- all the para nodes in the document
– //text() -- all the text nodes in the document
– @font -- the font attribute node of the context node

Page 18: Query Languages for XML

Predicates

– node()[2] -- the second child node of the context node
– chapter[5] -- the fifth chapter child of the context node
– node()[last()] -- the last child node of the context node
– chapter[title="introduction"] -- the chapter children of the context node that have one or more title children whose string-value is "introduction" (the string-value of an element is the concatenation of the text of all its descendant text nodes)
– person[.//firstname = "joe"] -- the person children of the context node that have among their descendants a firstname element with string-value "joe"
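Python's xml.etree.ElementTree supports a limited subset of XPath that covers positional predicates, last() and child-text tests, so several of the predicates above can be tried directly. The sketch below uses an invented document. ElementTree does not accept a path inside a predicate, so the .//firstname test is written out by hand with a helper that approximates the string-value.

```python
import xml.etree.ElementTree as ET

BOOK = ET.fromstring("""
<book>
  <chapter><title>introduction</title></chapter>
  <chapter><title>data model</title></chapter>
  <chapter><title>conclusions</title></chapter>
  <person><name><firstname>joe</firstname></name></person>
  <person><name><firstname>ann</firstname></name></person>
</book>""")

def titles(chapters):
    """Collect the title text of each chapter element."""
    return [c.findtext("title") for c in chapters]

second = titles(BOOK.findall("chapter[2]"))                    # positional predicate
last = titles(BOOK.findall("chapter[last()]"))                 # last chapter child
intro = titles(BOOK.findall("chapter[title='introduction']"))  # child text test

def string_value(elem):
    """Approximate the XPath string-value: concatenated descendant text."""
    return "".join(elem.itertext())

# person[.//firstname = "joe"], spelled out because ElementTree's XPath
# subset does not allow a path expression inside a predicate.
joes = [
    p for p in BOOK.findall("person")
    if any(string_value(f) == "joe" for f in p.iter("firstname"))
]

print(second, last, intro, len(joes))
```

Positions count from 1 and are relative to the chapter siblings, not to all children of the context node.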

Page 19: Query Languages for XML

Unions of Path Expressions

• employee | consultant -- the union of the employee and consultant nodes that are children of the context node

• For some reason person/(employee|consultant) -- as in regular path expressions -- is not allowed

• However person/node()[boolean(employee|consultant)] is allowed!!

• From the XPath specification:
– The boolean function converts its argument to a boolean as follows:
  • a number is true if and only if it is neither positive or negative zero nor NaN
  • a node-set is true if and only if it is non-empty
  • a string is true if and only if its length is non-zero
  • an object of a type other than the four basic types is converted to a boolean in a way that is dependent on that type
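The conversion rules quoted above are easy to mirror in Python. The sketch below is an approximation for experimentation only: the function name is made up, and Python lists, tuples and sets stand in for XPath node-sets.

```python
import math

def xpath_boolean(value):
    """Approximate XPath 1.0 boolean() using Python stand-ins for the basic types."""
    if isinstance(value, bool):              # booleans convert to themselves
        return value
    if isinstance(value, (int, float)):      # number: false for +0, -0 and NaN
        return not (value == 0 or math.isnan(value))
    if isinstance(value, str):               # string: true iff its length is non-zero
        return len(value) > 0
    if isinstance(value, (list, tuple, set, frozenset)):   # node-set stand-in
        return len(value) > 0
    raise TypeError(f"no XPath conversion assumed for {type(value).__name__}")

print(xpath_boolean(float("nan")), xpath_boolean("false"), xpath_boolean([]))
```

Note that the string "false" converts to true, because only the length of the string matters.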

Page 20: Query Languages for XML

Axis navigation

• So far, nearly all our expressions have moved us down the tree by moving to child nodes. Exceptions were
– . -- stay where you are
– / -- go to the root
– // -- all descendants of the root
– .// -- all descendants of the context node

• All other expressions have been abbreviations for child::…, e.g. child::para. child:: is an example of an axis.

• XPath has several axes: ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self
– Some of these (self, parent) describe single nodes, others describe sequences of nodes.

Page 21: Query Languages for XML

XPath Navigation Axes (merci, Arnaud Sahuguet)

[Figure: a document tree divided into regions relative to the context node, labeled ancestor, descendant, following, preceding, following-sibling, preceding-sibling, child, attribute, namespace and self.]

Page 22: Query Languages for XML

XPath abbreviated syntax

Abbreviation   Expansion
(nothing)      child::
@              attribute::
//             /descendant-or-self::node()/
.              self::node()
.//            self::node()/descendant-or-self::node()/
..             parent::node()
/              (document root)
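The abbreviations can be expanded mechanically, one step at a time. The following Python sketch does this for location paths without predicates. It is a toy written for this note (the function names are invented), not a full XPath parser: a // inside a string literal or predicate would be mishandled.

```python
def expand_step(step):
    """Expand one abbreviated location step (predicates are not handled)."""
    if step == ".":
        return "self::node()"
    if step == "..":
        return "parent::node()"
    if step.startswith("@"):
        return "attribute::" + step[1:]
    if "::" in step:                 # axis already written out
        return step
    return "child::" + step          # no axis means child

def expand_path(path):
    """Expand an abbreviated XPath location path that has no predicates."""
    # '//' abbreviates '/descendant-or-self::node()/'
    path = path.replace("//", "/descendant-or-self::node()/")
    absolute = path.startswith("/")
    steps = [s for s in path.split("/") if s]
    expanded = "/".join(expand_step(s) for s in steps)
    return "/" + expanded if absolute else expanded

for example in ["//para", ".//para", "../@font", "aaa/bbb"]:
    print(example, "=>", expand_path(example))
```

Expanding // first and the individual steps afterwards keeps the two rules independent of each other.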

Page 23: Query Languages for XML

Quilt

proposed by Chamberlin, Robie and Florescu

(from the authors’ slides)

• Leverage the most effective features of several existing and proposed query languages
• Design a small, clean, implementable language
• Cover the functionality required by all the XML Query use cases in a single language
• Write queries that fit on a slide
• Design a quilt, not a camel

Page 24: Query Languages for XML

Quilt = XPath + “comprehension” syntax

• XML-QL

where <pattern> in <XML-expression>        (bind variables)
      <pattern> in <XML-expression>
      …
      <condition>
construct <expression>                     (use variables)

• Quilt

for x in <XPath-expression>                (bind variables)
    y in <XPath-expression>
    …
where <condition>
return <expression>                        (use variables)

Page 25: Query Languages for XML

Examples of Quilt (from http://db.cis.upenn.edu/Kweelt/useCases/R/Q1.qlt )

Relational data -- two DTDs:

<?xml version="1.0" ?>
<!DOCTYPE items [
  <!ELEMENT items (item_tuple*)>
  <!ELEMENT item_tuple (itemno, description, offered_by, start_date?, end_date?, reserve_price? )>
  <!ELEMENT itemno (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT offered_by (#PCDATA)>
  <!ELEMENT start_date (#PCDATA)>
  <!ELEMENT end_date (#PCDATA)>
  <!ELEMENT reserve_price (#PCDATA)>
]>

<?xml version="1.0" ?>
<!DOCTYPE bids [
  <!ELEMENT bids (bid_tuple*)>
  <!ELEMENT bid_tuple (userid, itemno, bid, bid_date)>
  <!ELEMENT userid (#PCDATA)>
  <!ELEMENT itemno (#PCDATA)>
  <!ELEMENT bid (#PCDATA)>
  <!ELEMENT bid_date (#PCDATA)>
]>

Page 26: Query Languages for XML


The data

<items>

<item_tuple><itemno>1001</itemno><description>Red Bicycle</description><offered_by>U01</offered_by><start_date>1999-01-05</start_date><end_date>1999-01-20</end_date><reserve_price>40</reserve_price></item_tuple>

<item_tuple><itemno>1002</itemno><description>Motorcycle</description><offered_by>U02</offered_by><start_date>1999-02-11</start_date><end_date>1999-03-15</end_date><reserve_price>500</reserve_price></item_tuple>

</items>

<bids>

<bid_tuple><userid>U02</userid><itemno>1001</itemno><bid>35</bid><bid_date>99-01-07</bid_date></bid_tuple>

<bid_tuple><userid>U04</userid><itemno>1001</itemno><bid>40</bid><bid_date>99-01-08</bid_date></bid_tuple>

</bids>

Page 27: Query Languages for XML

Query 1

FUNCTION date() { "1999-02-01" }

<result>
  (
    FOR $i IN document("items.xml")//item_tuple
    WHERE $i/start_date LEQ date()
      AND $i/end_date GEQ date()
      AND contains($i/description, "Bicycle")
    RETURN
      <item_tuple>
        $i/itemno ,
        $i/description
      </item_tuple>
    SORTBY (itemno)
  )
</result>

Notes:
• XPath expressions are shown in orange on the slide
• simple function definitions
• dates are formatted so that lexicographic ordering gives the right result
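The FOR / WHERE / RETURN / SORTBY structure maps closely onto a Python comprehension followed by a sort, as sketched below. The items document is an assumption: items 1001 and 1002 follow the data slide, while the dates given to items 1003 and 1007 are invented so that the query has matches. The comparison relies on the same observation as the slide, namely that dates written as YYYY-MM-DD order correctly as plain strings.

```python
import xml.etree.ElementTree as ET

# Items 1001 and 1002 follow the data slide; dates for 1003 and 1007 are assumed.
ITEMS = ET.fromstring("""
<items>
  <item_tuple><itemno>1007</itemno><description>Racing Bicycle</description>
    <start_date>1999-01-20</start_date><end_date>1999-02-20</end_date></item_tuple>
  <item_tuple><itemno>1003</itemno><description>Old Bicycle</description>
    <start_date>1999-01-05</start_date><end_date>1999-02-20</end_date></item_tuple>
  <item_tuple><itemno>1001</itemno><description>Red Bicycle</description>
    <start_date>1999-01-05</start_date><end_date>1999-01-20</end_date></item_tuple>
  <item_tuple><itemno>1002</itemno><description>Motorcycle</description>
    <start_date>1999-02-11</start_date><end_date>1999-03-15</end_date></item_tuple>
</items>""")

def date():
    """Counterpart of the Quilt FUNCTION date()."""
    return "1999-02-01"

def query1(items):
    matches = [
        (i.findtext("itemno"), i.findtext("description"))       # RETURN
        for i in items.iter("item_tuple")                       # FOR $i IN ...//item_tuple
        if i.findtext("start_date") <= date() <= i.findtext("end_date")   # WHERE
        and "Bicycle" in i.findtext("description")              # contains(...)
    ]
    return sorted(matches)                                      # SORTBY (itemno)

result = query1(ITEMS)
print(result)
```

Red Bicycle is excluded because its auction ended on 1999-01-20, before the reference date.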

Page 28: Query Languages for XML

Output from Q1

<?xml version="1.0" ?>
<result>
  <item_tuple>
    <itemno> 1003 </itemno>
    <description> Old Bicycle </description>
  </item_tuple>
  <item_tuple>
    <itemno> 1007 </itemno>
    <description> Racing Bicycle </description>
  </item_tuple>
</result>

Page 29: Query Languages for XML

Query Q2

For all bicycles, list the item number, description, and highest bid (if any), ordered by item number.

<result>
  (
    FOR $i IN document("items.xml")//item_tuple
    LET $b := document("bids.xml")//bid_tuple[itemno = $i/itemno]
    WHERE contains($i/description, "Bicycle")
    RETURN
      <item_tuple>
        $i/itemno ,
        $i/description ,
        IF ($b)
        THEN <high_bid> NumFormat("#####.##", max(-1, $b/bid)) </high_bid>
        ELSE ""
      </item_tuple>
    SORTBY (itemno)
  )
</result>

Notes:
• use of a variable in XPath
• lots of coercion
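The LET clause binds $b to the whole set of matching bids, which is then tested for emptiness and aggregated. A Python sketch of the same idea follows. The two bids for item 1001 follow the data slide, item 1008 is given no bids here by assumption, and the explicit float() call stands in for the coercion that Quilt performs implicitly. With only this subset of the data the highest bid for item 1001 is 40, not the 55 of the full use-case data.

```python
import xml.etree.ElementTree as ET

ITEMS = ET.fromstring("""
<items>
  <item_tuple><itemno>1008</itemno><description>Broken Bicycle</description></item_tuple>
  <item_tuple><itemno>1001</itemno><description>Red Bicycle</description></item_tuple>
  <item_tuple><itemno>1002</itemno><description>Motorcycle</description></item_tuple>
</items>""")

BIDS = ET.fromstring("""
<bids>
  <bid_tuple><userid>U02</userid><itemno>1001</itemno><bid>35</bid></bid_tuple>
  <bid_tuple><userid>U04</userid><itemno>1001</itemno><bid>40</bid></bid_tuple>
</bids>""")

def query2(items, bids):
    rows = []
    for i in items.iter("item_tuple"):                       # FOR $i
        if "Bicycle" not in i.findtext("description"):       # WHERE contains(...)
            continue
        # LET $b := ...//bid_tuple[itemno = $i/itemno]
        b = [t for t in bids.iter("bid_tuple")
             if t.findtext("itemno") == i.findtext("itemno")]
        # IF ($b): an empty set of bids means there is no high bid
        high = max(float(t.findtext("bid")) for t in b) if b else None
        rows.append((i.findtext("itemno"), i.findtext("description"), high))
    return sorted(rows, key=lambda r: r[0])                  # SORTBY (itemno)

result = query2(ITEMS, BIDS)
print(result)
```

Unlike FOR, LET does not iterate: each item produces exactly one output row whether it has zero, one or many bids.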

Page 30: Query Languages for XML

Output from Q2

<result>
  <item_tuple>
    <itemno> 1001 </itemno>
    <description> Red Bicycle </description>
    <high_bid> 55 </high_bid>
  </item_tuple>
  <item_tuple>
    <itemno> 1003 </itemno>
    <description> Old Bicycle </description>
    <high_bid> 20 </high_bid>
  </item_tuple>
  <item_tuple>
    <itemno> 1007 </itemno>
    <description> Racing Bicycle </description>
    <high_bid> 225 </high_bid>
  </item_tuple>
  <item_tuple>
    <itemno> 1008 </itemno>
    <description> Broken Bicycle </description>
  </item_tuple>
</result>

Page 31: Query Languages for XML

Query Q3

Find cases where a user with a rating worse than "C" (that is, alphabetically greater than "C") offers an item with a reserve price of more than 1000.

<result>
  (
    FOR $u IN document("users.xml")//user_tuple,
        $i IN document("items.xml")//item_tuple
    WHERE $u/rating GT 'C'
      AND $i/reserve_price GT 1000
      AND $i/offered_by = $u/userid
    RETURN
      <warning>
        <user_name>$u/name/text()</user_name>,
        <user_rating>$u/rating/text()</user_rating>,
        <item_description>$i/description/text()</item_description>,
        $i/reserve_price
      </warning>
  )
</result>

Notes:
• Comparing sets with singletons. Same rules as in XPath? In this case the DTD gives uniqueness.

Page 32: Query Languages for XML

Conclusions

• XML is a data format for which there is an increasing number of useful tools for
– Constructing schemas
– Programming
– Querying

• Although it is likely that a query language will soon emerge as a standard, there is less agreement or understanding on how to store XML data efficiently.

• Many other database issues must still be resolved before XML is useful for manipulating large amounts of data.