introduction to xquery resources: official url: short intros: amoeller/xml/querying

30
Introduction to XQuery Resources: Official URL: www.w3.org/TR/xquery Short intros: http://www.xml.com/pub/a/2002/10/16/xquer y.html www.brics.dk/~amoeller/XML/querying Or see Ramakrishnan & Gehrke text Lecture modified from slides by Dan Suciu

Upload: gwen-gardner

Post on 27-Dec-2015

228 views

Category:

Documents


1 download

TRANSCRIPT

Introduction to XQuery

Resources:Official URL: www.w3.org/TR/xquery

Short intros: http://www.xml.com/pub/a/2002/10/16/xquery.html

www.brics.dk/~amoeller/XML/querying

Or see Ramakrishnan & Gehrke text

Lecture modified from slides by Dan Suciu

XML vs. Relational Data

{ row: { name: “John”, phone: 3634 },

row: { name: “Sue”, phone: 6343 },

row: { name: “Dick”, phone: 6363 }

}

name phone

John 3634

Sue 6343

Dick 6363

row row row

name name name

phone phone phone

“John” 3634 “Sue” “Dick”6343 6363

Relation … in XML

Relational to XML Data

• A relation instance is basically a tree with:– Unbounded fanout at level 1 (i.e., any # of rows)– Fixed fanout at level 2 (i.e., fixed # fields)

• XML data is essentially an arbitrary tree– Unbounded fanout at all nodes/levels– Any number of levels– Variable # of children at different nodes, variable

path lengths

Query Language for XML• Must be high-level; “SQL for XML”• Must conform to XSchema

– But also work in absence of schema info

• Support simple and complex/nested datatypes• Support universal and existential quantifiers,

aggregation• Operations on sequences and hierarchies of doc

structures• Capability to transform and create XML structures

XQuery

• Influenced by XML-QL, Lorel, Quilt, YATL– Also, XPath and XML Schema

• Reads a sequence of XML fragments or atomic values and returns a sequence of XML fragments or atomic values– Inputs/outputs are objects defined by XML-

Query data model, rather than strings in XML syntax

Overview of XQuery• Path expressions• Element constructors• FLWOR (“flower”) expressions

– Several other kinds of expressions as well, including conditional expressions, list expressions, quantified expressions, etc.

• Expressions evaluated w.r.t. a context:– Context item (current node)– Context position (in sequence being processed)– Context size (of the sequence being processed)– Context also includes namespaces, variables, functions,

date, etc.

Path Expressions

Examples:

• Bib/paper

• Bib/book/publisher

• Bib/paper/author/lastname

Given an XML document, the value of a path expression p is a set of objects

Path Expression Examples

Doc =

&o1

&o12 &o24 &o29

&o43

&o70 &o71

&96

&243 &206

&25

“Serge”“Abiteboul”

1997

“Victor”“Vianu”

122 133

paper bookpaper

references

references references

authortitle

yearhttp

author

authorauthor

title publisherauthor

authortitle

page

firstname lastnamefirstname

lastnamefirst last

Bib

&o44 &o45 &o46

&o47 &o48 &o49 &o50 &o51

&o52

Bib/paper = <&o12,&o29>

Bib/book/publisher = <&o51>

Bib/paper/author/lastname = <&o71,&206>

Bib/paper = <&o12,&o29>

Bib/book/publisher = <&o51>

Bib/paper/author/lastname = <&o71,&206>

Note that order of elements matters!

Element Construction

• An XQuery expression can construct new values or structures

• Example: Consider the path expressions from the previous slide.– Each of them returns a newly constructed

sequence of elements– Key point is that we don’t just return existing

structures or atomic values; we can re-arrange them as we wish into new structures

FLWOR Expressions

• FOR-LET-WHERE-ORDERBY-RETURN = FLWOR

FOR / LET Clauses

WHERE Clause

ORDERBY/RETURN Clause

List of tuples

List of tuples

Instance of XQuery data model

FOR vs. LET

• FOR $x IN list-expr – Binds $x in turn to each value in the list expr

• LET $x = list-expr – Binds $x to the entire list expr– Useful for common sub-expressions and for

aggregations

FOR vs. LET: Example

FOR $x IN document("bib.xml")/bib/book

RETURN <result> $x </result>

FOR $x IN document("bib.xml")/bib/book

RETURN <result> $x </result>

Returns: <result> <book>...</book></result> <result> <book>...</book></result> <result> <book>...</book></result> ...

LET $x IN document("bib.xml")/bib/book

RETURN <result> $x </result>

LET $x IN document("bib.xml")/bib/book

RETURN <result> $x </result>

Returns:<result> <book>...</book> <book>...</book> <book>...</book> ...</result>

Notice that result hasseveral elements

Notice that result hasexactly one element

XQuery Example 1

Find all book titles published after 1995:

FOR $x IN document("bib.xml")/bib/book

WHERE $x/year > 1995

RETURN $x/title

FOR $x IN document("bib.xml")/bib/book

WHERE $x/year > 1995

RETURN $x/title

Result: <title> abc </title> <title> def </title> <title> ghi </title>

XQuery Example 2For each author of a book by Morgan

Kaufmann, list all books she published:

FOR $a IN distinct(document("bib.xml") /bib/book[publisher=“Morgan Kaufmann”]/author)

RETURN <result>

$a,

FOR $t IN /bib/book[author=$a]/title

RETURN $t

</result>

FOR $a IN distinct(document("bib.xml") /bib/book[publisher=“Morgan Kaufmann”]/author)

RETURN <result>

$a,

FOR $t IN /bib/book[author=$a]/title

RETURN $t

</result>

distinct = a function that eliminates duplicates (after converting inputs to atomic values)

Results for Example 2

<result> <author>Jones</author> <title> abc </title> <title> def </title> </result> <result> <author> Smith </author> <title> ghi </title> </result>

Observe how nested structure of result elements is determined by the nested structure of the query.

XQuery Example 3

count = (aggregate) function that returns the number of elements

<big_publishers>

FOR $p IN distinct(document("bib.xml")//publisher)

LET $b := document("bib.xml")/book[publisher = $p]

WHERE count($b) > 100

RETURN $p

</big_publishers>

<big_publishers>

FOR $p IN distinct(document("bib.xml")//publisher)

LET $b := document("bib.xml")/book[publisher = $p]

WHERE count($b) > 100

RETURN $p

</big_publishers>

For each publisher p

- Let the list of books published by p be b

Count the # books in b, and return p if b > 100

XQuery Example 4

Find books whose price is larger than average:

LET $a=avg(document("bib.xml")/bib/book/price)

FOR $b in document("bib.xml")/bib/book

WHERE $b/price > $a

RETURN $b

LET $a=avg(document("bib.xml")/bib/book/price)

FOR $b in document("bib.xml")/bib/book

WHERE $b/price > $a

RETURN $b

Collections in XQuery• Ordered and unordered collections

– /bib/book/author = an ordered collection

– Distinct(/bib/book/author) = an unordered collection

• Examples:– LET $a = /bib/book $a is a collection; stmt iterates

over all books in collecion

– $b/author also a collection (several authors...)

RETURN <result> $b/author </result>RETURN <result> $b/author </result>

Returns a single collection! <result> <author>...</author> <author>...</author> <author>...</author> ... </result>

However:

Collections in XQuery

What about collections in expressions ?

• $b/price list of n prices

• $b/price * 0.7 list of n numbers??• $b/price * $b/quantity list of n x m numbers ??

– Valid only if the two sequences have at most one element– Atomization

• $book1/author eq "Kennedy" - Value Comparison• $book1/author = "Kennedy" - General Comparison

Sorting in XQuery

<publisher_list> FOR $p IN distinct(document("bib.xml")//publisher)

ORDERBY $p RETURN <publisher> <name> $p/text() </name> , FOR $b IN document("bib.xml")//book[publisher = $p]

ORDERBY $b/price DESCENDING RETURN <book>

$b/title , $b/price </book> </publisher></publisher_list>

<publisher_list> FOR $p IN distinct(document("bib.xml")//publisher)

ORDERBY $p RETURN <publisher> <name> $p/text() </name> , FOR $b IN document("bib.xml")//book[publisher = $p]

ORDERBY $b/price DESCENDING RETURN <book>

$b/title , $b/price </book> </publisher></publisher_list>

Conditional Expressions: If-Then-Else

FOR $h IN //holding

ORDERBY $h/titleRETURN <holding>

$h/title,

IF $h/@type = "Journal"

THEN $h/editor

ELSE $h/author

</holding>

FOR $h IN //holding

ORDERBY $h/titleRETURN <holding>

$h/title,

IF $h/@type = "Journal"

THEN $h/editor

ELSE $h/author

</holding>

Existential Quantifiers

FOR $b IN //book

WHERE SOME $p IN $b//para SATISFIES

contains($p, "sailing")

AND contains($p, "windsurfing")

RETURN $b/title

FOR $b IN //book

WHERE SOME $p IN $b//para SATISFIES

contains($p, "sailing")

AND contains($p, "windsurfing")

RETURN $b/title

Universal Quantifiers

FOR $b IN //book

WHERE EVERY $p IN $b//para SATISFIES

contains($p, "sailing")

RETURN $b/title

FOR $b IN //book

WHERE EVERY $p IN $b//para SATISFIES

contains($p, "sailing")

RETURN $b/title

Other Stuff in XQuery• Before and After

– for dealing with order in the input

• Filter– deletes some edges in the result tree

• Recursive functions• Namespaces• References, links …• Lots more stuff …

AppendixXML Schema and

XQuery Data Model

XML Schema

• Includes primitive data types (integers, strings, dates, etc.)

• Supports value-based constraints (integers > 100)

• User-definable structured types

• Inheritance (extension or restriction)

• Foreign keys

• Element-type reference constraints

Sample XML Schema<schema version=“1.0”

xmlns=“http://www.w3.org/1999/XMLSchema”><element name=“author” type=“string” /><element name=“date” type = “date” /><element name=“abstract”> <type> … </type></element><element name=“paper”> <type> <attribute name=“keywords” type=“string”/> <element ref=“author” minOccurs=“0” maxOccurs=“*” /> <element ref=“date” /> <element ref=“abstract” minOccurs=“0” maxOccurs=“1” /> <element ref=“body” /> </type></element></schema>

XML-Query Data Model• Describes XML data as a tree• Node ::= DocNode |

ElemNode | ValueNode | AttrNode | NSNode | PINode | CommentNode | InfoItemNode | RefNode

http://www.w3.org/TR/query-datamodel/2/2001

XML-Query Data ModelElement node (simplified definition):

• elemNode : (QNameValue, {AttrNode }, [ ElemNode | ValueNode]) ElemNode

• QNameValue = means “a tag name”Reads: “Give me a tag, a set of attributes, a list of

elements/values, and I will return an element”

XML Query Data Model

Example:

<book price = “55”

currency = “USD”>

<title> Foundations … </title>

<author> Abiteboul </author>

<author> Hull </author>

<author> Vianu </author>

<year> 1995 </year>

</book>

<book price = “55”

currency = “USD”>

<title> Foundations … </title>

<author> Abiteboul </author>

<author> Hull </author>

<author> Vianu </author>

<year> 1995 </year>

</book>

book1= elemNode(book, {price2, currency3}, [title4, author5, author6, author7, year8])

price2 = attrNode(…) /* next */currency3 = attrNode(…)title4 = elemNode(title, string9)…

book1= elemNode(book, {price2, currency3}, [title4, author5, author6, author7, year8])

price2 = attrNode(…) /* next */currency3 = attrNode(…)title4 = elemNode(title, string9)…