Post on 21-Dec-2015
XML, XML Schema, XPath and XQuery
Slides collated from various sources, many from Dan Suciu at Univ. of Washington
CS561 - Spring 2004. 2
XML
W3C standard to complement HTML
• origins: structured text (SGML)
• motivation:
  – HTML describes presentation
  – XML describes content
• http://www.w3.org/TR/2000/REC-xml-20001006 (XML 1.0, second edition, 10/2000)
[Figure: SGML, XML, HTML 4.0]
From HTML to XML
HTML describes the presentation
HTML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i>
Abiteboul, Hull, Vianu
<br> Addison Wesley, 1995
<p> <i> Data on the Web </i>
Abiteboul, Buneman, Suciu
<br> Morgan Kaufmann, 1999
XML
<bibliography>
  <book> <title> Foundations… </title>
         <author> Abiteboul </author>
         <author> Hull </author>
         <author> Vianu </author>
         <publisher> Addison Wesley </publisher>
         <year> 1995 </year>
  </book>
  …
</bibliography>
XML describes the content
XML Terminology
• tags: book, title, author, …
• start tag: <book>, end tag: </book>
• elements: <book>…</book>, <author>…</author>
• elements are nested
• empty element: <red></red>, abbreviated <red/>
• an XML document has a single root element
• well-formed XML document: one whose tags match and nest properly
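The well-formedness rules above (matching tags, a single root) can be checked with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree; the helper name is_well_formed and the three sample strings are illustrative assumptions, not part of the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML: matching, properly nested tags and one root."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, for illustration only.
GOOD = "<book><title>Foundations of Databases</title><red/></book>"
BAD_TAGS = "<book><title>Foundations of Databases</book></title>"  # tags cross
TWO_ROOTS = "<book/><book/>"  # more than one root element

if __name__ == "__main__":
    for sample in (GOOD, BAD_TAGS, TWO_ROOTS):
        print(is_well_formed(sample), sample)
```

Note that well-formedness says nothing about which elements are allowed where; that is the job of a DTD or an XML Schema.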
More XML: Attributes
<book price="55" currency="USD">
<title> Foundations of Databases </title>
<author> Abiteboul </author>
…
<year> 1995 </year>
</book>
attributes are alternative ways to represent data
More XML: Oids and References
<person id="o555"> <name> Jane </name> </person>
<person id="o456"> <name> Mary </name>
  <children idref="o123 o555"/>
</person>
<person id="o123" mother="o456"> <name> John </name>
</person>
oids and references in XML are just syntax
CS561 - Spring 2004. 10
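Because oids and references are "just syntax", the application has to resolve them itself. A minimal sketch in Python, assuming the three person elements are wrapped in a hypothetical <people> root (the slide shows no root) and using illustrative helper names:

```python
import xml.etree.ElementTree as ET

# The <people> wrapper is an assumption; the slide lists the persons without a root.
DOC = """<people>
  <person id="o555"><name>Jane</name></person>
  <person id="o456"><name>Mary</name>
    <children idref="o123 o555"/>
  </person>
  <person id="o123" mother="o456"><name>John</name></person>
</people>"""

root = ET.fromstring(DOC)

# Index every person by its id attribute so that references can be followed.
by_id = {person.get("id"): person for person in root.iter("person")}


def name_of(oid: str) -> str:
    """Name of the person whose id is `oid`."""
    return by_id[oid].find("name").text.strip()


def children_of(oid: str) -> list:
    """Names referenced by the whitespace-separated idref list of `oid`, if any."""
    refs = by_id[oid].find("children")
    if refs is None:
        return []
    return [name_of(ref) for ref in refs.get("idref").split()]


if __name__ == "__main__":
    print(children_of("o456"))
    print(name_of(by_id["o123"].get("mother")))
```

A plain parser attaches no meaning to id or idref; the dictionary lookup is what turns the strings into references.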
XML Namespaces
• http://www.w3.org/TR/REC-xml-names (1/99)
• name ::= [prefix:]localpart
<book xmlns:isbn="www.isbn-org.org/def">
  <title> … </title>
  <number> 15 </number>
  <isbn:number> …. </isbn:number>
</book>
XML Namespaces
<tag xmlns:mystyle="http://…">
  …
  <mystyle:title> … </mystyle:title>
  <mystyle:number> …
</tag>
• syntactic: <number>, <isbn:number>
• semantic: provide URL for schema (callout on the xmlns URL: "defined here")
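The syntactic side of namespaces can be seen by parsing the isbn example: the prefix is only a local abbreviation, and the parser stores the element under its namespace URI. A sketch with Python's standard library; the title text and the isbn:number value are placeholders, since the slide elides them.

```python
import xml.etree.ElementTree as ET

# Element contents are placeholders; the slide shows "…" for them.
DOC = """<book xmlns:isbn="www.isbn-org.org/def">
  <title>placeholder title</title>
  <number>15</number>
  <isbn:number>placeholder-isbn</isbn:number>
</book>"""

book = ET.fromstring(DOC)

# Map the prefix used in our queries to the namespace URI declared in the document.
NS = {"isbn": "www.isbn-org.org/def"}

plain = book.find("number")                # <number> in no namespace
qualified = book.find("isbn:number", NS)   # <isbn:number> in the isbn namespace

if __name__ == "__main__":
    print(plain.tag, plain.text)
    print(qualified.tag, qualified.text)
```

ElementTree writes the qualified tag as {www.isbn-org.org/def}number, which makes it clear that the two number elements are different names.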
XML Schemas
• http://www.w3.org/TR/xmlschema-1/ (10/2000)
• generalizes DTDs
• uses XML syntax
• two documents: structure and datatypes
  – http://www.w3.org/TR/xmlschema-1
  – http://www.w3.org/TR/xmlschema-2
• XML-Schema is complex
XML Schemas
<xsd:element name="paper" type="papertype"/>
<xsd:complexType name="papertype">
  <xsd:sequence>
    <xsd:element name="title" type="xsd:string"/>
    <xsd:element name="author" minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element name="year"/>
    <xsd:choice>
      <xsd:element name="journal"/>
      <xsd:element name="conference"/>
    </xsd:choice>
  </xsd:sequence>
</xsd:complexType>
DTD: <!ELEMENT paper (title,author*,year, (journal|conference))>
Elements vs. Types in XML Schema
<xsd:element name="person">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="address" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

<xsd:element name="person" type="ttt"/>
<xsd:complexType name="ttt">
  <xsd:sequence>
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="address" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>
DTD: <!ELEMENT person (name,address)>
Elements vs. Types in XML Schema
• Types:
  – Simple types (integers, strings, ...)
  – Complex types (regular expressions, like in DTDs)
• Element-type-element alternation:
  – Root element has a complex type
  – That type is a regular expression of elements
  – Those elements have their complex types...
  – ...
  – On the leaves we have simple types
Local and Global Types in XML Schema
• Local type:
  <xsd:element name="person">
    [define locally the person’s type]
  </xsd:element>
• Global type:
  <xsd:element name="person" type="ttt"/>
  <xsd:complexType name="ttt">
    [define here the type ttt]
  </xsd:complexType>
Global types: can be reused in other elements
Local vs. Global Elements in XML Schema
• Local element:
  <xsd:complexType name="ttt">
    <xsd:sequence>
      <xsd:element name="address" type="..."/> ...
    </xsd:sequence>
  </xsd:complexType>
• Global element:
  <xsd:element name="address" type="..."/>
  <xsd:complexType name="ttt">
    <xsd:sequence>
      <xsd:element ref="address"/> ...
    </xsd:sequence>
  </xsd:complexType>
Global elements: like in DTDs
Regular Expressions in XML Schema
Recall the element-type-element alternation:
  <xsd:complexType name="....">
    [regular expression on elements]
  </xsd:complexType>
Regular expressions:
• <xsd:sequence> A B C </...> = A B C
• <xsd:choice> A B C </...> = A | B | C
• <xsd:group> A B C </...> = (A B C)
• <xsd:... minOccurs="0" maxOccurs="unbounded"> .. </...> = (...)*
• <xsd:... minOccurs="0" maxOccurs="1"> .. </...> = (...)?
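To make the "complex type = regular expression on elements" reading concrete, the sketch below checks the children of a paper element against the content model (title, author*, year, (journal|conference)) using Python's re module. This is only an illustration of the idea, not a schema validator: it looks at child tag names alone, and the function name and sample documents are assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Content model paper (title, author*, year, (journal|conference)),
# written over the space-terminated sequence of child tag names.
PAPER_MODEL = re.compile(r"title (author )*year (journal|conference) ")


def matches_paper_model(xml_text: str) -> bool:
    """True if the root is <paper> and its child tags match the content model."""
    paper = ET.fromstring(xml_text)
    child_tags = "".join(child.tag + " " for child in paper)
    return paper.tag == "paper" and PAPER_MODEL.fullmatch(child_tags) is not None


# Hypothetical sample documents.
VALID = ("<paper><title>T</title><author>A</author><author>B</author>"
         "<year>2004</year><journal>J</journal></paper>")
NO_AUTHORS = "<paper><title>T</title><year>2004</year><conference>C</conference></paper>"
BOTH_VENUES = ("<paper><title>T</title><year>2004</year>"
               "<journal>J</journal><conference>C</conference></paper>")
NO_YEAR = "<paper><title>T</title><author>A</author><journal>J</journal></paper>"

if __name__ == "__main__":
    for doc in (VALID, NO_AUTHORS, BOTH_VENUES, NO_YEAR):
        print(matches_paper_model(doc))
```

Sequence maps to concatenation, choice to alternation, and minOccurs="0" with maxOccurs="unbounded" to the star, exactly as in the list above.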
Attributes in XML Schema
<xsd:element name="paper" type="papertype"/>
<xsd:complexType name="papertype">
  <xsd:sequence>
    <xsd:element name="title" type="xsd:string"/>
    . . . . . .
  </xsd:sequence>
  <xsd:attribute name="language" type="xsd:NMTOKEN" fixed="English"/>
</xsd:complexType>
Attributes are associated with the type, not with the element.
Only with complex types; it is more trouble if we want to add attributes to simple types.
“Mixed” Content, “Any” Type
<xsd:complexType mixed="true"> . . . .
• Better than in DTDs: can still enforce the type, but now may have text between any elements
<xsd:element name="anything" type="xsd:anyType"/> . . . .
• Means anything is permitted there
Derived Types by Extensions
<complexType name="Address">
  <sequence>
    <element name="street" type="string"/>
    <element name="city" type="string"/>
  </sequence>
</complexType>
<complexType name="USAddress">
  <complexContent>
    <extension base="ipo:Address">
      <sequence>
        <element name="state" type="ipo:USState"/>
        <element name="zip" type="positiveInteger"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>
Corresponds to inheritance
Derived Types by Restrictions
<complexContent>
  <restriction base="ipo:Items">
    … [rewrite the entire content, with restrictions] ...
  </restriction>
</complexContent>
• (*): may restrict cardinalities, e.g. (0,infty) to (1,1); may restrict choices; other restrictions…
Corresponds to set inclusion
Keys in XML Schema
XML:
<purchaseReport>
  <regions>
    <zip code="95819">
      <part number="872-AA" quantity="1"/>
      <part number="926-AA" quantity="1"/>
      <part number="833-AA" quantity="1"/>
      <part number="455-BX" quantity="1"/>
    </zip>
    <zip code="63143">
      <part number="455-BX" quantity="4"/>
    </zip>
  </regions>
  <parts>
    <part number="872-AA">Lawnmower</part>
    <part number="926-AA">Baby Monitor</part>
    <part number="833-AA">Lapis Necklace</part>
    <part number="455-BX">Sturdy Shelves</part>
  </parts>
</purchaseReport>
XML Schema:
<key name="NumKey">
  <selector xpath="parts/part"/>
  <field xpath="@number"/>
</key>
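What the NumKey declaration demands can be sketched in a few lines of Python: every node picked by the selector must have the field, and no two nodes may share its value. The function key_violations and the BROKEN document are hypothetical; a real schema processor does considerably more (typed comparison, multiple fields, scoping).

```python
import xml.etree.ElementTree as ET


def key_violations(scope, selector, field_attr):
    """List the ways the nodes selected under `scope` violate a key on @field_attr."""
    seen = set()
    problems = []
    for node in scope.findall(selector):
        value = node.get(field_attr)
        if value is None:
            problems.append("missing @" + field_attr)   # key requires existence
        elif value in seen:
            problems.append("duplicate " + value)       # key requires uniqueness
        else:
            seen.add(value)
    return problems


# The <parts> section of the purchase report from the slide.
REPORT = """<purchaseReport><parts>
  <part number="872-AA">Lawnmower</part>
  <part number="926-AA">Baby Monitor</part>
  <part number="833-AA">Lapis Necklace</part>
  <part number="455-BX">Sturdy Shelves</part>
</parts></purchaseReport>"""

# Hypothetical broken variant: one repeated number and one part without a number.
BROKEN = """<purchaseReport><parts>
  <part number="872-AA">Lawnmower</part>
  <part number="872-AA">Another Lawnmower</part>
  <part>Unnumbered</part>
</parts></purchaseReport>"""

if __name__ == "__main__":
    print(key_violations(ET.fromstring(REPORT), "parts/part", "number"))
    print(key_violations(ET.fromstring(BROKEN), "parts/part", "number"))
```

Dropping the "missing" check turns the same loop into a check for unique rather than key.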
Keys in XML Schema
• In general, two flavors:
<key name="someDummyNameHere">
  <selector xpath="p"/>
  <field xpath="p1"/>
  <field xpath="p2"/>
  . . .
  <field xpath="pk"/>
</key>

<unique name="someDummyNameHere">
  <selector xpath="p"/>
  <field xpath="p1"/>
  <field xpath="p2"/>
  . . .
  <field xpath="pk"/>
</unique>
Note: all XPath expressions “start” at the element currently being defined.
The fields must identify a single node.
Keys in XML Schema
• Unique = guarantees uniqueness
• Key = guarantees uniqueness and existence
• All XPath expressions are “restricted”:
  – /a/b | /a/c OK for selector
  – //a/b/*/c OK for field
• Note: better than DTD’s ID mechanism
Keys in XML Schema
• Examples
<key name="fullName">
  <selector xpath=".//person"/>
  <field xpath="forename"/>
  <field xpath="surname"/>
</key>
<unique name="nearlyID">
  <selector xpath=".//*"/>
  <field xpath="@id"/>
</unique>
Recall: each person must have a single forename and a single surname
Foreign Keys in XML Schema
• Example
<keyref name="personRef" refer="fullName">
<selector xpath=".//personPointer"/>
<field xpath="@first"/>
<field xpath="@last"/>
</keyref>
XPATH
XPath
• Goal = permit access to some nodes of the document
• XPath main construct: axis navigation
• An XPath path consists of one or more navigation steps, separated by /
• Navigation step: axis + node-test + predicates
• Examples
  – /descendant::node()/child::author
  – /descendant::node()/child::author[parent/attribute::booktitle = "XML"][2]
• XPath also offers shortcuts
  – no axis means child
  – // means /descendant-or-self::node()/
XPath - Child axis navigation
• author is shorthand for child::author. Examples:
  – aaa -- all the child nodes labeled aaa (1,3)
  – aaa/bbb -- all the bbb grandchildren of aaa children (4)
  – */bbb -- all the bbb grandchildren of any child (4,6)
  – . -- the context node
  – / -- the root node
[Figure: tree rooted at the context node; its children are numbered 1-3 and its grandchildren 4-7, with labels aaa, bbb and ccc]
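The child-axis shorthands can be tried out with the XPath subset built into Python's xml.etree.ElementTree. The tree below is a hypothetical stand-in with the same labels as the figure; it is not a reconstruction of the slide's exact tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree: <ctx> plays the role of the context node.
DOC = """<ctx>
  <aaa><bbb/><ccc/></aaa>
  <ccc><bbb/></ccc>
  <aaa><ccc/></aaa>
</ctx>"""

ctx = ET.fromstring(DOC)

aaa_children = ctx.findall("aaa")          # child::aaa
bbb_under_aaa = ctx.findall("aaa/bbb")     # bbb grandchildren below aaa children
bbb_under_any = ctx.findall("*/bbb")       # bbb grandchildren below any child
all_ccc = ctx.findall(".//ccc")            # ccc descendants at any depth
context_itself = ctx.find(".")             # the context node

if __name__ == "__main__":
    print(len(aaa_children), len(bbb_under_aaa), len(bbb_under_any), len(all_ccc))
```

ElementTree implements only part of XPath (element nodes, a few predicates), so node tests such as text() are not available there.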
XPath - Child axis navigation
  – /doc -- all the doc children of the root
  – ./aaa -- all the aaa children of the context node (equivalent to aaa)
  – text() -- all the text children of the context node
  – node() -- all the children of the context node (includes text nodes, but not attribute nodes, which are not children)
  – .. -- parent of the context node
  – .// -- the context node and all its descendants
  – // -- the root node and all its descendants
  – //text() -- all the text nodes in the document
Predicates
  – [2] -- the second child node of the context node
  – chapter[5] -- the fifth chapter child of the context node
  – [last()] -- the last child node of the context node
  – chapter[title="introduction"] -- the chapter children of the context node that have one or more title children whose string-value is "introduction" (the string-value is the concatenation of the text of all descendant text nodes)
  – person[.//firstname = "joe"] -- the person children of the context node that have among their descendants a firstname element with string-value "joe"
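Positional and value predicates of this kind are supported by Python's xml.etree.ElementTree as well. A small sketch on a hypothetical three-chapter book (the chapter titles are made up for illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical document used only to exercise the predicates.
BOOK = """<book>
  <chapter><title>introduction</title></chapter>
  <chapter><title>data model</title></chapter>
  <chapter><title>queries</title></chapter>
</book>"""

book = ET.fromstring(BOOK)

second = book.find("chapter[2]")                      # second chapter child
last = book.find("chapter[last()]")                   # last chapter child
intro = book.findall("chapter[title='introduction']") # chapters with that title


def title_of(chapter):
    """Text of the chapter's title child."""
    return chapter.find("title").text


if __name__ == "__main__":
    print(title_of(second), "|", title_of(last), "|", len(intro))
```

Positions count from 1, and the value test compares the text content of the title child, which corresponds to the string-value described above.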
Axis navigation
• So far, nearly all our expressions have moved us down by moving to child nodes. Exceptions were
  – . -- stay where you are
  – / -- go to the root
  – // -- all descendants of the root
  – .// -- all descendants of the context node
• XPath has several axes: ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self
  – Some of these (self, parent) describe single nodes, others describe sequences of nodes.
XPath Navigation Axes
[Figure: document tree marking the axes relative to the context node: ancestor, descendant, following, preceding, following-sibling, preceding-sibling, child, attribute, namespace, self]
XPath abbreviated syntax
(nothing)   child::
@           attribute::
//          /descendant-or-self::node()/
.           self::node()
.//         descendant-or-self::node()
..          parent::node()
/           (document root)
Query Languages - XQuery
Summary of XQuery
• FLWR expressions
• FOR and LET expressions
• Collections and sorting
Resources
• XQuery: A Query Language for XML, Chamberlin, Florescu, et al.
• W3C recommendation: www.w3.org/TR/xquery/
XQuery
• Based on Quilt (which is based on XML-QL)
• http://www.w3.org/TR/xquery/ (2/2001)
• XML Query data model (ordered)
FLWR (“Flower”) Expressions
FOR ... LET... FOR... LET...
WHERE...
RETURN...
XQuery
Find all book titles published after 1995:
FOR $x IN document("bib.xml")/bib/book
WHERE $x/year > 1995
RETURN $x/title
Result: <title> abc </title> <title> def </title> <title> ghi </title>
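For readers more familiar with general-purpose languages, the FOR / WHERE / RETURN query reads like a filtered comprehension. The Python sketch below mirrors it on a hypothetical bib.xml excerpt (titles and years are made up); it illustrates the semantics and is not an XQuery implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml.
BIB = """<bib>
  <book><title>abc</title><year>1999</year></book>
  <book><title>old</title><year>1990</year></book>
  <book><title>def</title><year>2001</year></book>
</bib>"""

bib = ET.fromstring(BIB)

titles = [
    book.findtext("title")                    # RETURN $x/title
    for book in bib.findall("book")           # FOR $x IN .../bib/book
    if int(book.findtext("year")) > 1995      # WHERE $x/year > 1995
]

if __name__ == "__main__":
    print(titles)
```

The comprehension keeps document order, just as the FOR clause iterates over the books in document order.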
XQuery
For each author of a book by Morgan Kaufmann, list all books she published:
FOR $a IN distinct(document("bib.xml")/bib/book[publisher="Morgan Kaufmann"]/author)
RETURN <result>
         $a,
         FOR $t IN /bib/book[author=$a]/title
         RETURN $t
       </result>
distinct = a function that eliminates duplicates
XQuery
Result:
<result>
  <author> Jones </author>
  <title> abc </title>
  <title> def </title>
</result>
<result>
  <author> Smith </author>
  <title> ghi </title>
</result>
XQuery
• FOR $x IN expr -- binds $x to each element in the list expr
• LET $x := expr -- binds $x to the entire list expr
  – Useful for common subexpressions and for aggregations
XQuery
<big_publishers>
  FOR $p IN distinct(document("bib.xml")//publisher)
  LET $b := document("bib.xml")//book[publisher = $p]
  WHERE count($b) > 100
  RETURN $p
</big_publishers>
count = an (aggregate) function that returns the number of elements
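The interplay of FOR (iterate over distinct publishers) and LET (bind the whole list of that publisher's books, then aggregate) can be mirrored in Python. The sample data and the function name big_publishers are assumptions, and the threshold is a parameter so that a three-book sample still shows an effect.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml.
BIB = """<bib>
  <book><title>abc</title><publisher>Morgan Kaufmann</publisher></book>
  <book><title>def</title><publisher>Morgan Kaufmann</publisher></book>
  <book><title>ghi</title><publisher>Addison Wesley</publisher></book>
</bib>"""

bib = ET.fromstring(BIB)


def big_publishers(doc, threshold):
    """Publishers with more than `threshold` books, in order of first appearance."""
    distinct = []                                    # distinct(...//publisher)
    for element in doc.iter("publisher"):
        name = element.text.strip()
        if name not in distinct:
            distinct.append(name)
    result = []
    for p in distinct:                               # FOR $p IN ...
        books = [b for b in doc.iter("book")         # LET $b := ...//book[publisher = $p]
                 if b.findtext("publisher", "").strip() == p]
        if len(books) > threshold:                   # WHERE count($b) > threshold
            result.append(p)                         # RETURN $p
    return result


if __name__ == "__main__":
    print(big_publishers(bib, 1))
```

The variable books holds a whole collection per iteration, which is what makes the count possible.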
XQuery
Find books whose price is larger than average:
LET $a := avg(document("bib.xml")/bib/book/@price)
FOR $b IN document("bib.xml")/bib/book
WHERE $b/@price > $a
RETURN $b
XQuery
Summary:
• FOR-LET-WHERE-RETURN = FLWR
[Figure: pipeline] FOR/LET clauses -> list of tuples -> WHERE clause -> list of tuples -> RETURN clause -> instance of XQuery data model
FOR vs. LET
FOR
• Binds node variables: iteration
LET
• Binds collection variables: one value
FOR vs. LET
FOR $x IN document("bib.xml")/bib/book
RETURN <result> $x </result>
Returns:
<result> <book>...</book> </result>
<result> <book>...</book> </result>
<result> <book>...</book> </result>
...
LET $x := document("bib.xml")/bib/book
RETURN <result> $x </result>
Returns:
<result> <book>...</book> <book>...</book> <book>...</book> ... </result>
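The difference between the two bindings is easy to reproduce in Python: iterating produces one result per book, while a single binding produces one result that contains all books. The sample document and the helper wrap are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml.
BIB = """<bib>
  <book><title>abc</title></book>
  <book><title>def</title></book>
  <book><title>ghi</title></book>
</bib>"""

books = ET.fromstring(BIB).findall("book")


def wrap(items):
    """Build a <result> element whose children are `items`."""
    result = ET.Element("result")
    result.extend(items)
    return result


# FOR $x IN .../book RETURN <result> $x </result>: one <result> per book.
for_results = [wrap([book]) for book in books]

# LET $x := .../book RETURN <result> $x </result>: a single <result> with every book.
let_result = wrap(books)

if __name__ == "__main__":
    print(len(for_results), [len(r) for r in for_results])
    print(len(let_result))
```

Here len of an element counts its children, so the FOR version gives three results with one child each and the LET version one result with three children.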
Collections in XQuery
• Ordered and unordered collections
  – /bib/book/author = an ordered collection
  – distinct(/bib/book/author) = an unordered collection
• LET $a := /bib/book -- $a is a collection
• $b/author -- a collection (several authors...)
RETURN <result> $b/author </result>
Returns:
<result> <author>...</author> <author>...</author> <author>...</author> ... </result>
Sorting in XQuery
<publisher_list>
  FOR $p IN distinct(document("bib.xml")//publisher)
  RETURN <publisher>
           <name> $p/text() </name> ,
           FOR $b IN document("bib.xml")//book[publisher = $p]
           RETURN <book>
                    $b/title ,
                    $b/@price
                  </book> SORTBY(price DESCENDING)
         </publisher> SORTBY(name)
</publisher_list>
Sorting in XQuery
• Sorting arguments: refer to name space of RETURN clause, not FOR clause
• To sort on an element you don’t want to display, first return it, then remove it with an additional query.
If-Then-Else
FOR $h IN //holding
RETURN <holding>
$h/title,
IF $h/@type = "Journal"
THEN $h/editor
ELSE $h/author
</holding> SORTBY (title)
Existential Quantifiers
FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES
contains($p, "sailing")
AND contains($p, "windsurfing")
RETURN $b/title
Universal Quantifiers
FOR $b IN //book
WHERE EVERY $p IN $b//para SATISFIES
contains($p, "sailing")
RETURN $b/title
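SOME and EVERY correspond closely to Python's any and all. The sketch below runs both quantified queries over a hypothetical three-book library (titles and paragraphs are made up) and uses substring tests as a stand-in for contains.

```python
import xml.etree.ElementTree as ET

# Hypothetical library; book C deliberately has no <para> elements.
LIBRARY = """<library>
  <book><title>A</title><para>sailing and windsurfing</para><para>cooking</para></book>
  <book><title>B</title><para>sailing</para><para>more sailing</para></book>
  <book><title>C</title></book>
</library>"""

library = ET.fromstring(LIBRARY)


def paras(book):
    """Text of every <para> descendant of `book`."""
    return ["".join(p.itertext()) for p in book.iter("para")]


# WHERE SOME $p IN $b//para SATISFIES contains(...) AND contains(...)
some_titles = [
    b.findtext("title") for b in library.iter("book")
    if any("sailing" in p and "windsurfing" in p for p in paras(b))
]

# WHERE EVERY $p IN $b//para SATISFIES contains($p, "sailing")
every_titles = [
    b.findtext("title") for b in library.iter("book")
    if all("sailing" in p for p in paras(b))
]

if __name__ == "__main__":
    print(some_titles)
    print(every_titles)
```

Book C appears in the EVERY result because a universal condition over an empty collection holds vacuously, in Python's all as in the quantifier.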