SEMISTRUCTURED DATA AND XML

Upload: niabi

Post on 25-Feb-2016


Page 1: Semistructured Data and XML

SEMISTRUCTURED DATA AND XML

HOW THE WEB IS TODAY

HTML documents
  often generated by applications
  consumed by humans only
  easy access: across platforms, across organizations
  only layout, no semantic information
No application interoperability:
  HTML not understood by applications
  screen scraping is brittle
  Database technology: client-server, still vendor specific

XML DATA EXCHANGE FORMAT

A standard from the W3C (World Wide Web Consortium, http://www.w3.org).
The mission of the W3C: ". . . developing common protocols that promote its evolution and ensure its interoperability. . .".
Basic ideas
  XML = data
  XML generated by applications
  XML consumed by applications
  Easy access: across platforms, organizations.

PARADIGM SHIFT ON THE WEB

For web search engines:
  From documents (HTML) to data (XML)
  From document management to document understanding (e.g., question answering)
  From information retrieval to data management
For database systems:
  From relational (structured) model to semistructured data
  From data processing to data / query translation
  From storage to transport

THE SEMISTRUCTURED DATA MODEL

[Figure: Object Exchange Model (OEM) graph of a bibliography database. The root Bib (&o1) has paper and book children (&o12, &o24, &o29). Arcs are labeled author, title, year, http, publisher, references, page, firstname, lastname, first and last. Leaves hold atomic values such as "Serge", "Abiteboul", 1997, "Victor", "Vianu", 122 and 133. Interior nodes are complex objects, leaves are atomic objects.]

THE SEMISTRUCTURED DATA MODEL

Data is self-describing, i.e. the data description is integrated with the data itself rather than in a separate schema.

Database is a collection of nodes and arcs (directed graph).

Leaf nodes represent data of some atomic type (atomic objects, such as numbers or strings).

Interior nodes represent complex objects consisting of components (child nodes), connected by arcs to this node.

Arcs are directed and connect two nodes.

THE SEMISTRUCTURED DATA MODEL

Arc labels indicate the relationship between the two corresponding nodes.

The root node is the only interior node without in-arcs, representing the entire database.

All database objects are children of the root node.

Every node must be reachable from the root.

A general graph structure is possible, i.e. the graph need not be a tree structure.
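
The properties above can be checked mechanically. The following is a minimal Python sketch, assuming a small hypothetical fragment of the bibliography graph stored as (source, label, target) arcs; the names ARCS, reachable and in_degree are illustrative and not part of OEM.

```python
from collections import deque

# Hypothetical fragment of an OEM database: each arc is (source, label, target).
ARCS = [
    ("o1", "paper", "o12"),
    ("o1", "book", "o24"),
    ("o1", "paper", "o29"),
    ("o29", "author", "o52"),
    ("o29", "references", "o12"),  # second in-arc for o12: not a tree
    ("o29", "references", "o24"),
]

def reachable(root, arcs):
    """Return the set of nodes reachable from root by following arcs forward."""
    seen, queue = {root}, deque([root])
    while queue:
        node = queue.popleft()
        for src, _label, dst in arcs:
            if src == node and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen

def in_degree(node, arcs):
    """Number of arcs pointing at node."""
    return sum(1 for _src, _label, dst in arcs if dst == node)

nodes = {n for src, _label, dst in ARCS for n in (src, dst)}
all_reachable = reachable("o1", ARCS) == nodes        # every node reachable from the root
root_has_no_in_arcs = in_degree("o1", ARCS) == 0      # the root has no in-arcs
is_tree = all(in_degree(n, ARCS) <= 1 for n in nodes) # False: o12 and o24 are shared
```

Storing arcs as a flat list keeps the sketch short; an adjacency map would be the usual choice for larger graphs.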

SYNTAX FOR SEMISTRUCTURED DATA

  Bib: &o1 { paper: &o12 { … },
             book: &o24 { … },
             paper: &o29 { author: &o52 "Abiteboul",
                           author: &o96 { firstname: &o243 "Victor",
                                          lastname: &o206 "Vianu" },
                           title: &o93 "Regular path queries with constraints",
                           references: &o12,
                           references: &o24,
                           pages: &o25 { first: &o64 122, last: &o92 133 } } }

Observe: nested tuples, set-values, oids!

SYNTAX FOR SEMISTRUCTURED DATA

May omit oids:

  { paper: { author: "Abiteboul",
             author: { firstname: "Victor", lastname: "Vianu" },
             title: "Regular path queries …",
             page: { first: 122, last: 133 } } }
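
One way to hold such an oid-free value in a program is sketched below in Python. It assumes a complex object is a list of (label, value) pairs rather than a dictionary, because a label such as author may occur more than once; the helper name children is hypothetical.

```python
# A complex object is a list of (label, value) pairs, so a label may repeat.
# An atomic object is a plain Python string or number.
bib = [
    ("paper", [
        ("author", "Abiteboul"),
        ("author", [("firstname", "Victor"), ("lastname", "Vianu")]),
        ("title", "Regular path queries …"),
        ("page", [("first", 122), ("last", 133)]),
    ]),
]

def children(obj, label):
    """All values reached from a complex object over arcs with the given label."""
    if not isinstance(obj, list):
        return []  # atomic objects have no children
    return [value for (lbl, value) in obj if lbl == label]

paper = children(bib, "paper")[0]
authors = children(paper, "author")  # set-valued: two authors of different shape
```

Note that the two author values have different structure (a string and a nested object), which is exactly the irregularity the model allows.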

VS. RELATIONAL MODEL

Missing attributes
Additional attributes
Multiple attribute values (set-valued attributes)
Objects as attribute values
No global schema

Only the first characteristic (missing attributes, via NULL values) is supported by the relational model; all others are not.

VS. RELATIONAL MODEL

Semistructured data
  Self-describing, irregular data, no a-priori structure.
Relational DB
  Separate schema, regular data, a-priori structure.

XML

IMPORTANT XML STANDARDS

XSL/XSLT: presentation and transformation standards
RDF: Resource Description Framework (meta-info such as ratings, categorizations, etc.)
XPath/XPointer/XLink: standards for addressing and linking to documents and elements within them
Namespaces: for resolving name clashes
DOM: Document Object Model for manipulating XML documents
SAX: Simple API for XML parsing
XQuery: query language

XML

A W3C standard to complement HTML
Origins: structured text, SGML
  Large-scale electronic publishing
  Data exchange on the web
Motivation:
  HTML describes presentation
  XML describes content
http://www.w3.org/TR/2000/REC-xml-20001006 (XML 1.0, Second Edition, 10/2000)

[Figure: diagram relating SGML, XML and HTML 4.0]

FROM HTML TO XML

HTML describes the presentation

HTML

  <h1> Bibliography </h1>
  <p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu
      <br> Addison Wesley, 1995
  <p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu
      <br> Morgan Kaufmann, 1999

HTML describes the presentation

XML

  <bibliography>
    <book>
      <title> Foundations… </title>
      <author> Abiteboul </author>
      <author> Hull </author>
      <author> Vianu </author>
      <publisher> Addison Wesley </publisher>
      <year> 1995 </year>
    </book>
    …
  </bibliography>

XML describes the content
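
Because the tags name the content, an application can pick out fields directly instead of scraping layout. A minimal sketch with Python's standard xml.etree.ElementTree, assuming the document above with the elided part ("…") simply left out:

```python
import xml.etree.ElementTree as ET

# The bibliography from the slide; the elided second book is omitted.
DOC = """
<bibliography>
  <book>
    <title> Foundations… </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
</bibliography>
"""

root = ET.fromstring(DOC)
book = root.find("book")
# Element text keeps the surrounding blanks from the source, so strip it.
authors = [a.text.strip() for a in book.findall("author")]
year = int(book.findtext("year"))
```

The same extraction from the HTML version would have to guess which text after the italic title is an author list and which is a publisher.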

WHY ARE WE DB’ERS INTERESTED?

It’s data. That’s us.
Database issues:
  How are we going to model XML? (graphs)
  How are we going to query XML? (XQuery)
  How are we going to store XML (in a relational database? object-oriented? native?)
  How are we going to process XML efficiently? (many interesting research questions!)

ELEMENTS

Tags
  book, title, author, …
  start tag: <book>, end tag: </book>
  defined by user / programmer (different from HTML!)
Elements
  <book>…</book>, <author>…</author>
  An element consists of a matching start and end tag and the enclosed content.
  Elements can be nested, i.e. the content of one element can consist of a sequence of other elements.

ATTRIBUTES

Attributes can be associated with any element.
Provide additional information about elements.
Attributes can have only one value.
Example

  <book price="55" currency="USD">
    <title> Foundations of Databases </title>
    <author> Abiteboul </author>
    …
    <year> 1995 </year>
  </book>

Attributes can also be used to connect elements.

NON-TREE-LIKE XML

So far: only tree-like XML documents, i.e. each element is nested within at most one other element.

Attributes can also be used to create non-tree XML documents.

Attributes with a domain of ID serve as primary keys of elements.

Attributes with a domain of IDREF serve as foreign keys referencing the ID of another element.

NON-TREE-LIKE XML

Example of a non-tree structure

  <persons>
    <person personid="o555">
      <name> Jane </name>
    </person>
    <person personid="o456">
      <name> Mary </name>
      <children refs="o123 o555"></children>
    </person>
    <person personid="o123" mother="o456">
      <name>John</name>
    </person>
  </persons>
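
Following such references can be sketched in Python as below. Without a DTD the parser does not know which attributes are IDs, so the sketch assumes that personid plays the ID role and that refs and mother are references; the names by_id and name_of are illustrative.

```python
import xml.etree.ElementTree as ET

DOC = """
<persons>
  <person personid="o555"><name> Jane </name></person>
  <person personid="o456">
    <name> Mary </name>
    <children refs="o123 o555"/>
  </person>
  <person personid="o123" mother="o456"><name>John</name></person>
</persons>
"""

root = ET.fromstring(DOC)
# ID value -> element, playing the role of a primary-key index.
by_id = {p.get("personid"): p for p in root.findall("person")}

def name_of(person):
    """Name of a person element, without surrounding blanks."""
    return person.findtext("name").strip()

mary = by_id["o456"]
# A list of references: whitespace-separated IDs in a single attribute value.
child_names = [name_of(by_id[ref]) for ref in mary.find("children").get("refs").split()]
# A single reference: John points back to his mother.
johns_mother = name_of(by_id[by_id["o123"].get("mother")])
```

John is reachable both as a child of persons and through Mary's refs, which is what makes the document a graph rather than a tree.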

NAMESPACES

An XML document can involve tags that come from multiple sources.
One and the same tag can appear in more than one source.

  <table>
    <tr>
      <td>Apples</td>
      <td>Bananas</td>
    </tr>
  </table>

  <table>
    <name>African Coffee Table</name>
    <width>80</width>
    <length>120</length>
  </table>

NAMESPACES

Name conflicts can be resolved by prefixing tag names according to their source.

  <h:table>
    <h:tr>
      <h:td>Apples</h:td>
      <h:td>Bananas</h:td>
    </h:tr>
  </h:table>

  <f:table>
    <f:name>African Coffee Table</f:name>
    <f:width>80</f:width>
    <f:length>120</f:length>
  </f:table>

When using prefixes in XML, a namespace for the prefix must be defined.
The namespace must be referenced (via a URI) in the start tag of an enclosing element.
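
A minimal Python sketch of how a parser sees the prefixed tags. The slide does not give the namespace URIs, so the two http://example.org/... URIs and the enclosing root element are assumptions made only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, declared in the start tag of an enclosing element.
DOC = """
<root xmlns:h="http://example.org/html"
      xmlns:f="http://example.org/furniture">
  <h:table>
    <h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr>
  </h:table>
  <f:table>
    <f:name>African Coffee Table</f:name>
    <f:width>80</f:width>
    <f:length>120</f:length>
  </f:table>
</root>
"""

# Prefix -> URI map used when searching; the prefixes here need not match the document's.
NS = {"h": "http://example.org/html", "f": "http://example.org/furniture"}

root = ET.fromstring(DOC)
cells = [td.text for td in root.findall("h:table/h:tr/h:td", NS)]
furniture_name = root.findtext("f:table/f:name", namespaces=NS)
# Internally the prefix is replaced by the URI, so the two table tags differ.
expanded_tag = root.find("f:table", NS).tag
```

The prefix is only shorthand; what distinguishes the two table elements is the URI each prefix is bound to.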

WELL-FORMED XML

A well-formed XML document satisfies the following conditions:
  Begins with a declaration that it is XML.
  Has a single root element that encloses the whole document.
  Consists of properly nested elements, i.e. start and end tag of an element are within the same enclosing element.

standalone="yes" states that the document does not depend on an external DTD.

In this mode, you can invent your own tags, like in the semistructured data model.

WELL-FORMED XML

  <?xml version="1.0" standalone="yes" ?>
  <bibliography>
    <book>
      <title> Foundations… </title>
      <author> Abiteboul </author>
      <author> Hull </author>
      <author> Vianu </author>
      <publisher> Addison Wesley </publisher>
      <year> 1995 </year>
    </book>
    <book>
      <title> … </title>
      . . .
    </book>
    …
  </bibliography>

WELL-FORMED XML

HTML browsers will display documents with errors (like missing end tags).

The W3C XML specification states that a program should stop processing an XML document if it finds an error.

The main reason is that XML is being consumed by programs rather than by humans (as HTML).

W3C provides a validator that checks whether an XML document is well-formed.
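
The stop-on-error behaviour is easy to observe with any conforming parser. A minimal sketch using Python's standard library, where is_well_formed is a hypothetical helper and not the W3C validator:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """True if text parses as XML; the parser raises at the first error it finds."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<book><title>Foundations</title></book>"
missing_end_tag = "<book><title>Foundations</book>"  # <title> is never closed
two_roots = "<book/><book/>"                         # no single root element
```

This checks well-formedness only; checking validity against a DTD needs a validating parser, which the standard library module used here is not.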

VALID XML

The validator can also check whether an XML document is valid, i.e. conforms to a Document Type Definition (DTD).

A DTD specifies the allowable tags and how they can be nested.

XML with a DTD is no longer semistructured (self-describing).

However, a DTD is less rigid than the schema of a relational DB. E.g., a DTD allows missing and multiple attributes / elements.

DTD

DOCUMENT TYPE DEFINITIONS

Document Type Definition (DTD): set of rules (grammar) specifying elements, attributes and all other aspects of XML documents.

For each element, specify name and content type.

Content type can, e.g., be
  #PCDATA (character string),
  other elements,
  a regular expression made of the above content types:
    * = zero or more occurrences
    ? = zero or one occurrence
    + = one or more occurrences
    , = sequence of elements.

DOCUMENT TYPE DESCRIPTORS

Sort of like a schema but not really.
Inherited from SGML DTD standard
BNF grammar establishing constraints on element structure and content
Definitions of entities

  <!ELEMENT Book (title, author*)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (name, address, age?)>
  <!ATTLIST Book id ID #REQUIRED>
  <!ATTLIST Book pub IDREF #IMPLIED>

EXAMPLE DTD: PRODUCT CATALOG

  <!DOCTYPE CATALOG [
  <!ELEMENT CATALOG (PRODUCT+)>
  <!ELEMENT PRODUCT (SPECIFICATIONS+, OPTIONS?, PRICE+, NOTES?)>
  <!ATTLIST PRODUCT
      NAME      CDATA #IMPLIED
      CATEGORY  (HandTool|Table|Shop-Professional) "HandTool"
      PARTNUM   CDATA #IMPLIED
      PLANT     (Pittsburgh|Milwaukee|Chicago) "Chicago"
      INVENTORY (InStock|Backordered|Discontinued) "InStock">
  <!ELEMENT SPECIFICATIONS (#PCDATA)>
  <!ATTLIST SPECIFICATIONS
      WEIGHT CDATA #IMPLIED
      POWER  CDATA #IMPLIED>
  <!ELEMENT OPTIONS (#PCDATA)>
  <!ATTLIST OPTIONS
      FINISH  (Metal|Polished|Matte) "Matte"
      ADAPTER (Included|Optional|NotApplicable) "Included"
      CASE    (HardShell|Soft|NotApplicable) "HardShell">
  <!ELEMENT PRICE (#PCDATA)>
  <!ATTLIST PRICE
      MSRP      CDATA #IMPLIED
      WHOLESALE CDATA #IMPLIED
      STREET    CDATA #IMPLIED
      SHIPPING  CDATA #IMPLIED>
  <!ELEMENT NOTES (#PCDATA)>
  ]>

SHORTCOMINGS OF DTDS

Useful for documents, but not so good for data:
  Element name and type are associated globally
  No support for structural re-use
    Object-oriented-like structures aren’t supported
  No support for data types
    Can’t do data validation
  Can have a single key item (ID), but:
    No support for multi-attribute keys
    No support for foreign keys (references to other keys)
    No constraints on IDREFs (reference only a Section)

XML SCHEMA

XML SCHEMA

The successor of DTDs to specify a schema for XML documents.
A W3C standard.
Includes and extends functionality of DTDs.
In particular, XML Schemas support data types. This makes it easier to validate the correctness of data and to work with data from a database.
XML Schemas are written in XML. You don't have to learn a new language and can use your XML parser to parse your Schema files.

EXAMPLE XML SCHEMA

  <schema version="1.0" xmlns="http://www.w3.org/1999/XMLSchema">
    <element name="author" type="string" />
    <element name="date" type="date" />
    <element name="abstract">
      <type> … </type>
    </element>
    <element name="paper">
      <type>
        <attribute name="keywords" type="string" />
        <element ref="author" minOccurs="0" maxOccurs="*" />
        <element ref="date" />
        <element ref="abstract" minOccurs="0" maxOccurs="1" />
        <element ref="body" />
      </type>
    </element>
  </schema>

This example uses the syntax of a 1999 working draft of XML Schema. In the final W3C recommendation, <type> became <complexType> and maxOccurs="*" became maxOccurs="unbounded", as in the xs: examples that follow.

SIMPLE ELEMENTS

Simple elements contain only text.
They can have one of the built-in datatypes: xs:string, xs:decimal, xs:integer, xs:boolean, xs:date, xs:time.
Example

  <xs:element name="lastname" type="xs:string"/>
  <xs:element name="age" type="xs:integer"/>
  <xs:element name="dateborn" type="xs:date"/>

SIMPLE ELEMENTS

Restrictions allow you to further constrain the content of simple elements.

  <xs:element name="age">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="120"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

ATTRIBUTES

Attributes can be specified using the attribute element:

  <xs:attribute name="xxx" type="yyy"/>

Attribute elements are nested within the declaration of the element with which they are associated.

By default, attributes are optional. To make an attribute mandatory, use

  <xs:attribute name="lang" type="xs:string" use="required"/>

Attributes can have the same built-in datatypes as simple elements.

COMPLEX ELEMENTS

Complex elements can contain other elements and can have attributes.
Nested elements need to occur in the order specified.
The number of repetitions of an element is controlled by the attributes minOccurs and maxOccurs. The default is one repetition.
A complex element with an attribute:

  <xs:element name="product">
    <xs:complexType>
      <xs:attribute name="prodid" type="xs:positiveInteger"/>
    </xs:complexType>
  </xs:element>

COMPLEX ELEMENTS

A complex element containing a sequence of nested (simple) elements:

  <xs:element name="employee">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstname" type="xs:string"/>
        <xs:element name="lastname" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

COMPLEX ELEMENTS

If you name the complex type, other elements can reference and include it:

  <xs:complexType name="persontype">
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="person" type="persontype"/>

XML VS. SEMISTRUCTURED DATA

Both described best by a graph.
Both are schema-less, self-describing (XML without DTD / XML schema).
XML is ordered, semistructured data is not.
XML can mix text and elements:

  <talk> Making Java easier to type and easier to type
    <speaker> Phil Wadler </speaker>
  </talk>

XML has lots of other stuff: attributes, entities, processing instructions, comments.
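
Mixed content shows up directly in parser APIs. A minimal Python sketch, assuming the talk element above written on one logical line; in ElementTree the text before the first child is stored in text and the text after a child in that child's tail.

```python
import xml.etree.ElementTree as ET

talk = ET.fromstring(
    "<talk> Making Java easier to type and easier to type"
    "<speaker> Phil Wadler </speaker></talk>"
)

leading_text = talk.text.strip()     # character data before the first child element
speaker = talk.find("speaker")
speaker_name = speaker.text.strip()
trailing_text = speaker.tail         # character data after </speaker>; None here
child_order = [child.tag for child in talk]  # children are kept in document order
```

A pure label-value graph has no slot for text sitting between child elements, which is one reason XML does not map one-to-one onto the semistructured model.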

XML-PATH = XPATH

QUERY LANGUAGES FOR XML

XPath is a simple query language based on describing similar paths in XML documents.

XQuery extends XPath in a style similar to SQL, introducing iterations, subqueries, etc.

XPath and XQuery expressions are applied to an XML document and return a sequence of qualifying items.

Items can be primitive values or nodes (elements, attributes, documents).

The items returned do not need to be of the same type.

XPATH

A path expression returns the sequence of all qualifying items that are reachable from the input item following the specified path.

A path expression is a sequence consisting of tags or attributes and special characters such as slashes (“/”).

Absolute path expressions are applied to some XML document and return all elements that are reachable from the document’s root element following the specified path.

Relative path expressions are applied to an arbitrary node.

XPATH

  <?xml version="1.0" standalone="yes" ?>
  <bibliography>
    <book bookID="b100">
      <title> Foundations… </title>
      <author> Abiteboul </author>
      <author> Hull </author>
      <author> Vianu </author>
      <publisher> Addison Wesley </publisher>
      <year> 1995 </year>
    </book>
    …
  </bibliography>

Applied to the above document, the XPath expression /bibliography/book/author returns the sequence

  <author> Abiteboul </author>
  <author> Hull </author>
  <author> Vianu </author>
  . . .

ATTRIBUTES

If we do not want to return the qualifying elements, but the value of one of their attributes, we end the path expression with @attribute.

  <?xml version="1.0" standalone="yes" ?>
  <bibliography>
    <book bookID="b100">
      <title> Foundations… </title>
      <author> Abiteboul </author>
      <author> Hull </author>
      <author> Vianu </author>
      <publisher> Addison Wesley </publisher>
      <year> 1995 </year>
    </book>
    …
  </bibliography>

Applied to the above document, the XPath expression /bibliography/book/@bookID returns the sequence

  "b100" . . .
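
Both path expressions can be tried with Python's standard library, which implements only a subset of XPath. The sketch below assumes the single-book document from the slide; since ElementTree paths are relative to the element they are applied to, the absolute paths are rewritten relative to the root element, and the attribute step is done with get().

```python
import xml.etree.ElementTree as ET

DOC = """
<bibliography>
  <book bookID="b100">
    <title> Foundations… </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
</bibliography>
"""

root = ET.fromstring(DOC)  # root is the <bibliography> element

# /bibliography/book/author, written relative to the root element.
authors = root.findall("book/author")
author_names = [a.text.strip() for a in authors]

# /bibliography/book/@bookID: select the books, then read the attribute.
book_ids = [b.get("bookID") for b in root.findall("book")]
```

A full XPath engine would accept the absolute expressions as written; the rewrite is a limitation of this particular module, not of XPath.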

WILDCARDS

We can use wildcards instead of actual tags and attributes: * means any tag, and @* means any attribute. A double slash (//) matches descendants at any depth.

Examples
  /bibliography/*/author returns the sequence <author> Abiteboul </author> <author> Hull </author>.
  /bibliography//author/@* returns the sequence "IBM" "a739".
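
The difference between the wildcard step and the descendant step can be sketched in Python. The document below is hypothetical (its nesting and attribute values are invented for the illustration), and since ElementTree has no @* step, the attribute values are read from each element's attrib mapping instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: structure and attribute values are made up for this sketch.
DOC = """
<bibliography>
  <book bookID="b100">
    <author affiliation="x1"> Abiteboul </author>
    <author> Hull </author>
  </book>
  <paper>
    <section><author authorID="a1"> Vianu </author></section>
  </paper>
</bibliography>
"""

root = ET.fromstring(DOC)

# /bibliography/*/author: authors exactly one element below the root's children.
direct = [a.text.strip() for a in root.findall("*/author")]

# /bibliography//author: authors at any depth below the root.
anywhere = [a.text.strip() for a in root.findall(".//author")]

# /bibliography//author/@*: every attribute value of every such author.
attr_values = [v for a in root.findall(".//author") for v in a.attrib.values()]
```

The author inside section is missed by the single wildcard step and found by the descendant step.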

PATH EXPRESSIONS

Examples:
  Bib.paper
  Bib.book.publisher
  Bib.paper.author.lastname

Given an OEM instance, the value of a path expression p is a set of objects.

PATH EXPRESSIONS

Examples:

[Figure: the OEM bibliography graph DB, rooted at Bib (&o1) with paper and book children &o12, &o24 and &o29. Arcs are labeled author, title, year, http, publisher, references, page, firstname, lastname, first and last, and lead to objects such as &o43, &o70, &o71, &96, &243, &206, &25 and &o44 to &o52. Leaves hold values such as "Serge", "Abiteboul", 1997, "Victor", "Vianu", 122 and 133.]

  Bib.paper = {&o12, &o29}
  Bib.book.publisher = {&o51}
  Bib.paper.author.lastname = {&o71, &206}
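
Evaluating such a path amounts to following one labeled arc per step from the root. A minimal Python sketch, assuming a partial and hypothetical reconstruction of the instance: only the arcs needed for the three examples are listed, and which author object hangs below which paper is a guess.

```python
# Partial, hypothetical reconstruction of the instance as (source, label, target) arcs.
ARCS = [
    ("&o1", "paper", "&o12"), ("&o1", "book", "&o24"), ("&o1", "paper", "&o29"),
    ("&o12", "author", "&o43"),
    ("&o43", "firstname", "&o70"), ("&o43", "lastname", "&o71"),
    ("&o24", "publisher", "&o51"),
    ("&o29", "author", "&96"),
    ("&96", "firstname", "&243"), ("&96", "lastname", "&206"),
]

def evaluate(path, arcs, root="&o1"):
    """Set of objects reached from the root by following the labels of a dotted path.

    The first component (Bib) names the root itself; every later component
    follows all arcs with that label from the current set of objects.
    """
    labels = path.split(".")
    current = {root}
    for label in labels[1:]:
        current = {dst for (src, lbl, dst) in arcs if src in current and lbl == label}
    return current
```

For instance, evaluate("Bib.paper", ARCS) yields the two paper objects, and a path with a label that never occurs yields the empty set rather than an error.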

XML-QUERY = XQUERY

XQUERY

Summary: FOR-LET-WHERE-ORDERBY-RETURN = FLWOR

[Figure: processing pipeline. FOR/LET clauses produce a list of tuples; the WHERE clause filters it into a list of tuples; the ORDERBY/RETURN clause produces an instance of the XQuery data model.]

XQUERY

FLWOR expressions are similar to SQL select . . . from . . . where . . . queries.
XQuery allows zero, one or more for and let clauses.
The where clause is optional.
There is one optional order-by clause.
Finally, there is exactly one return clause.
XQuery is case-sensitive.
XQuery (and XPath) is a W3C standard.

XQUERY CLAUSES

for $x in expr
  Defines node variable $x.
  The expression expr evaluates to a sequence of items.
  The variable $x is assigned to each item, in turn, and the body of the for clause is executed once for each assignment.
let $x := expr
  Defines collection variable $x.
  The expression expr evaluates to a sequence of items.
  The variable is bound to the entire sequence of items.
  Useful for common subexpressions and for aggregations.

XQUERY CLAUSES

where condition
  The condition is a boolean expression.
  The clause is applied to some item.
  If and only if the condition evaluates to true, the following return clause is executed for that item.
return expression
  The result of a FLWOR expression is a sequence of items.
  Expression defines the result format for the current (qualifying) item.
  The sequence of items produced by expression is appended to the sequence of items produced so far.

INTERPRETATION AS XQUERY

XQuery expressions can be used wherever an XML expression of any kind is permitted.
Any text string is acceptable as content of a tag or value of an attribute.
If a string contains an XQuery expression that should be evaluated, this substring must be surrounded by curly brackets {}.
Example

  for $b in doc("bib.xml")/bibliography/book
  return <result id="{$b/@bookID}">{$b/title}</result>

FOR VS. LET

Find all books

  for $x in doc("bib.xml")/bib/book
  return <result>{ $x }</result>

Returns:
  <result> <book>...</book> </result>
  <result> <book>...</book> </result>
  <result> <book>...</book> </result>
  ...

  let $x := doc("bib.xml")/bib/book
  return <result>{ $x }</result>

Returns:
  <result> <book>...</book> <book>...</book> <book>...</book> ... </result>
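
The same contrast can be mimicked in Python as an analogy, not as an XQuery implementation: a for clause corresponds to iterating over the items, a let clause to binding the whole sequence once. The three-book document and the helper wrap are assumptions made for the sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml with three books.
root = ET.fromstring("<bib><book>A</book><book>B</book><book>C</book></bib>")
books = root.findall("book")

def wrap(items):
    """Build a <result> element containing the given elements."""
    result = ET.Element("result")
    result.extend(items)
    return result

# for $x in ...: $x is bound to each book in turn, so one <result> per book.
for_results = [wrap([b]) for b in books]

# let $x := ...: $x is bound to the whole sequence, so a single <result>.
let_results = [wrap(books)]
```

The first list has as many results as there are books; the second always has exactly one.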

XQUERY

Find all book titles published after 1995:

  for $x in doc("bib.xml")/bib/book
  where $x/year > 1995
  return $x/title

Result:
  <title> abc </title>
  <title> def </title>
  <title> ghi </title>

ORDERING THE QUERY RESULT

The order-by clause allows you to order the results of an XQuery expression.

  order by list of expressions

The sort order is based on the value of the first expression. Ties are broken based on the value of the second (if necessary third etc.) expression.

By default, the order is ascending.
A descending sort order can be specified using descending.

ELIMINATION OF DUPLICATES

The built-in function distinct-values eliminates duplicates from a sequence of result items.
In principle, it applies only to primitive (atomic) types.
It can also be applied to elements, but then it will remove their tags and return their content as quoted string values.
Example
If return $b/title produces

  <title> aaa </title> <title> bbb </title> <title> aaa </title>

then applying distinct-values to that sequence produces

  "aaa" "bbb".

XQUERY

For each author of a book by Morgan Kaufmann, list all books she published:

  for $a in distinct-values(doc("bib.xml")/bib/book[publisher = "Morgan Kaufmann"]/author)
  return <result>
           <author>{ $a }</author>
           { for $t in doc("bib.xml")/bib/book[author = $a]/title
             return $t }
         </result>

distinct-values = a function that eliminates duplicates

Result:
  <result>
    <author>Jones</author>
    <title> abc </title>
    <title> def </title>
  </result>
  <result>
    <author> Smith </author>
    <title> ghi </title>
  </result>

JOINS

We can join two or more documents, by using one variable for each of the documents.

We let a variable range over the elements of the corresponding document, within a for-clause.

Need to be careful when comparing elements for equality, since their equality is by element identity, not by element content.

Typically, we want to compare the element content.

The built-in function data(E) returns the content of an element E.
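
The distinction between identity and content, and a join on content, can be illustrated with a Python analogy. The two documents are hypothetical, and the helper data only stands in for the idea behind XQuery's data(E); it is not that function.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents that share a book title.
books = ET.fromstring(
    "<bib>"
    "<book><title>Data on the Web</title><year>1999</year></book>"
    "<book><title>Foundations of Databases</title><year>1995</year></book>"
    "</bib>"
)
reviews = ET.fromstring(
    "<reviews><review><title>Data on the Web</title><rating>5</rating></review></reviews>"
)

def data(element):
    """Text content of an element, without surrounding blanks."""
    return "".join(element.itertext()).strip()

t1 = books.find("book/title")
t2 = reviews.find("review/title")
same_node = t1 is t2                  # identity: two different element objects
same_content = data(t1) == data(t2)   # content: the same title string

# Nested-loop join: one variable per document, compared on element content.
joined = [
    (data(b.find("title")), data(r.find("rating")))
    for b in books.findall("book")
    for r in reviews.findall("review")
    if data(b.find("title")) == data(r.find("title"))
]
```

Comparing the element objects themselves would never match across the two documents, which is why the join condition goes through the content.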

XQUERY

Find books whose price is larger than average:

  let $a := avg(doc("bib.xml")/bib/book/price)
  for $b in doc("bib.xml")/bib/book
  where $b/price > $a
  return $b

SORTING IN XQUERY

  <publisher_list>
    { for $p in distinct-values(doc("bib.xml")//publisher)
      order by $p
      return <publisher>
               <name>{ $p }</name>
               { for $b in doc("bib.xml")//book[publisher = $p]
                 order by $b/price descending
                 return <book>{ $b/title, $b/price }</book> }
             </publisher> }
  </publisher_list>

IF-THEN-ELSE

  for $h in //holding
  order by $h/title
  return <holding>
           { $h/title,
             if ($h/@type = "Journal")
             then $h/editor
             else $h/author }
         </holding>

EXISTENTIAL QUANTIFIERS

  for $b in //book
  where some $p in $b//para satisfies
        contains($p, "sailing") and contains($p, "windsurfing")
  return $b/title

QUANTIFICATION

XQuery supports the existential and the universal quantifier.
Universal quantifier
  every $v in expression1 satisfies expression2
Existential quantifier
  some $v in expression1 satisfies expression2
Expression1 evaluates to a sequence of items, expression2 is a boolean expression.
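
As an analogy, the two quantifiers behave like Python's any and all over the sequence produced by expression1. The sketch below assumes a small hypothetical document with para elements; mentions_both is an illustrative helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two books and their paragraphs.
DOC = """
<bib>
  <book><title>Water Sports</title>
    <para>Notes on sailing and windsurfing.</para>
    <para>Notes on rowing.</para></book>
  <book><title>Boats</title>
    <para>Only sailing here.</para></book>
</bib>
"""
root = ET.fromstring(DOC)

def mentions_both(para):
    """True if the paragraph text contains both search terms."""
    text = para.text or ""
    return "sailing" in text and "windsurfing" in text

# some $p in $b//para satisfies (contains sailing and contains windsurfing)
some_titles = [
    b.findtext("title") for b in root.findall("book")
    if any(mentions_both(p) for p in b.iter("para"))
]

# every $p in $b//para satisfies contains($p, "sailing")
every_titles = [
    b.findtext("title") for b in root.findall("book")
    if all("sailing" in (p.text or "") for p in b.iter("para"))
]
```

The first book qualifies existentially because one paragraph has both terms, but fails the universal test because its second paragraph does not mention sailing.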

AGGREGATION

XQuery provides built-in functions for the standard aggregations such as sum, min, count and avg.

They can be applied to any XQuery expression, i.e. to any sequence of items.

Example

  avg(doc("bib.xml")/bibliography/book/price)
  count(doc("bib.xml")/bibliography/book/price)

Computes the average book price and the number of books, resp.

XQUERY EXAMPLES

Find books whose price is larger than the average price.

Uses aggregate operator (avg), applied to the result of a path expression.

  let $a := avg(doc("bib.xml")/bibliography/book/price)
  for $b in doc("bib.xml")/bibliography/book
  where $b/price > $a
  return $b
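
The evaluation order of this query (bind the aggregate once, then iterate and filter) can be traced with a Python analogy. The three-book document and its prices are invented for the sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml; titles and prices are made up.
DOC = """<bibliography>
<book><title>A</title><price>30</price></book>
<book><title>B</title><price>50</price></book>
<book><title>C</title><price>70</price></book>
</bibliography>"""

root = ET.fromstring(DOC)
books = root.findall("book")

# let $a := avg(...): the whole price sequence is bound once and aggregated.
prices = [float(b.findtext("price")) for b in books]
average = sum(prices) / len(prices)
count = len(prices)

# for $b in ... where $b/price > $a return $b: iterate, filter, collect.
expensive = [b.findtext("title") for b in books
             if float(b.findtext("price")) > average]
```

Only the book priced strictly above the average qualifies; a book priced exactly at the average does not.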

XQUERY EXAMPLES

Find title of books with a paragraph containing the terms “sailing” and “windsurfing”.

Uses existential quantifier (some) and string matching (contains).

  for $b in doc("bib.xml")//book
  where some $p in $b//para satisfies
        contains($p, "sailing") and contains($p, "windsurfing")
  return $b/title