Processing of structured documents Part 3

Upload: shonda-king

Post on 18-Jan-2018



XML Schema (continues…)

Building content models…

A simplified view of the allowed structure of a complex type:

  complexType -> annotations?, (simpleContent | complexContent | ((all | choice | sequence | group)?, attrDecls))


Nested choice and sequence groups

  <xsd:complexType name="PurchaseOrderType">
    <xsd:sequence>
      <xsd:choice>
        <xsd:group ref="shipAndBill" />
        <xsd:element name="singleUSAddress" type="USAddress" />
      </xsd:choice>
      <xsd:element name="items" type="Items" />
    </xsd:sequence>
  </xsd:complexType>


Nested choice and sequence groups

  <xsd:group name="shipAndBill">
    <xsd:sequence>
      <xsd:element name="shipTo" type="USAddress" />
      <xsd:element name="billTo" type="USAddress" />
    </xsd:sequence>
  </xsd:group>
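One way to read the nested groups: the content model of PurchaseOrderType behaves like a regular expression over the names of the child elements. The following is only a sketch of that reading, not a schema validator. The helper name matches_purchase_order and the idea of reducing the children to a space-separated string of names are assumptions made for illustration.

```python
import re

# Content model of PurchaseOrderType as a regular expression over child names:
# sequence( choice( shipAndBill | singleUSAddress ), items ),
# where the shipAndBill group is itself the sequence (shipTo, billTo).
CONTENT_MODEL = re.compile(r"(?:shipTo billTo|singleUSAddress) items")


def matches_purchase_order(child_names):
    """Return True if the list of child element names fits the content model."""
    return CONTENT_MODEL.fullmatch(" ".join(child_names)) is not None


print(matches_purchase_order(["shipTo", "billTo", "items"]))   # True
print(matches_purchase_order(["singleUSAddress", "items"]))    # True
print(matches_purchase_order(["billTo", "shipTo", "items"]))   # False
```

The last call fails because a sequence group fixes the order of its members.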


An 'all' group

An all group: all the elements in the group may appear once or not at all, and they may appear in any order
- minOccurs and maxOccurs can be 0 or 1
- limited to the top level of any content model
- has to be the only child at the top
- the group's children must all be individual elements (no groups), and no element in the content model may appear more than once


An 'all' group

  <xsd:complexType name="PurchaseOrderType">
    <xsd:all>
      <xsd:element name="shipTo" type="USAddress" />
      <xsd:element name="billTo" type="USAddress" />
      <xsd:element ref="comment" minOccurs="0" />
      <xsd:element name="items" type="Items" />
    </xsd:all>
    <xsd:attribute name="orderDate" type="xsd:date" />
  </xsd:complexType>
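The constraints of this all group can be sketched as an order-insensitive check on the child names. The dictionary ALL_GROUP and the function matches_all_group are hypothetical names introduced for this illustration. Element types and the orderDate attribute are ignored.

```python
# Hypothetical description of the 'all' group above: element name -> required?
ALL_GROUP = {"shipTo": True, "billTo": True, "comment": False, "items": True}


def matches_all_group(child_names, group=ALL_GROUP):
    """Check child element names against an 'all' group, ignoring their order."""
    seen = set()
    for name in child_names:
        if name not in group or name in seen:
            return False  # unknown element, or an element that appears twice
        seen.add(name)
    # every element without minOccurs="0" must be present
    return all(name in seen for name, required in group.items() if required)


print(matches_all_group(["items", "billTo", "shipTo"]))           # True
print(matches_all_group(["shipTo", "billTo", "items", "items"]))  # False
print(matches_all_group(["shipTo", "comment", "items"]))          # False
```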


Occurrence constraints

- Groups represented by 'group', 'choice', 'sequence' and 'all' may carry minOccurs and maxOccurs attributes
- by combining and nesting the various groups, and by setting the values of minOccurs and maxOccurs, it is possible to represent any content model expressible with an XML 1.0 DTD
- the 'all' group provides additional expressive power


Attribute groups

Attribute definitions can also be grouped and named

  <xsd:element name="item">
    <xsd:complexType>
      <xsd:sequence> … </xsd:sequence>
      <xsd:attributeGroup ref="ItemDelivery" />
    </xsd:complexType>
  </xsd:element>

  <xsd:attributeGroup name="ItemDelivery">
    <xsd:attribute name="partNum" type="SKU" />
    …
  </xsd:attributeGroup>


XML Path Language (XPath)

The ability to navigate through XML documents is needed in many applications of XML
- querying of XML documents
- creation of hypertext links to objects that do not have unique identifiers
- formatting of document components for presentation


XML Path Language (XPath)

XPath provides
- common syntax and semantics to address parts of an XML document
- basic facilities for manipulation of strings, numbers and booleans

XPath uses a compact, non-XML syntax to facilitate use of XPath within URIs and XML attribute values


XML Path Language (XPath)

Use e.g. as a pattern in XSLT:

<xsl:template match="chapter/title">

</xsl:template>


XML Path Language (XPath)

XPath operates on an XML document as a tree
- every element in an XML document has a specific and unique contextual location
- any element in the document can be identified by the steps it would take to reach it, either from the root element, or from some other fixed starting location


Data model of XPath

A conceptual model: no particular implementation is assumed

A tree contains nodes (7 types):
- root nodes
- element nodes
- text nodes
- attribute nodes
- namespace nodes
- processing instruction nodes
- comment nodes


Data model

- Every node has a string-value
- document order is defined on all the nodes in the document:
  - the root node is the first node
  - element nodes in order of the occurrence of their start tags
  - attribute nodes and namespace nodes before the children of the element
  - namespace nodes before attribute nodes
- parent - child, ancestor - descendant


Root node

- The root of the tree
- the element node for the document element is a child of the root node
- other children:
  - processing instruction nodes
  - comment nodes
- string-value: concatenation of the string-values of all text node descendants of the root node in document order


Element nodes

- An element node for every element in the document
- children:
  - element nodes (subelements)
  - comment nodes
  - processing instruction nodes
  - text nodes (content)
- entity references are expanded
- string-value: concatenation of the string-values of all text node descendants of the element node in document order
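The string-value of an element can be illustrated with Python's standard xml.etree.ElementTree. This is a rough sketch only: ElementTree does not model text nodes as separate objects, so the helper string_value (a name chosen for this example) joins the text that itertext() yields in document order.

```python
import xml.etree.ElementTree as ET

para = ET.fromstring("<para>Hello, <em>structured</em> world!</para>")


def string_value(element):
    """Concatenate all descendant character data in document order."""
    return "".join(element.itertext())


print(string_value(para))             # Hello, structured world!
print(string_value(para.find("em")))  # structured
```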


Attribute nodes

- Each element node has an associated set of attribute nodes
  - the element node is the parent of each of these attribute nodes
  - but: an attribute node is not a child of its parent element
- a defaulted attribute is treated the same as a specified attribute


Attribute nodes

if an attribute was declared for the element with the default #IMPLIED, but the attribute was not specified on the element, there is no attribute node for this attribute

String-value: the normalized value as specified by the XML specification


Namespace nodes

- Each element has an associated set of namespace nodes
  - one for each distinct namespace prefix that is in scope for the element
  - one for the default namespace if one is in scope for the element
- The element is the parent of each of these namespace nodes, but a namespace node is not a child of its parent element
- string-value: the namespace URI


PI nodes, comment nodes

- There is a processing instruction node for every processing instruction
- there is a comment node for every comment
  - string-value: the content of the comment not including <!-- and -->
- … except for PIs and comments in document type declarations


Text nodes

- Character data is grouped into text nodes
- as much character data as possible is grouped into each text node
- string-value: the character data
- characters inside comments, processing instructions and attribute values do not produce text nodes


Expressions

The primary syntactic construct in XPath is the expression

An expression is evaluated to yield an object, which has one of the following types:
- node-set (unordered)
- boolean (true or false)
- number
- string


Location paths

Relative location paths
- a path that starts from an existing location
- a sequence of one or more location steps separated by /
- steps are composed from left to right
- the initial step selects a set of nodes relative to the context node
- each node in this set is used as a context node for the following step
- the sets of nodes identified by that step are unioned together
- e.g. child::div/child::para
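The left-to-right composition of steps can be sketched by hand and compared with ElementTree's own path evaluation. The helper child_step and the sample document are assumptions for illustration. Only the child axis with a name test is covered.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<doc>"
    "<div><para>a</para><para>b</para></div>"
    "<div><para>c</para></div>"
    "<para>not inside a div</para>"
    "</doc>"
)


def child_step(context_nodes, name):
    """Apply the step child::name to every context node and union the results."""
    selected = []
    for node in context_nodes:
        for child in node:  # iterating an Element yields its element children
            if child.tag == name and child not in selected:
                selected.append(child)
    return selected


# child::div/child::para, starting from doc as the context node
divs = child_step([doc], "div")
paras = child_step(divs, "para")
print([p.text for p in paras])            # ['a', 'b', 'c']
print(paras == doc.findall("div/para"))   # True
```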


Location paths

An absolute location path consists of / optionally followed by a relative location path

A / by itself selects the root node of the document

if / is followed by a relative path, then the location path selects the set of nodes that would be selected by the relative location path relative to the root node


Location steps

A location step has three parts
- an axis: the tree relationship between the nodes selected by the location step and the context node
- a node test: the node type and name of the nodes selected by the location step
- zero or more predicates, which use arbitrary expressions to further refine the set of nodes selected by the location step

syntax: axis::node-test[expr][expr]…
e.g. child::para[position()=1]


Location steps

- The node-set selected by the location step is the node-set that results from
  - generating an initial node-set from the axis and node test, and then
  - filtering that node-set by each of the predicates in turn
- the initial node-set consists of the nodes
  - having the relationship to the context node specified by the axis, and
  - having the node type and name specified by the node test


Axes

- child
- descendant
- parent
- ancestor
- following-sibling
  - empty, if the context node is an attribute node or namespace node
- preceding-sibling
  - empty, if the context node is an attribute node or namespace node


Axes

- following
  - all nodes in the same document as the context node that are after the context node in document order, excluding any descendants and excluding attribute nodes and namespace nodes
- preceding
  - all nodes in the same document as the context node that are before the context node in document order, excluding any ancestors and excluding attribute nodes and namespace nodes


Axes

- attribute
  - attribute nodes of the context node
  - empty unless the context node is an element
- namespace
  - namespace nodes of the context node
  - empty unless the context node is an element
- self
  - the context node itself
- descendant-or-self, ancestor-or-self


Axes

The ancestor, descendant, following, preceding and self axes partition a document (ignoring attribute and namespace nodes): they do not overlap and together they contain all the nodes in the document
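The partition property can be checked on a small tree. The sketch below builds the five axes from their definitions using ElementTree. It covers element nodes only, and the document element stands in for the top of the tree, since ElementTree has no separate root node. The function name axes and the sample document are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<a><b><c/><d/></b><e><f/></e><g/></a>")

order = list(doc.iter())                           # elements in document order
parent = {child: p for p in order for child in p}  # child -> parent map


def axes(node):
    """Return the five axes of `node` as lists of elements."""
    ancestors = []
    current = node
    while current in parent:
        current = parent[current]
        ancestors.append(current)
    descendants = list(node.iter())[1:]            # iter() yields node itself first
    index = order.index(node)
    preceding = [n for n in order[:index] if n not in ancestors]
    following = [n for n in order[index + 1:] if n not in descendants]
    return {
        "ancestor": ancestors,
        "descendant": descendants,
        "preceding": preceding,
        "following": following,
        "self": [node],
    }


result = axes(doc.find("e"))
for name, nodes in result.items():
    print(name, [n.tag for n in nodes])

# No overlap, and together the axes cover every element in the document
all_nodes = [n for nodes in result.values() for n in nodes]
print(len(all_nodes) == len(order) and set(all_nodes) == set(order))  # True
```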


Node tests

Every axis has a principal node type
- for the attribute axis: attribute
- for the namespace axis: namespace
- for other axes: element

In a node test, both name and type have to match
- child::para selects the para element children of the context node
- if the context node has no para children, it will select an empty set of nodes


Node tests

- Function node() represents any node
- functions text(), comment(), and processing-instruction() represent any object of these specific types


Node tests

- A node test * is true for any node of the principal node type
  - child::* selects all element children of the context node
  - attribute::* selects all attributes of the context node
- text()
  - true for any text node
- comment()
- processing-instruction()
  - may have an argument = name of the PI


Abbreviated syntax

- child:: can be omitted from a location step; child is the default axis
  - child::div/child::para -> div/para
- attribute:: -> @
  - child::para[attribute::type="warning"] -> para[@type="warning"]
- /descendant-or-self::node()/ -> //
  - //para selects any para element in the document
  - div//para selects all para descendants of div children (of the context node)


Abbreviated syntax

- self::node() -> . (full stop)
  - .//para selects all para descendant elements of the context node
- parent::node() -> ..
  - ../title selects the title children of the parent of the context node


Predicates

An axis is either a forward axis or a reverse axis
- forward axis: an axis that only ever contains the context node or nodes that are after the context node in document order
- reverse axis: an axis that only ever contains the context node or nodes that are before the context node in document order


Predicates

- The proximity position of a member of a node-set with respect to an axis: the position of the node in the node-set ordered in
  - document order if the axis is a forward axis
  - reverse document order if the axis is a reverse axis
  - the first position is 1
- a predicate filters a node-set to produce a new node-set
  - for each node in the node-set, the predicate expression is evaluated with that node as the context node and with the proximity position of the node in the node-set as the context position


Predicates

If the predicate expression evaluates to true for that node, the node is included in the new node-set
- the result of the evaluation is converted to a boolean
  - if the result is a number, the result is true if the number is equal to the context position
  - otherwise, the result will be converted as if by a call to the function boolean (see below)
  - e.g. para[3] equals para[position()=3]
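Numeric predicates and proximity positions can be sketched with plain lists. The helper filter_by_position is a hypothetical name. The node-sets are held in document order, and the reverse_axis flag decides in which direction positions are counted.

```python
import xml.etree.ElementTree as ET

section = ET.fromstring(
    "<section><para>one</para><para>two</para>"
    "<para>three</para><para>four</para></section>"
)
paras = list(section)
context = paras[3]  # the fourth para is the context node


def filter_by_position(node_set, wanted, reverse_axis=False):
    """Keep the nodes whose proximity position equals `wanted` (first is 1)."""
    ordered = list(reversed(node_set)) if reverse_axis else list(node_set)
    return [n for pos, n in enumerate(ordered, start=1) if pos == wanted]


# preceding-sibling::para of the context node, held in document order
preceding_siblings = paras[: paras.index(context)]

# child::para[3] on the forward child axis
print([p.text for p in filter_by_position(paras, 3)])                     # ['three']
# preceding-sibling::para[1] on a reverse axis: the nearest preceding sibling
print([p.text for p in filter_by_position(preceding_siblings, 1, True)])  # ['three']
# counting the same set forwards would give the first para instead
print([p.text for p in filter_by_position(preceding_siblings, 1)])        # ['one']
```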


Predicates

Contained element tests
- the name of an element can appear in a predicate filter -> represents an element that must be present as a child
- note[title]
  - a note element is only selected if it directly contains a title element
- note[title="first note"]
  - true if the content of the title element is 'first note'
- note[id("123")]


Predicates

Attribute tests
- para[@type='secret']
  - every 'para' element with a 'type' attribute value of 'secret'


Expressions

- boolean operators: or, and
- comparisons: =, !=, <=, <, >=, >
  - in XML documents: < has to be converted to &lt;
- numeric operators: +, -, *, div, mod


Core functions

Node set functions
- number last()
- number position()
- number count(node-set)
- node-set id(object)
  - e.g. id("foo") selects the element with unique ID foo


Core functions

String functions
- string string(object?)
  - converts an object to a string
  - e.g. negative infinity -> -Infinity
- string concat(string,string,string*)
  - returns the concatenation of its arguments
- boolean starts-with(string,string)
  - returns true if the first argument string starts with the second argument string


Core functions

String functions
- boolean contains(string,string)
  - returns true if the first string contains the second string
- string substring-before(string,string)
  - e.g. substring-before("1999/04/01","/") returns 1999
- string substring-after(string,string)
- string substring(string,number,number?)
  - e.g. substring("12345",2,3) returns "234"
  - e.g. substring("12345",2) returns "2345"
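The substring functions map closely onto Python string operations. The sketch below mirrors the examples above, under two assumptions: positions are integers, so the rounding and NaN rules of XPath are left out, and the second argument of substring-before/after is non-empty. The Python function names are chosen for this illustration.

```python
def substring_before(s, sep):
    """Text before the first occurrence of sep, or '' if sep does not occur."""
    head, found, _ = s.partition(sep)
    return head if found else ""


def substring_after(s, sep):
    """Text after the first occurrence of sep, or '' if sep does not occur."""
    _, found, tail = s.partition(sep)
    return tail if found else ""


def substring(s, start, length=None):
    """Substring with 1-based integer positions, as in XPath."""
    begin = max(start, 1) - 1          # XPath counts characters from 1
    if length is None:
        return s[begin:]
    end = max(start + length, 1) - 1   # first position that is no longer included
    return s[begin:end]


print(substring_before("1999/04/01", "/"))  # 1999
print(substring_after("1999/04/01", "/"))   # 04/01
print(substring("12345", 2, 3))             # 234
print(substring("12345", 2))                # 2345
```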


Core functions

String functions
- number string-length(string?)
  - default: the string-value of the context node
- string normalize-space(string?)
  - returns the string with whitespace normalized by stripping leading and trailing whitespace and replacing sequences of whitespace characters by a single space
- string translate(string,string,string)
  - e.g. translate("bar","abc","ABC") returns BAr
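A possible Python rendering of normalize-space and translate follows. Treat it as a sketch: whitespace is taken to be space, tab, carriage return and line feed, and characters of the second translate argument that have no counterpart in the third are removed, as in the XPath 1.0 Recommendation. The function names are illustrative.

```python
import re


def normalize_space(s):
    """Collapse runs of XML whitespace to one space and trim both ends."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def translate(s, source, target):
    """Map each character of `source` to the character at the same position
    in `target`; characters without a counterpart are dropped."""
    table = {}
    for i, ch in enumerate(source):
        if ord(ch) not in table:  # the first occurrence in `source` wins
            table[ord(ch)] = target[i] if i < len(target) else None
    return s.translate(table)


print(translate("bar", "abc", "ABC"))       # BAr
print(translate("--aaa--", "abc-", "ABC"))  # AAA
print(normalize_space("  a   b \n c  "))    # a b c
```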


Core functions

Boolean functions
- boolean boolean(object)
  - converts the argument to a boolean
  - e.g. a number is true if and only if it is neither positive or negative zero nor NaN (not-a-number)
  - e.g. a node-set is true iff it is non-empty
- boolean not(boolean)
- boolean true(), boolean false()
- boolean lang(string)
  - tests the language of the context node as specified by the attribute xml:lang
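The conversion rules of boolean() can be written down for Python stand-ins of the four XPath types. This is a sketch only: a Python list plays the role of a node-set, and the rule for strings (true iff the length is non-zero) comes from the XPath 1.0 Recommendation rather than from the list above. xpath_boolean is a hypothetical name.

```python
import math


def xpath_boolean(value):
    """Sketch of boolean() for Python stand-ins of the four XPath types."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # true unless the number is positive zero, negative zero or NaN
        return not (value == 0 or math.isnan(value))
    if isinstance(value, str):
        return len(value) > 0
    if isinstance(value, list):        # a list stands in for a node-set
        return len(value) > 0
    raise TypeError("not one of the four XPath object types")


print(xpath_boolean(-0.0))          # False
print(xpath_boolean(float("nan")))  # False
print(xpath_boolean(3))             # True
print(xpath_boolean([]))            # False
print(xpath_boolean("false"))       # True
```

The last call returns True because any non-empty string converts to true, whatever its content.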


Core functions

Number functions
- number number(object?)
  - converts its argument to a number
  - e.g. boolean true -> 1; boolean false -> 0
  - e.g. a string -> mathematical value or NaN
- number sum(node-set)
- number floor(number), number ceiling(number), number round(number)
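The conversions performed by number() and sum() might be sketched as follows. The regular expression follows the XPath 1.0 Number syntax (no exponent, no leading plus sign). xpath_number and xpath_sum are hypothetical names, and xpath_sum takes the string-values of the nodes as a plain list of strings.

```python
import math
import re

# XPath 1.0 Number syntax: digits with an optional fraction, or a leading dot
_NUMBER = re.compile(r"-?(?:\d+(?:\.\d*)?|\.\d+)")


def xpath_number(value):
    """Sketch of number() for booleans, numbers and strings."""
    if isinstance(value, bool):
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    text = value.strip(" \t\r\n")      # surrounding whitespace is allowed
    return float(text) if _NUMBER.fullmatch(text) else math.nan


def xpath_sum(string_values):
    """Sketch of sum(): convert each string-value to a number and add them."""
    return sum(xpath_number(s) for s in string_values)


print(xpath_number(True))               # 1.0
print(xpath_number(" 42 "))             # 42.0
print(xpath_number("1e3"))              # nan
print(xpath_sum(["1", "2.5", " 3 "]))   # 6.5
```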


Examples
- para selects the para element children of the context node
- * selects all element children
- text() selects all text node children
- @name selects the name attribute
- @* selects all the attributes
- para[1] selects the first para child
- para[last()] selects the last para child
- */para selects all para grandchildren
- /doc/chapter[5]/section[2] selects the second section of the fifth chapter of the doc (root)


Examples
- chapter//para selects the para element descendants of the chapter element children
- //para selects all the para descendants of the document root and thus selects all the para elements in the same document as the context node
- //olist/item selects all the item elements in the same document as the context node that have an olist parent
- . selects the context node
- .//para selects the para element descendants
- .. selects the parent
- ../@lang selects the lang attribute of the parent
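Several of these abbreviated paths can be tried with Python's xml.etree.ElementTree, which implements a limited subset of XPath. The sample document is an assumption for illustration. ElementTree returns elements only, so attribute selections such as @lang are replaced by a call to get().

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<doc>"
    "<chapter lang='en'>"
    "<para>first</para>"
    "<section><para>nested</para></section>"
    "<para>last</para>"
    "</chapter>"
    "</doc>"
)
chapter = doc.find("chapter")  # the chapter element is the context node

print([p.text for p in chapter.findall("para")])       # ['first', 'last']
print([e.tag for e in chapter.findall("*")])           # ['para', 'section', 'para']
print(chapter.find("para[1]").text)                    # first
print(chapter.find("para[last()]").text)               # last
print([p.text for p in chapter.findall("*/para")])     # ['nested']
print([p.text for p in chapter.findall(".//para")])    # ['first', 'nested', 'last']
print([p.text for p in doc.findall("chapter//para")])  # ['first', 'nested', 'last']
print(chapter.get("lang"))                             # en (stand-in for @lang)
```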


Examples
- para[@type="warning"] selects all para children of the context node that have a type attribute with value warning
- para[@type="warning"][5] selects the fifth para child of the context node that has a type attribute with value warning
- para[5][@type="warning"] selects the fifth para child of the context node if that child has a type attribute with value warning
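The order of predicates matters because each predicate filters the result of the previous one. A small sketch with plain Python lists shows the difference. Position 2 is used instead of 5 to keep the data short, and the dictionaries and helper names are stand-ins invented for this example.

```python
# Each dict stands in for a para child of the context node, in document order.
paras = [
    {"n": 1, "type": "warning"},
    {"n": 2, "type": "note"},
    {"n": 3, "type": "warning"},
    {"n": 4, "type": "warning"},
]


def by_type(nodes, value):
    """[@type="value"]: keep nodes whose type attribute has the given value."""
    return [p for p in nodes if p.get("type") == value]


def by_position(nodes, position):
    """[position]: keep the node at the given 1-based position, if any."""
    return nodes[position - 1:position]


# para[@type="warning"][2]: the second among the warning paras
print([p["n"] for p in by_position(by_type(paras, "warning"), 2)])  # [3]
# para[2][@type="warning"]: the second para, only if it is a warning
print([p["n"] for p in by_type(by_position(paras, 2), "warning")])  # []
```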


Examples
- chapter[title="Introduction"] selects the chapter children of the context node that have one or more title children with string-value equal to Introduction
- chapter[title] selects the chapter children of the context node that have one or more title children
- employee[@secretary and @assistant] selects all the employee children of the context node that have both a secretary attribute and an assistant attribute
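These predicates can also be tried with xml.etree.ElementTree on a sample document (an assumption for illustration). ElementTree's XPath subset has no "and" operator, so the last example chains two predicates, which expresses the same condition for attribute-existence tests.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<doc>"
    "<chapter id='c1'><title>Introduction</title></chapter>"
    "<chapter id='c2'><title>Methods</title></chapter>"
    "<chapter id='c3'/>"
    "<employee name='Ann' secretary='Bo' assistant='Cy'/>"
    "<employee name='Dan' secretary='Eve'/>"
    "</doc>"
)

# chapter[title="Introduction"]
print([c.get("id") for c in doc.findall("chapter[title='Introduction']")])  # ['c1']
# chapter[title]
print([c.get("id") for c in doc.findall("chapter[title]")])                 # ['c1', 'c2']
# employee[@secretary and @assistant], written as two chained predicates
print([e.get("name") for e in doc.findall("employee[@secretary][@assistant]")])  # ['Ann']
```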