CIS336 Website design, implementation and management
(also Semester 2 of CIS219, CIS221 and IT226)

Lecture 5: XML Schema
(Based on Møller and Schwartzbach, 2006, pp.113-159)

David Meredith [email protected]
www.titanmusic.com/teaching/cis336-2006-7.html

Problems with DTDs

• DTDs cannot constrain character data
  – e.g., cannot specify that (#PCDATA) must only be a valid integer representation
  – need more powerful datatype mechanism
• Attribute types are too limited
  – e.g., cannot specify that an attribute value must be an integer, a URI, etc.
• Element and attribute definitions cannot depend on context
  – e.g., cannot specify that unit attribute only allowed if amount attribute is present
• Character data cannot be combined with regular expression content model
  – i.e., mixed content always has form (#PCDATA | e1 | e2)*
    • cannot specify order in which character data may be interspersed with elements
• Element content model lacks "interleaving" operator that allows us to specify that an element may occur anywhere inside an element
  – e.g., cannot (easily) specify that comment element may occur anywhere in contents of recipe element

More problems with DTDs

• DTD provides very limited support for modularity, reuse and evolution of schemas
  – hard to write, maintain and read large DTD schemas
• ID/IDREF mechanism is too limited
  – sometimes want to specify a more restricted scope for an ID attribute than the whole instance document
  – also might want to use multiple attribute values or character data as keys rather than just a single attribute value
• DTDs do not support namespaces

XML Schema

• DTDs defined as part of the XML 1.0 specification (February 1998)
  – inherited from SGML
• Shortly afterwards, W3C initiated XML Schema project to deal with problems in DTDs
• XML Schema Requirements (1999) specifies that XML Schema should be:
  – more expressive than XML DTD
  – a well-formed XML language
  – self-describing
    • i.e., it should be possible to describe the syntax of XML Schema using an XML Schema (since XML Schema is an XML language)
  – simple enough to implement with modest design and runtime resources (which limits expressiveness)
• XML Schema specification should be:
  – defined quickly to prevent competing schema languages gaining a foothold
  – precise, concise, human-readable and illustrated with examples

XML Schema technical requirements

• XML Schema should
  – contain mechanism for constraining use of namespaces
  – allow creation of user-defined datatypes for describing character data and attribute values
  – enable inheritance for element, attribute and datatype definitions
  – support evolution of schemas
  – permit embedded structured documentation within schemas

XML Schema recommendation

• Official XML Schema specification published as W3C recommendation in 2001
  – in 2 parts:
    • XML Schema Part 1: Structures
      – Describes core XML Schema including, for example, element and attribute declarations
      – Most recent version: Second Edition, 28 October 2004
      – Available online at http://www.w3.org/TR/xmlschema-1/
    • XML Schema Part 2: Datatypes
      – Defines facilities for defining datatypes in XML Schema
      – Most recent version: Second Edition, 28 October 2004
      – Available online at http://www.w3.org/TR/xmlschema-2/
• Does not satisfy all original requirements:
  – not simple
    • Partly remedied by XML Schema Part 0: Primer
      – Provides easily readable description of the XML Schema facilities
      – Most recent version: 28 October 2004
      – Available online at http://www.w3.org/TR/xmlschema-0/
  – not fully self-describing
  – not sufficiently expressive
    • e.g., cannot express full syntax of RecipeML

XML Schema overview

• Contains a sophisticated type system like those in common programming languages
  – Facilitates re-use and improves schema structure
• Four central constructs in XML Schema, all based on types:
  – Simple type definition
    • Defines a family of Unicode text strings
    • Describes text without markup
  – Complex type definition
    • Defines validity requirements for attributes, sub-elements and character data in an element of that type
    • Describes text which may contain markup
  – Element declaration
    • Associates element name with either a simple or complex type
  – Attribute declaration
    • Associates attribute name with simple type
      – Attribute values are always unstructured text

An example schema written in XML Schema

• Example schema (shown on the original slide) contains
  – one element declaration: student
  – two attribute declarations: id, score
  – one complex type definition: StudentType
  – one simple type definition: Score
• XML Schema elements identified by namespace http://www.w3.org/2001/XMLSchema
  – Namespace prefix ("xsd") is arbitrary but conventional
• Root element in XML Schema document is named schema
  – usually contains targetNamespace attribute
    • specifies the namespace being defined by the schema
    • also declare this namespace with a prefix so that definitions within the schema can be referred to
• Definitions create new types; declarations describe constituents of the instance document
• Definitions and declarations populate the target namespace
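
The slide's figure is not available in this transcript, so the following is only a sketch of what a schema with these five components might look like. The target namespace URI, the type chosen for id and the 0 to 100 range for Score are assumptions made for illustration, not taken from the lecture.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical target namespace; the "s" prefix lets the schema refer to its own definitions -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:s="http://www.example.org/student"
            targetNamespace="http://www.example.org/student">

  <!-- Element declaration: associates the name student with a complex type -->
  <xsd:element name="student" type="s:StudentType"/>

  <!-- Attribute declarations: associate names with simple types -->
  <xsd:attribute name="id" type="xsd:string"/>
  <xsd:attribute name="score" type="s:Score"/>

  <!-- Complex type definition: a student has no content, only two required attributes -->
  <xsd:complexType name="StudentType">
    <xsd:attribute ref="s:id" use="required"/>
    <xsd:attribute ref="s:score" use="required"/>
  </xsd:complexType>

  <!-- Simple type definition: integers in an assumed range of 0 to 100 -->
  <xsd:simpleType name="Score">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="0"/>
      <xsd:maxInclusive value="100"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
```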

Syntax for element and attribute declarations

• Element declaration has form
    <element name="name" type="type"/>
  – associates simple or complex type, type, with the element named name
• Attribute declaration has form
    <attribute name="name" type="type"/>
  – associates simple type, type, with an attribute named name

Simple student instance document

• Can avoid use of prefixes in attribute names
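
As a sketch only: assuming a schema whose target namespace is the hypothetical http://www.example.org/student and which declares id and score as global (top-level) attributes, an instance document would need prefixed attribute names, because global attribute declarations belong to the target namespace. The attribute values below are invented.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical namespace URI and values; the attributes are prefixed because
     they are assumed to be declared globally in the schema -->
<stu:student xmlns:stu="http://www.example.org/student"
             stu:id="19970233"
             stu:score="97"/>
```

One way to avoid the prefixes is to declare the attributes locally, inside the complex type definition, rather than at the top level of the schema. Local attribute declarations are unqualified by default, so the instance could then write plain id and score.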

Business card example

• Original slide shows an instance document (top left) written in the language defined by a schema (bottom left)
• Assume we own the domain businesscard.org
  – so no-one else uses this namespace
• Can fix it so that no need for prefix in uri attribute
• Compare DTD
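
The slide's figures are missing from this transcript. The sketch below is a guess at what a business card schema in this style could look like. Only the namespace http://businesscard.org, the uri attribute and the names card, card_type, email and phone are suggested by the lecture text; the remaining element names, the logo_type definition and all type choices are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- XML Schema is the default namespace here, so schema elements need no prefix;
     the "b" prefix refers to the language being defined -->
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:b="http://businesscard.org"
        targetNamespace="http://businesscard.org">

  <element name="card" type="b:card_type"/>
  <element name="name" type="string"/>
  <element name="title" type="string"/>
  <element name="email" type="string"/>
  <element name="phone" type="string"/>
  <element name="logo" type="b:logo_type"/>

  <!-- Global attribute: instance documents must write it with a prefix (b:uri) -->
  <attribute name="uri" type="anyURI"/>

  <complexType name="card_type">
    <sequence>
      <element ref="b:name"/>
      <element ref="b:title"/>
      <element ref="b:email"/>
      <element ref="b:phone" minOccurs="0"/>
      <element ref="b:logo" minOccurs="0"/>
    </sequence>
  </complexType>

  <!-- Hypothetical type for logo: no content, one required attribute -->
  <complexType name="logo_type">
    <attribute ref="b:uri" use="required"/>
  </complexType>

</schema>
```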

Connecting instance documents and schemas

• Instance document can refer to a schema using schemaLocation attribute from the namespace http://www.w3.org/2001/XMLSchema-instance
• Value of schemaLocation attribute has two parts, separated by whitespace:
  – target namespace of schema
  – URI of schema document
• schemaLocation indicates that document is supposed to be valid with respect to the schema
• schemaLocation attributes may appear in any element
  – usually appear in root element
  – can also appear in another element to indicate that the schema applies to the subtree under that element
    • means XML languages can be combined at will
• schemaLocation attribute value is actually a sequence of "namespace URI" pairs
  – if more than one pair, all schemas apply independently
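
A minimal sketch of a schemaLocation attribute on a root element. The schema file name business_card.xsd and the element content are assumptions for illustration; the xsi prefix is conventional but arbitrary.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- schemaLocation value = target namespace, whitespace, URI of the schema document.
     The file name business_card.xsd is hypothetical. -->
<b:card xmlns:b="http://businesscard.org"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://businesscard.org business_card.xsd">
  <b:name>John Doe</b:name>
  <b:email>john.doe@example.org</b:email>
</b:card>
```

Further namespace and URI pairs could be appended to the same attribute value, separated by whitespace.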

More on schemaLocation

• All attributes defined in http://www.w3.org/2001/XMLSchema-instance are implicitly declared for all elements in instance document
• schemaLocation attributes are optional
  – make instance documents self-describing
• Applications require documents to be valid relative to schemas decided by application developers, not schemas decided by document authors
• XML Schema does not directly enforce a particular root element
  – e.g., an XML Schema definition of XHTML cannot express that the root element must be html
  – means that application must check root element as well as carrying out XML validation

Simple types

• Simple type or datatype is set of Unicode strings with a particular semantic interpretation
  – e.g., decimal datatype is built-in XML Schema datatype which consists of all strings that represent decimal numbers (e.g., 3.1415)
    • 3.1415 is equal to 3.141500
    • 42 is less than 117
• XML Schema contains some primitive simple types with pre-defined meanings
• XML Schema also provides various mechanisms for deriving new types from existing ones

Simple Types (Datatypes) – Primitive

Type           Example values
string         any Unicode string
boolean        true, false, 1, 0
decimal        3.1415
float          6.02214199E23
double         42E970
dateTime       2004-09-26T16:29:00-05:00
time           16:29:00-05:00
date           2004-09-26
hexBinary      48656c6c6f0a
base64Binary   SGVsbG8K
anyURI         http://www.brics.dk/ixwt/
QName          rcp:recipe, recipe
...

Some built-in derived simple types

• normalizedString
  – as string but whiteSpace facet is replace
• token
  – as string but whiteSpace facet is collapse
• language
  – "en", "da", "en-US", etc.
• NMTOKEN
  – e.g., "42", "my.form", "r103"
• NMTOKENS
  – e.g., "42 my.form r103"
• nonPositiveInteger
  – e.g., "-87", "0"

A simple type element declaration

• <element name="serialnumber" type="nonNegativeInteger"/>
  – assigns built-in simple type, nonNegativeInteger (a derived type, ultimately based on decimal), to elements named serialnumber
  – contents of a serialnumber element must match nonNegativeInteger (possibly with surrounding whitespace)
  – serialnumber element cannot contain child elements or attributes
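
As a rough illustration, the fragments below show contents that would and would not match this declaration. The wrapper element examples and the child element part are made up here so that the fragments form a single well-formed document; they are not part of the lecture.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical wrapper element, used only to hold the fragments together -->
<examples>
  <!-- valid: a non-negative integer -->
  <serialnumber>42</serialnumber>

  <!-- valid: surrounding whitespace is permitted -->
  <serialnumber>  0  </serialnumber>

  <!-- invalid: negative value -->
  <serialnumber>-1</serialnumber>

  <!-- invalid: an element with a simple type cannot have child elements -->
  <serialnumber><part>7</part></serialnumber>
</examples>
```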

Deriving new simple types by restriction

• Restriction of a simple type defines a new type by restricting possible values of a base type
  – restriction performed on facets of base type (see table below)
  – restriction may contain multiple constraining facets
• Facet restrictions operate at semantic not syntactic level
  – e.g., <totalDigits value="3"/> allows 123, 0123 and 0123.0 but not 1234 and 123.05

Facet          Constraining
length         length of string or number of list items
minLength      minimum length
maxLength      maximum length
pattern        regular expression constraint
enumeration    enumeration value
whiteSpace     controls whitespace normalization
maxInclusive   inclusive upper bound
minInclusive   inclusive lower bound
maxExclusive   exclusive upper bound
minExclusive   exclusive lower bound
totalDigits    maximum number of digits
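
To make the totalDigits example concrete, here is a minimal sketch of a restriction that uses it. The type name three_digits and the choice of decimal as base type are assumptions for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical type: decimal values with at most three significant digits -->
  <xsd:simpleType name="three_digits">
    <xsd:restriction base="xsd:decimal">
      <xsd:totalDigits value="3"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
```

Because the facet constrains the value rather than its spelling, leading and trailing zeros in the text do not count towards the limit.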

Deriving new simple types by restriction

• enumeration facet restricts values to a finite set of possibilities (example shown on the original slide)
• pattern facet allows values to be constrained to satisfy regular expressions (example shown on the original slide)
  – symbols that have a special meaning within regular expressions can be escaped by prefixing with a backslash (e.g., \*)
• For most facets, restrictions may be changed in further derivations unless fixed="true" attribute is added to constraining facet
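
The slide's own examples are not in this transcript. The sketch below shows one possible use of each facet; the type names colour, product_code and short_name, and the values they allow, are invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- enumeration: value must be exactly one of the listed strings -->
  <xsd:simpleType name="colour">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="red"/>
      <xsd:enumeration value="green"/>
      <xsd:enumeration value="blue"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- pattern: two upper-case letters, a hyphen, then four digits, e.g. AB-1234 -->
  <xsd:simpleType name="product_code">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="[A-Z]{2}-[0-9]{4}"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- fixed="true": types derived from short_name cannot change this maxLength -->
  <xsd:simpleType name="short_name">
    <xsd:restriction base="xsd:string">
      <xsd:maxLength value="20" fixed="true"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
```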

Deriving simple types using list and union

• Use the list element inside a simpleType definition to define a whitespace-separated string of values of a particular type
  – e.g., "23 4 56 -7" is of type integerlist
• Use union element inside a simpleType definition to specify that a value must be one of two or more types
  – e.g., "true" and "1.3" are both of type boolean_or_decimal
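
A sketch of how the two types named above might be defined. The type names come from the slide; the exact form of the definitions is an assumption, since the slide's figure is missing.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- list: whitespace-separated integers, e.g. "23 4 56 -7" -->
  <xsd:simpleType name="integerlist">
    <xsd:list itemType="xsd:integer"/>
  </xsd:simpleType>

  <!-- union: a value is valid if it is a boolean or a decimal, e.g. "true" or "1.3" -->
  <xsd:simpleType name="boolean_or_decimal">
    <xsd:union memberTypes="xsd:boolean xsd:decimal"/>
  </xsd:simpleType>

</xsd:schema>
```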

Complex types

• An element declaration may assign a complex type to an element name:
    <element name="card" type="b:card_type"/>
  – means that elements with the name card must satisfy all the requirements specified in the definition of the type card_type
  – complex type definition may specify attributes, child element types and ordering, and character data
• Complex type defined using XML Schema element, complexType
  – content of complexType element can be either complex or simple

Element reference

• Element reference takes the form
    <element ref="name"/>
  – name is the name of an element that has been declared at the top level of a schema
• Note difference between an element element with a name attribute (a declaration) and one with a ref attribute (a reference to a declaration)!

sequence element

• Concatenation within the content of an element with a complex content model is expressed using the sequence element
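
A minimal sketch of a sequence, assuming a business card language in the namespace http://businesscard.org. The particular child elements chosen here are assumptions. A card of this type must contain name, title and email, in that order.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:b="http://businesscard.org"
        targetNamespace="http://businesscard.org">

  <element name="name" type="string"/>
  <element name="title" type="string"/>
  <element name="email" type="string"/>

  <!-- Children must appear exactly once each, in the order listed -->
  <complexType name="card_type">
    <sequence>
      <element ref="b:name"/>
      <element ref="b:title"/>
      <element ref="b:email"/>
    </sequence>
  </complexType>

</schema>
```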

choice element

• Union (i.e., the '|' operator in a regular expression) corresponds to the choice element
• In the example on the original slide, each card element contains either an email element or zero or one phone elements, but not both
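
The slide's figure is missing; the following is one way the described content model could be written. The name element and the enclosing sequence are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:b="http://businesscard.org"
        targetNamespace="http://businesscard.org">

  <element name="name" type="string"/>
  <element name="email" type="string"/>
  <element name="phone" type="string"/>

  <complexType name="card_type">
    <sequence>
      <element ref="b:name"/>
      <!-- Either one email, or an optional phone, but never both -->
      <choice>
        <element ref="b:email"/>
        <element ref="b:phone" minOccurs="0"/>
      </choice>
    </sequence>
  </complexType>

</schema>
```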

all element

• A content sequence matches an all expression if each constituent of the expression is matched somewhere in the content sequence and every element in the content sequence is matched by a constituent in the expression
• Essentially a variant of sequence in which order does not matter
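
A small sketch of an all expression, again assuming a business card language; the child elements are chosen for illustration. The children may appear in any order, and phone may be left out.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:b="http://businesscard.org"
        targetNamespace="http://businesscard.org">

  <element name="name" type="string"/>
  <element name="email" type="string"/>
  <element name="phone" type="string"/>

  <!-- name and email are required, phone is optional; order is free -->
  <complexType name="card_type">
    <all>
      <element ref="b:name"/>
      <element ref="b:email"/>
      <element ref="b:phone" minOccurs="0"/>
    </all>
  </complexType>

</schema>
```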

any element

• The empty element any is a wildcard that matches any element
• The namespace attribute limits matching elements in various ways
  – whitespace-separated list of URIs
  – ##targetNamespace
  – ##local
    • empty namespace
  – ##any
  – ##other
    • any namespace except targetNamespace

any element

• Can be used to specify that a different language is used inside an element
  – e.g., XHTML used inside the info element in WidgetML (example shown on the original slide)
  – content must consist of one or more elements from the XHTML namespace
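
A sketch of how such an info element might be declared. The WidgetML target namespace URI used here is hypothetical, and the declaration uses an unnamed complex type nested inside the element declaration for brevity.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical target namespace for WidgetML -->
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.example.org/widgetml">

  <element name="info">
    <complexType>
      <sequence>
        <!-- One or more elements, each from the XHTML namespace -->
        <any namespace="http://www.w3.org/1999/xhtml"
             minOccurs="1" maxOccurs="unbounded"/>
      </sequence>
    </complexType>
  </element>

</schema>
```

The any element also accepts a processContents attribute. Its default, strict, means a validator needs declarations for the XHTML elements it meets; lax or skip relaxes that.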

Some restrictions

• all element may only contain element declarations or references (not sequence, choice or any)
• sequence and choice elements cannot contain all elements
• complexType contents cannot consist of a single element or any declaration
  – need to wrap it in a sequence or choice element

Attribute references

• A complex type may optionally contain a number of attribute references of the form
    <attribute ref="name"/>
  – name is the name of an attribute that has been declared elsewhere
  – attribute reference must appear after the content model description of a complex type
  – attribute reference can contain an attribute named use which can take the values optional (default) or required
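
A sketch showing attribute references placed after the content model. The uri attribute is mentioned in the lecture; the alt attribute, the caption element and the type name logo_type are invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:b="http://businesscard.org"
        targetNamespace="http://businesscard.org">

  <attribute name="uri" type="anyURI"/>
  <attribute name="alt" type="string"/>       <!-- hypothetical attribute -->
  <element name="caption" type="string"/>     <!-- hypothetical element -->

  <complexType name="logo_type">
    <!-- Content model comes first -->
    <sequence>
      <element ref="b:caption" minOccurs="0"/>
    </sequence>
    <!-- Attribute references follow the content model -->
    <attribute ref="b:uri" use="required"/>
    <attribute ref="b:alt"/>                  <!-- use defaults to optional -->
  </complexType>

</schema>
```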

minOccurs and maxOccurs

• minOccurs and maxOccurs attributes can be used with element, sequence, choice, all and any elements
  – define possible cardinalities of the element
  – values must be non-negative integers or, for maxOccurs, unbounded
  – by default, minOccurs and maxOccurs are 1
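
The cardinalities can be illustrated with a short sketch. The element names follow the business card language assumed elsewhere in the lecture, and the particular limits are invented.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:b="http://businesscard.org"
        targetNamespace="http://businesscard.org">

  <element name="name" type="string"/>
  <element name="email" type="string"/>
  <element name="phone" type="string"/>

  <complexType name="card_type">
    <sequence>
      <!-- defaults apply: exactly one name -->
      <element ref="b:name"/>
      <!-- one or more email elements -->
      <element ref="b:email" maxOccurs="unbounded"/>
      <!-- between zero and two phone elements -->
      <element ref="b:phone" minOccurs="0" maxOccurs="2"/>
    </sequence>
  </complexType>

</schema>
```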

mixed attribute

• complexType may optionally have an attribute, mixed="true"
  – means arbitrary character data is permitted anywhere in the content in addition to the elements declared in the content model
  – Without mixed="true" attribute, only whitespace allowed between elements in content model
  – Character data cannot be constrained if we also want to allow elements in the content
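
A minimal sketch of mixed content. The namespace URI and the element names note and em are hypothetical. An instance such as `<n:note>Call <n:em>today</n:em> please</n:note>` would be accepted, with text allowed around the em elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical language: a note containing text interspersed with em elements -->
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:n="http://www.example.org/note"
        targetNamespace="http://www.example.org/note">

  <element name="em" type="string"/>
  <element name="note" type="n:note_type"/>

  <!-- mixed="true" permits character data anywhere between the child elements -->
  <complexType name="note_type" mixed="true">
    <sequence>
      <element ref="n:em" minOccurs="0" maxOccurs="unbounded"/>
    </sequence>
  </complexType>

</schema>
```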