xml 6 6.1 xml. outline 1. xml 2. xsl / xslt 3. dtd 4. dom 5. xsd 6. xpath 7. xforms


Upload: cameron-darcey

Post on 14-Dec-2015


XML

6

6.1 XML

Outline

1. XML

2. XSL / XSLT

3. DTD

4. DOM

5. XSD

6. XPath

7. XForms

What is XML?

XML stands for EXtensible Markup Language

A meta-language for descriptive markup: you invent your own tags

XML uses a Document Type Definition (DTD) or an XML Schema to describe the data

XML with a DTD or XML Schema is designed to be self-descriptive

Built-in internationalization via Unicode

Built-in error-handling

Optimized for network operations

Tons of support from the big IT companies

Some History

SGML (Standard Generalized Markup Language)

  ISO Standard, 1986, for data storage & exchange
  Metalanguage for defining languages (through DTDs)
  A famous SGML language: HTML
  Separation of content and display
  Used in U.S. government & contractors, large manufacturing companies, technical information publishers, ...
  SGML reference is 600 pages long

XML

  W3C Recommendation in 1998
  Simple subset (80/20 rule) of SGML: “ASCII of the Web”, “Semantic Web”
  XML specification is 26 pages long

Timeline

1986 SGML becomes a standard

1989 Tim Berners-Lee creates the WWW

1994 W3C established

1998 XML 1.0 W3C Recommendation

Jan 2000 XHTML becomes W3C Recommendation A Reformulation of HTML 4 in XML 1.0

Oct 2000 W3C XML 1.0 (Second Edition) Recommendation http://www.w3.org/TR/REC-xml

Oct 2002 XML 1.1 Candidate Recommendation updates XML to use Unicode 3

XML and HTML

XML is not a replacement for HTML

XML was designed to carry data

XML and HTML were designed with different goals

  XML was designed to describe data and to focus on what data is
  HTML was designed to display data and to focus on how data looks

HTML is about displaying information, while XML is about describing information

HTML and XML, I

HTML is used to mark up text so it can be displayed to users

XML is used to mark up data so it can be processed by computers

HTML describes both structure (e.g. <p>, <h2>, <em>) and appearance (e.g. <br>, <font>, <i>)

XML describes only content, or “meaning”

HTML uses a fixed, unchangeable set of tags

In XML, you make up your own tags

HTML and XML, II

HTML and XML look similar, because they are both SGML languages

  Both HTML and XML use elements enclosed in tags
  Both use tag attributes

More precisely,

  HTML is defined in SGML
  XML is a (very small) subset of SGML

HTML and XML, III

HTML is for humans

  HTML describes web pages
  You don’t want to see error messages about the web pages you visit
  Browsers ignore and/or correct as many HTML errors as they can, so HTML is often sloppy

XML is for computers

  XML describes data
  The rules are strict and errors are not allowed
  In this way, XML is like a programming language

Current versions of most browsers can display XML

XML does not DO anything

XML was not designed to DO anything

XML is created to structure, store and to send information

The following example is a book info, stored as XML:

<?xml version='1.0'?>
<bookstore>
  <book genre='autobiography' publicationdate='1981' ISBN='1-861003-11-0'>
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  …
</bookstore>

XML is Free and Extensible

XML tags are not predefined

  You must "invent" your own tags

The tags used to mark up HTML documents and the structure of HTML documents are predefined

  The author of HTML documents can only use tags that are defined in the HTML standard

XML allows the author to define his own tags and his own document structure

XML Future

XML is going to be everywhere

XML is a cross-platform, software and hardware independent tool for transmitting information.

[Figure: boxes labeled Documents, Configuration, Database, Application X, and Repository; four labels reading XML]

Benefits of XML

Open W3C standard

Representation of data across heterogeneous environments

  Cross platform
  Allows for high degree of interoperability

Strict rules

  Syntax
  Structure
  Case sensitive

How can XML be Used?

XML can Separate Data from HTML

With XML, your data is stored outside your HTML

XML is used to Exchange Data

With XML, data can be exchanged between incompatible systems

With XML, financial information can be exchanged over the Internet

XML can be used to Share Data

XML can be used to Store Data

XML can make your Data more Useful

XML can be used to Create new Languages

Components of an XML Document

Elements

  Each element has a beginning and ending tag: <TAG_NAME>...</TAG_NAME>
  Elements can be empty (<TAG_NAME />)

Attributes

  Describe an element; e.g. data type, data range, etc.
  Can only appear on beginning tag

Processing instructions

  Encoding specification (Unicode by default)
  Namespace declaration
  Schema declaration

Components of an XML Document

Processing Instructions

Elements

Elements with Attributes

<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="template.xsl"?>
<ROOT>
  <ELEMENT1><SUBELEMENT1 /><SUBELEMENT2 /></ELEMENT1>
  <ELEMENT2> </ELEMENT2>
  <ELEMENT3 type='string'> </ELEMENT3>
  <ELEMENT4 type='integer' value='9.3'> </ELEMENT4>
</ROOT>

XML declaration

The XML declaration looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

The XML declaration is optional in XML 1.0, but including it is good practice (so include it!)

If present, the XML declaration must be first; not even whitespace may precede it

Note that the brackets are <? and ?>

version is required whenever the declaration is present; "1.0" is by far the most common value (XML 1.1 also exists)

encoding can be "UTF-8" or "UTF-16" (both are Unicode encodings; UTF-8 is ASCII-compatible), or something else, or it can be omitted

standalone tells whether the document relies on external markup declarations, such as a separate DTD

Processing Instructions

PIs (Processing Instructions) may occur anywhere in the XML document (but usually first)

A PI is a command to the program processing the XML document to handle it in a certain way

XML documents are typically processed by more than one program

Programs that do not recognize a given PI should just ignore it

General format of a PI: <?target instructions?>

Example: <?xml-stylesheet type="text/css" href="mySheet.css"?>
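To make the PI format concrete, here is a minimal Python sketch that lists the processing instructions of a document using the standard library's xml.dom.minidom. The sample document and the helper name list_processing_instructions are assumptions made for illustration; they are not part of the original slides.

```python
from xml.dom import minidom

# Hypothetical sample document; the stylesheet PI is the one shown on the slide.
DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="mySheet.css"?>
<note>hello</note>"""


def list_processing_instructions(xml_text):
    """Return (target, instructions) pairs for every top-level PI."""
    doc = minidom.parseString(xml_text)
    return [
        (node.target, node.data)
        for node in doc.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    ]


pis = list_processing_instructions(DOC)
for target, instructions in pis:
    print(target, "->", instructions)
```

Note that the XML declaration itself is not reported as a processing instruction, even though it uses the same <? ... ?> brackets.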

XML Elements

An XML element is everything from the element's start tag to the element's end tag

XML Elements are extensible and they have relationships

XML Elements have simple naming rules

  Names can contain letters, numbers, and other characters
  Names must not start with a number or punctuation character (an underscore is allowed)
  Names must not start with the letters xml (or XML or Xml ..), which are reserved
  Names cannot contain spaces

XML Attributes

XML elements can have attributes

Data can be stored in child elements or in attributes

Should you avoid using attributes? Here are some of the problems using attributes:

  attributes cannot contain multiple values (child elements can)
  attributes are not easily expandable (for future changes)
  attributes cannot describe structures (child elements can)
  attributes are more difficult to manipulate by program code
  attribute values are not easy to test against a Document Type Definition (DTD), which is used to define the legal elements of an XML document

An XML Document

<?xml version='1.0'?>
<bookstore>
  <book genre='autobiography' publicationdate='1981' ISBN='1-861003-11-0'>
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  <book genre='novel' publicationdate='1967' ISBN='0-201-63361-2'>
    <title>The Confidence Man</title>
    <author>
      <first-name>Herman</first-name>
      <last-name>Melville</last-name>
    </author>
    <price>11.99</price>
  </book>
</bookstore>
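As an illustration of how a program might read the elements and attributes of the bookstore document, here is a small Python sketch using the standard library's xml.etree.ElementTree. The function name summarize and the dictionary layout are assumptions chosen for this example, not something the slides prescribe.

```python
import xml.etree.ElementTree as ET

# The bookstore document from the slide, with plain ASCII quotes.
BOOKSTORE = """<?xml version='1.0'?>
<bookstore>
  <book genre='autobiography' publicationdate='1981' ISBN='1-861003-11-0'>
    <title>The Autobiography of Benjamin Franklin</title>
    <author><first-name>Benjamin</first-name><last-name>Franklin</last-name></author>
    <price>8.99</price>
  </book>
  <book genre='novel' publicationdate='1967' ISBN='0-201-63361-2'>
    <title>The Confidence Man</title>
    <author><first-name>Herman</first-name><last-name>Melville</last-name></author>
    <price>11.99</price>
  </book>
</bookstore>"""


def summarize(xml_text):
    """Collect one dictionary per <book>, mixing attribute and child-element data."""
    root = ET.fromstring(xml_text)
    rows = []
    for book in root.findall("book"):
        first = book.findtext("author/first-name")
        last = book.findtext("author/last-name")
        rows.append({
            "genre": book.get("genre"),          # stored as an attribute
            "title": book.findtext("title"),     # stored as a child element
            "author": f"{first} {last}",         # nested structure needs child elements
            "price": float(book.findtext("price")),
        })
    return rows


books = summarize(BOOKSTORE)
for row in books:
    print(row)
```

The author name shows why structured data tends to live in child elements: an attribute could hold the whole name as one string, but not the first-name/last-name structure.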

Another XML Document

<?xml version="1.0"?>
<weatherReport>
  <date>7/14/97</date>
  <city>North Place</city>, <state>NX</state> <country>USA</country>
  High Temp: <high scale="F">103</high>
  Low Temp: <low scale="F">70</low>
  Morning: <morning>Partly cloudy, Hazy</morning>
  Afternoon: <afternoon>Sunny &amp; hot</afternoon>
  Evening: <evening>Clear and Cooler</evening>
</weatherReport>

XML Validation

XML with correct syntax is Well Formed XML

XML validated against a DTD is Valid XML

Rules For Well-Formed XML

There must be one, and only one, root element

All XML elements must have a closing tag

Sub-elements must be properly nested

  A tag must end within the tag in which it was started

Attributes are optional

  Defined by an optional schema

Attribute values must be enclosed in "" or ''

Processing instructions are optional

XML is case-sensitive

  <tag> and <TAG> are not the same type of element

White space is preserved

CR / LF is converted to LF

Comment in XML is similar to that of HTML
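The rules above can be checked mechanically by any XML parser. The following Python sketch, which assumes the standard library's xml.etree.ElementTree is an acceptable stand-in for "an XML parser", tries a few deliberately broken snippets; the helper name is_well_formed and the sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Each sample violates (or satisfies) one of the well-formedness rules.
checks = {
    "ok": is_well_formed("<a><b>x</b></a>"),
    "two_roots": is_well_formed("<a/><b/>"),            # only one root element allowed
    "missing_close": is_well_formed("<a><b>x</a>"),      # every element needs a closing tag
    "bad_nesting": is_well_formed("<a><b></a></b>"),     # tags must nest properly
    "unquoted_attr": is_well_formed("<a x=1/>"),         # attribute values must be quoted
    "case_mismatch": is_well_formed("<tag></TAG>"),      # names are case-sensitive
}

for name, result in checks.items():
    print(f"{name}: {result}")
```

This only tests well-formedness. Checking validity against a DTD needs a validating parser, which ElementTree is not.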

XML DTD

A DTD defines the legal elements of an XML document

  defines the document structure with a list of legal elements

XML Schema

  XML Schema is an XML-based alternative to DTD

Errors in XML documents will stop the XML program

XML Validators

Browsers Support for XML

Netscape 6 supports XML

Internet Explorer 5.0 supports the XML 1.0 standard

Internet Explorer 5.0 has the following XML support:

  Viewing of XML documents
  Full support for W3C DTD standards
  XML embedded in HTML as Data Islands
  Binding XML data to HTML elements
  Transforming and displaying XML with XSL
  Displaying XML with CSS
  Access to the XML DOM

Viewing XML Files

Raw XML files can be viewed in IE 5.0 (and higher) and in Netscape 6, but to make it display like a web page, you have to add some display information

XML documents do not carry information about how to display the data

Different solutions to the display problem, using CSS, XSL, JavaScript, and XML Data Islands

Will you be writing your future Homepages in XML?

Displaying XML with CSS

With CSS (Cascading Style Sheets) you can add display information to an XML document

Formatting XML with CSS is NOT the future of the Web

Formatting with XSL will be the new standard

Example: the xml file

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/css" href="cd_catalog.css"?>
<CATALOG>
  <CD>
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>10.90</PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <CD>
    <TITLE>Hide your heart</TITLE>
    <ARTIST>Bonnie Tyler</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>CBS Records</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1988</YEAR>
  </CD>
  . . . .
</CATALOG>

Example: the css file

CATALOG { background-color: #ffffff; width: 100%; }

CD { display: block; margin-bottom: 30pt; margin-left: 0; }

TITLE { color: #FF0000; font-size: 20pt; }

ARTIST { color: #0000FF; font-size: 20pt; }

COUNTRY,PRICE,YEAR,COMPANY { display: block; color: #000000; margin-left: 20pt; }

Displaying XML with XSL

With XSL you can add display information to your XML document

XSL is the preferred style sheet language of XML

XSL (the eXtensible Stylesheet Language) is far more sophisticated than CSS

One way to use XSL is to transform XML into HTML before it is displayed by the browser

Example: the xml file

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="simple.xsl" ?>
<breakfast_menu>
  <food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <description>two of our famous Belgian Waffles with plenty of real maple syrup</description>
    <calories>650</calories>
  </food>
  <food>
    <name>Strawberry Belgian Waffles</name>
    <price>$7.95</price>
    <description>light Belgian waffles covered with strawberries and whipped cream</description>
    <calories>900</calories>
  </food>
  …
</breakfast_menu>

Example: the xsl file

<?xml version="1.0" encoding="ISO-8859-1"?>
<html xsl:version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns="http://www.w3.org/TR/xhtml1/strict">
  <body style="font-family:Arial,helvetica,sans-serif;font-size:12pt; background-color:#EEEEEE">
    <xsl:for-each select="breakfast_menu/food">
      <div style="background-color:teal;color:white;padding:4px">
        <span style="font-weight:bold;color:white"><xsl:value-of select="name"/></span>
        - <xsl:value-of select="price"/>
      </div>
      <div style="margin-left:20px;margin-bottom:1em;font-size:10pt">
        <xsl:value-of select="description"/>
        <span style="font-style:italic">
          (<xsl:value-of select="calories"/> calories per serving)
        </span>
      </div>
    </xsl:for-each>
  </body>
</html>

View the result in IE 6

XML Data Islands

XML can be embedded within HTML pages in Data Islands

Manipulated via client side script or data binding

The unofficial <xml> tag is used to embed XML data within HTML

Data Islands can be bound to HTML elements (like HTML tables)

<html>
  <body>
    <xml id="cdcat" src="cd_catalog.xml"></xml>
    <table border="1" datasrc="#cdcat">
      <tr>
        <td><span datafld="ARTIST"></span></td>
        <td><span datafld="TITLE"></span></td>
      </tr>
    </table>
  </body>
</html>

The Microsoft XML Parser

To read and update an XML document, you need an XML parser

The Microsoft XML parser comes with Microsoft Internet Explorer 5.0

Once you have installed IE 5.0, the parser is available to scripts inside HTML documents

The parser features a language-neutral programming model that supports:

  JavaScript, VBScript, Perl, VB, Java, C++ and more
  W3C XML 1.0 and XML DOM
  DTD and validation

You can create an XML document object with the following code:

var xmlDoc = new ActiveXObject("Microsoft.XMLDOM")

Loading an XML file into the parser

XML files can be loaded into the parser using script code.

The following code loads an XML document (note.xml) into the XML parser:

<script type="text/javascript">
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.async = false;
xmlDoc.load("note.xml");
// ....... processing the document goes here
</script>

The second line in the code above creates an instance of the Microsoft XML parser

The third line turns off asynchronized loading, to make sure that the parser will not continue execution before the document is fully loaded

The fourth line tells the parser to load the XML document called note.xml
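The same load-then-process pattern exists outside Internet Explorer. As a rough, platform-neutral analogue, the Python sketch below parses a document into a DOM tree with the standard library's xml.dom.minidom and walks the root's children. The contents of the note document are an assumption, since the slides never show note.xml, and the document is parsed from a string rather than a file to keep the example self-contained.

```python
from xml.dom import minidom

# Hypothetical stand-in for note.xml; the real file is not shown in the slides.
NOTE_XML = """<?xml version="1.0"?>
<note>
  <to>Alice</to>
  <from>Bob</from>
  <heading>Reminder</heading>
  <body>Meeting at noon</body>
</note>"""


def load_note(xml_text):
    """Parse the text into a DOM tree and map each child tag to its text."""
    doc = minidom.parseString(xml_text)   # parsing is synchronous, like async = false
    root = doc.documentElement
    return {
        child.tagName: child.firstChild.data
        for child in root.childNodes
        if child.nodeType == child.ELEMENT_NODE   # skip whitespace text nodes
    }


note = load_note(NOTE_XML)
print(note)
```

To read from disk instead, minidom.parse("note.xml") takes a file name and returns the same kind of document object.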

Namespaces: Overview

Part of XML’s extensibility

Allow authors to differentiate between tags of the same name (using a prefix)

  Frees author to focus on the data and decide how to best describe it
  Allows multiple XML documents from multiple authors to be merged

Identified by a URI (Uniform Resource Identifier)

  When a URL is used, it does NOT have to represent a live server

Namespaces: Declaration

Namespace declaration examples:

xmlns:bk="http://www.example.com/bookinfo/"

xmlns:bk="urn:mybookstuff.org:bookinfo"

Parts of a declaration:

  xmlns                                 Namespace declaration
  bk                                    Prefix
  "http://www.example.com/bookinfo/"    URI (URL)

Namespaces: Examples

<BOOK xmlns:bk="http://www.bookstuff.org/bookinfo">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:AUTHOR>Joe Developer</bk:AUTHOR>
  <bk:PRICE currency='US Dollar'>19.99</bk:PRICE>
</BOOK>

<bk:BOOK xmlns:bk="http://www.bookstuff.org/bookinfo"
         xmlns:money="urn:finance:money">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:AUTHOR>Joe Developer</bk:AUTHOR>
  <bk:PRICE money:currency='US Dollar'>19.99</bk:PRICE>
</bk:BOOK>

Namespaces: Default Namespace

An XML namespace declared without a prefix becomes the default namespace for all sub-elements

All elements without a prefix will belong to the default namespace:

<BOOK xmlns="http://www.bookstuff.org/bookinfo">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
</BOOK>

Namespaces: Scope

Unqualified elements belong to the inner-most default namespace.

  BOOK, TITLE, and AUTHOR belong to the default book namespace
  PUBLISHER and NAME belong to the default publisher namespace

<BOOK xmlns="www.bookstuff.org/bookinfo">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
  <PUBLISHER xmlns="urn:publishers:publinfo">
    <NAME>Microsoft Press</NAME>
  </PUBLISHER>
</BOOK>
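The scoping rule can be observed with a namespace-aware parser. In this Python sketch, xml.etree.ElementTree reports each element name as {namespace-URI}local-name, which makes the inner-most default namespace visible. The helper split_tag is a hypothetical convenience written for this example.

```python
import xml.etree.ElementTree as ET

# The scope example from the slide, with plain ASCII quotes.
DOC = """<BOOK xmlns="www.bookstuff.org/bookinfo">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
  <PUBLISHER xmlns="urn:publishers:publinfo">
    <NAME>Microsoft Press</NAME>
  </PUBLISHER>
</BOOK>"""


def split_tag(tag):
    """Split ElementTree's '{uri}local' form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


# Map every element's local name to the namespace it ended up in.
namespaces = {}
for element in ET.fromstring(DOC).iter():
    uri, local = split_tag(element.tag)
    namespaces[local] = uri

for local, uri in namespaces.items():
    print(f"{local}: {uri}")
```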

Namespaces: Attributes

Unqualified attributes do NOT belong to any namespace

  Even if there is a default namespace

This differs from elements, which belong to the default namespace
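A short Python sketch can confirm the difference between elements and attributes. The document below is an assumption built for this illustration: it reuses the namespace URIs from the earlier slides and puts both an unqualified and a prefixed currency attribute on the same PRICE element.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a default namespace plus a 'money' prefix.
DOC = """<BOOK xmlns="http://www.bookstuff.org/bookinfo"
      xmlns:money="urn:finance:money">
  <PRICE currency="US Dollar" money:currency="USD">19.99</PRICE>
</BOOK>"""

price = ET.fromstring(DOC)[0]

# The element picks up the default namespace ...
element_name = price.tag
# ... but only the prefixed attribute is placed in a namespace.
attribute_names = sorted(price.attrib)

print(element_name)
print(attribute_names)
```

The two attributes do not clash, because their full names differ: one is plain currency, the other is currency in the urn:finance:money namespace.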

Entities

Entities provide a mechanism for textual substitution, e.g.

  Entity    Substitution
  &lt;      <
  &amp;     &

You can define your own entities

Parsed entities can contain text and markup

Unparsed entities can contain any data

  JPEG photos, GIF files, movies, etc.
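To see the substitution in both directions, the Python sketch below lets a parser expand the predefined entities and then uses xml.sax.saxutils.escape to produce them again. The sample sentence is made up for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

ORIGINAL = "Sunny & hot, temp < 104"

# Parsing: the parser replaces &amp; and &lt; with the characters they stand for.
parsed = ET.fromstring("<afternoon>Sunny &amp; hot, temp &lt; 104</afternoon>").text

# Writing: escape() turns the special characters back into entity references.
escaped = escape(ORIGINAL)

# Round trip: escaped text embedded in markup parses back to the original.
round_trip = ET.fromstring(f"<afternoon>{escaped}</afternoon>").text

print(parsed)
print(escaped)
```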

CDATA

By default, all text inside an XML document is parsed

You can force text to be treated as unparsed character data by enclosing it in <![CDATA[ ... ]]>

Any characters, even & and <, can occur inside a CDATA section

Whitespace inside a CDATA section is (usually) preserved

The only real restriction is that the character sequence ]]> cannot occur inside a CDATA section

CDATA is useful when your text has a lot of illegal characters (for example, if your XML document contains some HTML text)
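The following Python sketch shows a CDATA section carrying HTML text through a parser untouched. It also includes one common workaround for the ]]> restriction, which is to split the text across two adjacent CDATA sections. The helper name wrap_cdata and the sample strings are assumptions for this example.

```python
import xml.etree.ElementTree as ET

# HTML text inside CDATA: & and < need no escaping here.
DOC = "<snippet><![CDATA[<p>Fish & chips</p>]]></snippet>"
text = ET.fromstring(DOC).text


def wrap_cdata(raw):
    """Wrap raw text in CDATA, splitting any ']]>' across two sections."""
    return "<![CDATA[" + raw.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Text that contains the forbidden sequence still survives a round trip.
tricky = "a[b[0]]>c"
recovered = ET.fromstring(f"<snippet>{wrap_cdata(tricky)}</snippet>").text

print(text)
print(recovered)
```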

Pure XML -- Instance Model

XML 1.0 Standard:

  no explicit data model
  only syntax of well-formed and valid (wrt. a DTD) documents

implicit data model:

  nested containers ("boxes within boxes")
  labeled ordered trees (= a semistructured data model)
  relational, object-oriented, other data: easy to encode

<A>
  <B>foo</B>
  <C>bar</C>
  <C>lab</C>
</A>

[Figure: the document drawn as nested boxes (A containing B: "foo", C: "bar", C: "lab") and as a labeled tree with root A and children B, C, C; children are ordered]

Example: Relational Data to XML

Relation R:

  A   B   C
  a1  b1  c1
  a2  b2  c2
  a3  b3  c3

XML encoding:

<R>
  <tuple> <A>a1</A> <B>b1</B> <C>c1</C> </tuple>
  <tuple> <A>a2</A> <B>b2</B> <C>c2</C> </tuple>
  …
</R>

[Figure: the same data as a tree with root R and three tuple children, each with children A, B, C]
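The encoding of a relation as nested elements is easy to automate. Here is a minimal Python sketch, assuming rows arrive as plain tuples; the function name relation_to_xml is hypothetical, and the element names R, tuple, A, B, and C follow the example.

```python
import xml.etree.ElementTree as ET


def relation_to_xml(name, columns, rows):
    """Encode a relation as <name><tuple><col>value</col>...</tuple>...</name>."""
    root = ET.Element(name)
    for row in rows:
        tuple_element = ET.SubElement(root, "tuple")
        for column, value in zip(columns, row):
            ET.SubElement(tuple_element, column).text = str(value)
    return root


R = relation_to_xml(
    "R",
    ["A", "B", "C"],
    [("a1", "b1", "c1"), ("a2", "b2", "c2"), ("a3", "b3", "c3")],
)
xml_text = ET.tostring(R, encoding="unicode")
print(xml_text)
```

A relation has no row order, while XML children are ordered, so the encoding adds an ordering that a consumer may simply ignore.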

Adding Structure and Semantics

XML Document Type Definitions (DTDs):

  define the structure of "allowed" documents (i.e., valid wrt. a DTD)
  database schema => improve query formulation, execution, ...

XML Schema

  defines structure and data types
  allows developers to build their own libraries of interchanged data types

XML Namespaces

  identify your vocabulary

XML Related Technologies I

XHTML - Extensible HTML

CSS - Cascading Style Sheets

XSL - Extensible Style Sheet Language

  XSL consists of three parts: XML Document Transformation (renamed XSLT, see below), a pattern matching syntax (renamed XPath, see below), and a formatting object interpretation.

XSLT - XML Transformation

  XSLT is far more powerful than CSS. It can be used to transform XML files into many different output formats.

XPath - XML Pattern Matching

  XPath is a language for addressing parts of an XML document. XPath was designed to be used by both XSLT and XPointer.
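To give a feel for XPath addressing, the Python sketch below runs a few path expressions against the bookstore document from earlier in the deck. It relies on xml.etree.ElementTree, which supports only a limited subset of XPath, so this is an illustration of the idea rather than of the full language.

```python
import xml.etree.ElementTree as ET

# Abbreviated bookstore document from the earlier slide.
BOOKSTORE = """<bookstore>
  <book genre='autobiography' publicationdate='1981'>
    <title>The Autobiography of Benjamin Franklin</title>
    <author><first-name>Benjamin</first-name><last-name>Franklin</last-name></author>
    <price>8.99</price>
  </book>
  <book genre='novel' publicationdate='1967'>
    <title>The Confidence Man</title>
    <author><first-name>Herman</first-name><last-name>Melville</last-name></author>
    <price>11.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)

# Child steps with an attribute predicate: titles of books whose genre is 'novel'.
novel_titles = [t.text for t in root.findall("./book[@genre='novel']/title")]

# Descendant step: every last-name element anywhere below the root.
last_names = [n.text for n in root.findall(".//last-name")]

# Numeric comparisons are outside ElementTree's subset, so filter in Python.
cheap_titles = [
    book.findtext("title")
    for book in root.findall("book")
    if float(book.findtext("price")) < 10
]

print(novel_titles, last_names, cheap_titles)
```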

XML Related Technologies II

XLink - XML Linking Language

  The XML Linking Language (XLink) allows elements to be inserted into XML documents in order to create links between XML resources.

XPointer - XML Pointer Language

  The XML Pointer Language (XPointer) supports addressing into the internal structures of XML documents, such as elements, attributes, and content.

DTD - Document Type Definition

  A DTD can be used to define the legal building blocks of an XML document.

Namespaces

  XML namespaces define a method for qualifying element and attribute names used in XML by associating them with URI references.

XML Related Technologies III

DOM - Document Object Model

  The DOM defines interfaces, properties and methods to manipulate XML documents.

XSD - XML Schema

  Schemas are powerful alternatives to DTDs. Schemas are written in XML, and support namespaces and data types.

XQL - XML Query Language

  The XML Query Language supports query facilities to extract data from XML documents.

SAX - Simple API for XML

  SAX is another interface for reading XML documents. Unlike the DOM, it is event-based: the parser reports elements and text as it encounters them instead of building a tree.
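For contrast with the tree-building DOM style, here is a minimal SAX sketch in Python using the standard library's xml.sax. It collects the TITLE values from the CD catalog shown earlier; the handler class name TitleCollector is an assumption made for this example.

```python
import xml.sax

# Abbreviated CD catalog from the earlier slide.
CATALOG = """<CATALOG>
  <CD><TITLE>Empire Burlesque</TITLE><ARTIST>Bob Dylan</ARTIST></CD>
  <CD><TITLE>Hide your heart</TITLE><ARTIST>Bonnie Tyler</ARTIST></CD>
</CATALOG>"""


class TitleCollector(xml.sax.ContentHandler):
    """Collect the text of every TITLE element as parse events arrive."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "TITLE":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "TITLE":
            self.titles.append("".join(self._buffer))
            self._in_title = False


handler = TitleCollector()
xml.sax.parseString(CATALOG.encode("utf-8"), handler)
print(handler.titles)
```

Because no tree is kept in memory, this style suits very large documents, at the cost of having to track state (here, the _in_title flag) by hand.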

