packaging dataspiegel/en605481/archive/webdata.pdf · 2 3 distributed development on the world wide...


Post on 19-Jul-2020


Packaging Data

605.481

Extensible Markup Language

(XML)


Distributed Development on the World Wide Web3

XML - Beyond HTML

• XML and HTML are both based on Standard Generalized Markup Language

• XML separates content from presentation
  • Content = data and language
  • Over half the HTML markup dictates presentation
• XML does not specify either markup tags or grammar
  • HTML explicitly defines a set of legal tags
  • HTML includes a set of rules for the arrangement and application of those tags (grammar)
  • XML allows arbitrary definition and use of tags (i.e., eXtensible)
• XML is “well-formed” (more later)
  • Easy to parse, search or manipulate (unlike HTML)
  • HTML is more of a “loose format” and most browsers are quite forgiving with regard to the format that does exist


XML Introduction

• Extensible Markup Language
  • Technology to make data “portable” across systems, databases, directory services, business components, applications, companies
  • Meta-language for defining other “languages”
  • Does not specify markup tags (meaningful to a specific language parser)
  • Does not specify a grammar (rules dictating the use of language tags)
  • Tags and grammar are arbitrarily defined
• “XML” typically refers to XML and its related technologies
  • DTD – Document Type Definitions
  • XML Schema
  • XSL/XSLT – Extensible Stylesheet Language, XSL Transformations
  • XPath – XML Path Language
  • XHTML – XML version of HTML
  • XQL – XML Query Language
  • XSP – XML Server Pages
  • Others


XML Applications

• Presentation
  • XML isolates the data content within a document so its presentation can vary across devices (computer, cell phone, PDA, etc.) or systems (browser specific)
• Universal data content
  • Client-specific formatting and transformations are applied external to the document
• Communication
  • Non-proprietary “standard” for data interchange
  • Easily searched and filtered
• Configuration
  • Used extensively in J2EE architectures, as dictated by various specifications (EJB, Servlet, etc.)
• XML-RPC
  • Remote procedure calls between application components across networks
• Business to Business (B2B)
  • Messaging exchange between companies (SOAP)
  • Electronic business orders (ebXML)
  • Financial exchange (IFX)


A Simple XML Document

<?xml version="1.0"?>

<authors>

<name>

<firstname>Larry</firstname>

<lastname>Brown</lastname>

</name>

<name>

<firstname>Marty</firstname>

<lastname>Hall</lastname>

</name>

...

</authors>
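The simple document above can be read with any XML parser. A minimal sketch in Python, assuming the standard library's xml.etree.ElementTree and with the slide's "..." elision dropped so the text parses; the helper name full_names is made up for illustration:

```python
import xml.etree.ElementTree as ET

# The slide's document, minus the "..." placeholder (an assumption so it parses).
DOC = """<?xml version="1.0"?>
<authors>
  <name>
    <firstname>Larry</firstname>
    <lastname>Brown</lastname>
  </name>
  <name>
    <firstname>Marty</firstname>
    <lastname>Hall</lastname>
  </name>
</authors>"""


def full_names(xml_text):
    """Return 'first last' for every <name> child of the root element."""
    root = ET.fromstring(xml_text)
    return [
        f"{name.findtext('firstname')} {name.findtext('lastname')}"
        for name in root.findall("name")
    ]


print(full_names(DOC))
```

Because the tags are arbitrary, the parser knows nothing about "authors" or "name"; the application decides what they mean.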


Header or Prolog

• XML documents should start with a header (the prolog)
• Gives an XML parser and XML applications information about how to handle the document
  • XML version
  • Entity definitions
  • DOCTYPE
• When present, the XML declaration must be the very first thing in the document
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
  • The version is required within the declaration
  • The encoding identifies the character set (default UTF-8)
  • For more info on detection of character encoding: http://www.w3.org/TR/REC-xml/#sec-guessing
  • The standalone value indicates whether the document relies on external markup declarations (an external DTD or entity definitions); “no” means it may
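To see what the encoding value buys, here is a small sketch assuming Python's xml.etree.ElementTree. The same name is serialized twice, once as ISO-8859-1 with a declaration saying so and once as UTF-8 with no encoding given; the sample name and the helper parse_text are illustrative only:

```python
import xml.etree.ElementTree as ET

NAME = "Jos\u00e9"  # contains one non-ASCII character


def parse_text(raw_bytes):
    """Parse raw bytes and return the root element's text."""
    return ET.fromstring(raw_bytes).text


# Same characters, two byte-level serializations.
latin1_doc = f'<?xml version="1.0" encoding="ISO-8859-1"?><name>{NAME}</name>'.encode("iso-8859-1")
utf8_doc = f'<?xml version="1.0"?><name>{NAME}</name>'.encode("utf-8")

print(latin1_doc == utf8_doc)                          # the bytes differ
print(parse_text(latin1_doc) == parse_text(utf8_doc))  # the parsed text does not
```

The parser uses the declaration to decode the bytes, so the application sees the same text either way.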


Processing Instructions

• Header typically contains other processing instructions (PIs)
<?target instruction?>
• Enclosed between <? … ?>
• target names the application that handles the instruction
  • <?xml … ?> is handled by the XML parser
  • <?xml-stylesheet … ?> is handled by the stylesheet (e.g., XSLT) engine
• instruction is a set of keyword="value" pairs passed to the called application
<?xml-stylesheet href="XSL/test.xsl" type="text/xsl" media="wap"?>
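A parser hands processing instructions to the application as a target plus an uninterpreted data string. A minimal sketch, assuming Python's xml.dom.minidom and a made-up <root/> element to complete the document:

```python
from xml.dom import minidom

# <root/> is a placeholder element added so the document is complete.
DOC = """<?xml version="1.0"?>
<?xml-stylesheet href="XSL/test.xsl" type="text/xsl" media="wap"?>
<root/>"""


def processing_instructions(xml_text):
    """Return (target, data) for each processing instruction before the root."""
    document = minidom.parseString(xml_text)
    return [
        (node.target, node.data)
        for node in document.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    ]


print(processing_instructions(DOC))
```

Note that the keyword="value" pairs arrive as one string; splitting them is left to whichever application the target names.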


XML Document Content

• Document content must be nested within a root element
<?xml version="1.0" ?>
<book>
  <title>Sample Title</title>
  <contents>
    <chapter number="1">Intro to HTML</chapter>
    <chapter number="2">Elements in HTML</chapter>
    :
  </contents>
</book>
• XML elements
  • Tags and attributes
  • Must be “well-formed”
• Entities
• Character data (CDATA)
• Processing Instructions
• Comments


XML Elements

• An XML element is any content contained between an opening and closing tag
• An element can contain other child elements
<book>
  <chapter>…</chapter>
</book>
• An element can contain simple textual information
<chapter>Introduction and Overview</chapter>
• An element can contain mixed content (elements and text)
<book> This is mixed content
  <chapter>Introduction and Overview</chapter>
</book>
• An element can be empty
<book></book>
<book />


XML Tags and Attributes

• Tag names
  • Case sensitive
  • Start with a letter or underscore
  • After the first character, digits, hyphens (-) and periods (.) are also allowed
  • Cannot contain white-space
  • Avoid using colons (:), which are reserved for namespace prefixes
• Attributes
  • Provide metadata for defined elements
  • Same naming convention as tags
  • Values must be enclosed in quotes; multiple attributes are separated by white-space, not commas
<message to="[email protected]" from="[email protected]">
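The naming rules above can be approximated with a regular expression. This is a deliberately simplified, ASCII-only sketch in Python; the real XML Name production also admits many non-ASCII letters, and the colon is left out on purpose to follow the slide's advice. The function name is hypothetical:

```python
import re

# ASCII-only simplification: letter or underscore first,
# then letters, digits, underscores, periods or hyphens.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_simple_xml_name(name):
    """True if `name` follows the simplified tag/attribute naming rules."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["book", "_id", "chapter-1.a", "1book", "my tag", "h:table"]:
    print(candidate, is_simple_xml_name(candidate))
```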


XML Must be Well-Formed

• Every XML element must have both an opening and closing tag
  • <BOOK>…</BOOK>
  • <BOOK /> - empty tag is also legal
• Opening and closing tags must match in content and case
  • <BOOK>…</BOOK> and <book>…</book> are both legal
  • <Book>…</BOOK> is NOT
• All tags must be properly nested
  • All elements must be closed before their “containing elements” are closed
• All XML documents must have a single tag pair that defines a root element containing all other content elements in the document
• Attribute values must always be quoted
• White-space IS preserved in XML
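A conforming parser rejects any document that breaks these rules, which makes a well-formedness check a one-liner around the parse call. A sketch assuming Python's xml.etree.ElementTree; the helper name is_well_formed is made up:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """True if the parser accepts the text as a complete XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


samples = [
    "<BOOK></BOOK>",     # matching tags
    "<BOOK />",          # empty-element tag
    "<Book></BOOK>",     # case mismatch
    "<a><b></a></b>",    # improper nesting
    "<a/><b/>",          # two root elements
    "<a x=1/>",          # unquoted attribute value
]
for sample in samples:
    print(sample, is_well_formed(sample))
```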


Document Entities

• Entities refer to a data item, typically text
  • General entity references start with & and end with ;
  • The entity reference is replaced by its true value when parsed
• Special characters require entity references to avoid conflicts with the XML application (parser)
  • &lt; in place of <
  • &gt; in place of >
  • &amp; in place of &
  • &quot; in place of "
  • &apos; in place of '
• Entities are user definable (more later)
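In practice the five predefined references are produced by a library call rather than by hand. A sketch assuming Python's xml.sax.saxutils.escape, which handles &, < and > itself and takes the two quote characters as an extra mapping; the sample string and the <code> wrapper element are illustrative:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() covers &, < and > by default; quotes are supplied explicitly.
QUOTES = {'"': "&quot;", "'": "&apos;"}


def to_xml_text(raw):
    """Replace the five special characters with their entity references."""
    return escape(raw, QUOTES)


raw = 'if a < b && name == "O\'Neil"'
escaped = to_xml_text(raw)
print(escaped)

# Parsing replaces each reference with its true value again.
print(ET.fromstring(f"<code>{escaped}</code>").text == raw)
```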


Other XML Content

• Unparsed Data (CDATA)
  • CDATA sections contain data that should be passed to the calling application without any XML parsing
  • General form: <![CDATA[unparsed content here]]>
<?xml version="1.0" encoding="ISO-8859-1"?>
<server>
  <port status="accept">
    <![CDATA[8001 <= port < 9000]]>
  </port>
</server>
• XML Comments
  • <!-- This is both an XML and HTML comment -->
  • Can cross multiple lines
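From the application's side a CDATA section is just text. A sketch assuming Python's xml.etree.ElementTree, using the server/port document with the surrounding white-space inside <port> removed and a comment added for illustration; port_rule is a made-up helper:

```python
import xml.etree.ElementTree as ET

DOC = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<server>
  <port status="accept"><![CDATA[8001 <= port < 9000]]></port>
  <!-- this parser drops comments by default -->
</server>"""


def port_rule(xml_bytes):
    """Return the status attribute and the unparsed text of <port>."""
    port = ET.fromstring(xml_bytes).find("port")
    return port.get("status"), port.text


print(port_rule(DOC))
```

The < characters survive untouched because nothing inside the CDATA markers is treated as markup.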


XML Namespaces Motivation

• Since XML element names are not predefined, name conflicts can arise when distinct documents use common names for elements.
<table>
  <tr>
    <td>Apples</td>
    <td>Bananas</td>
  </tr>
</table>

<table>
  <name>
    Coffee Table
  </name>
  <width>80</width>
  <length>120</length>
</table>
• Distinguish the types by identifying elements & attributes with a unique predefined container
<h:table>
  <h:tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </h:tr>
</h:table>

<f:table>
  <f:name>
    Coffee Table
  </f:name>
  <f:width>80</f:width>
  <f:length>120</f:length>
</f:table>


XML Namespaces

• Purpose
  • Provide a context for element and attribute names in XML instances
  • Prevent “collisions” of like-named elements and attributes in XML instances
• Naming
  • Namespaces are uniquely identified by URIs, most often HTTP URL addresses
  • Need not be dereferenceable – serves only as a unique identifier
  • Case sensitive
• Namespaces are typically associated with an arbitrary “prefix”
  • Ensure readability of XML document
  • Ensure document uses permitted XML naming guidelines
• Though arbitrary, some “standard” prefixes are recommended
  • xs or xsd => XML Schema namespace
  • xsi => XML Schema instance namespace
  • xsl => Extensible Stylesheet Language namespace


Declaring XML Namespaces

• Use the xmlns attribute to associate a prefix with a namespace in any element
  • xmlns:namespace-prefix="namespaceURI"
• Namespace is valid for all “prefixed” elements/attributes referenced in the element where it is declared and ALL of its children.
  • Typical practice is to declare all namespaces in the document root tag
• Namespace is NOT valid prior to declaration
• Namespace is NOT valid after the terminating element associated with the tag where declared
• Cannot associate a namespace prefix with the empty string!
• Can declare multiple namespaces in any XML element
<fdb:table xmlns:fdb="http://www.store.com/data">


Referencing Namespaces

• Reference elements and attributes in the namespace by using the “QName” (Qualified Name)
  • QName is the name of the element/attribute preceded by the namespace prefix and separated with a colon.
<fdb:items xmlns:fdb="http://www.store.com/data">
  <table>
    <tr>
      <td><fdb:table>Coffee Table</fdb:table></td>
    </tr>
    <tr>
      <td><fdb:table>End Table</fdb:table></td>
    </tr>
  </table>
</fdb:items>
• Note that start and end tag names must be identical, so use QNames in each


The “Default” Namespace

• Assigns UNPREFIXED elements to a defined namespace through the use of the xmlns attribute
  • xmlns="namespaceURI"
• CAN be assigned to the empty string (xmlns=""), which means unprefixed elements are not associated with a namespace
• Note that unprefixed attributes are NEVER in any namespace
<orderData xmlns="http://www.catalog.com/customers"
           xmlns:ord="http://www.catalog.com/orders"
           xmlns:prod="http://www.catalog.com/products">
  <ord:id>0564</ord:id>
  <id>10111962</id>
  <name>Spiegel</name>
  <item>
    <prod:id prod:type="book">1424</prod:id>
  </item>
</orderData>
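A namespace-aware parser resolves prefixes and the default namespace into expanded names, so the three id elements in the orderData example stay distinct. A sketch assuming Python's xml.etree.ElementTree, which writes an expanded name as {namespaceURI}localname; the helper expanded_names is made up:

```python
import xml.etree.ElementTree as ET

DOC = """<orderData xmlns="http://www.catalog.com/customers"
           xmlns:ord="http://www.catalog.com/orders"
           xmlns:prod="http://www.catalog.com/products">
  <ord:id>0564</ord:id>
  <id>10111962</id>
  <name>Spiegel</name>
  <item>
    <prod:id prod:type="book">1424</prod:id>
  </item>
</orderData>"""


def expanded_names(xml_text):
    """Return the expanded name of every element, in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


for name in expanded_names(DOC):
    print(name)
```

The prefixes themselves do not appear in the output: only the URI matters, which is why the choice of prefix is arbitrary.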


Validating XML Documents

• Beyond basic “well-formed” rules of XML
• Provide a set of rules to validate an XML document
  • The structure of elements and attributes
  • The order of elements
  • The data values of attributes and elements
  • The uniqueness of values in an instance
• Provide a set of rules for consistent understanding and processing of XML documents between organizations (i.e., business partners)
• There are several “languages” for defining rules for validating XML
  • Document Type Definitions (DTDs)
  • W3C XML Schema
  • RELAX NG
  • Schematron


W3C XML Schema Language

• XML-based document (must be well-formed)
• Provides extensive ability to type data in attributes and elements
  • integer
  • date
  • string
  • etc.
  • Can define new types as well
• Support for object-oriented techniques
  • Can create new types that extend definitions of existing types
  • Can create new types that restrict definitions of existing types
• Support for modular composition through the use of namespaces
  • Schema include and import statements build a schema from modular documents that cross many namespaces

Schema Vocabulary

• Schema document
  • XML document that contains the rules that are used to validate other XML documents
• Schema instance
  • XML document with content that adheres to the rules defined in a schema
• Schema components
  • Declarations are components that can appear in an instance and be validated
  • Definitions are components that are internal to the schema
  • Declarations/definitions can have names or be anonymous
  • Anonymous definitions/declarations have no names and are contained within other definitions/declarations
  • Named definitions/declarations provide reusability within the schema
• Global versus local declarations/definitions
  • Global definitions/declarations are at the “top level” of the schema and have component-unique names within the schema
  • Local definitions/declarations are scoped to the definition/declaration that contains them and may be anonymous


Defining an XML Schema Document

• Schema document is an XML document with the root element “schema”
• Should reference the XML Schema namespace for its element/attribute content and type definitions
  • http://www.w3.org/2001/XMLSchema
• Schema document describes components for at most one namespace – its “target namespace”
  • By default the namespace applies to all global declarations (top level definitions)
  • Can apply to local declarations with the elementFormDefault and attributeFormDefault attributes of the schema element
    • elementFormDefault="qualified" (default is unqualified)
    • attributeFormDefault="qualified" (default is unqualified)
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ref="http://www.cat.com/orders"
           targetNamespace="http://www.cat.com/orders">
  :
  <!-- content goes in here -->
  :
</xs:schema>


Defining a Simple Element

• Simple elements contain only text, no other elements or attributes
  • Text can be one of several (~44) predefined simple types
<xs:element name="xxx" type="yyy"/>
<xs:element name="xxx" type="yyy" default="zzz"/>
<xs:element name="xxx" type="yyy" fixed="zzz"/>
• Some common simple types (roughly 44 defined in W3C Schema)
  • xs:string
  • xs:integer
  • xs:date
  • xs:decimal
  • xs:boolean
  • xs:time
• Attributes
  • default = value given to the element when it appears in the document with no content
  • fixed = element value that cannot be changed (if the element has a value in the document, it must match)
• Cardinality through minOccurs and maxOccurs attributes
  • may be any non-negative integer (minOccurs="0" makes the element optional)
  • default for both is 1
  • maxOccurs may also be “unbounded”
<xs:element name="xx" type="yy" minOccurs="0" maxOccurs="1" />


Attributes

<xs:attribute name="attributeName" type="typeName"/>
• Attributes are ALWAYS simple
• Options
  • use attribute
    • required
    • optional (the default)
    • prohibited
  • default attribute supplies a value in the event one is not specified
  • fixed attribute assigns a value that cannot be changed (if specified in the document, the value must match)

Defining Simple Types

• Derive additional simple types from the existing built-in or other derived simple types.
• Specifies constraints and info about values of attributes or text-only elements
• Use the restriction element of W3C Schema with one or more of the 12 available facets defined in W3C Schema
<xs:simpleType name="AFCEastType">
  <xs:restriction base="xs:string">
    <xs:enumeration value="Ravens"/>
    <xs:enumeration value="Browns"/>
    <xs:enumeration value="Steelers"/>
    <xs:enumeration value="Bengals"/>
  </xs:restriction>
</xs:simpleType>
<xs:element name="MyTeam" type="AFCEastType"/>


W3C Schema Facets

• enumeration – limit to a list of acceptable values
• fractionDigits – maximum number of decimal places
• length – exact number of characters allowed
• maxExclusive – upper bound for numeric values (<)
• maxInclusive – upper bound for numeric values (<=)
• maxLength – maximum number of characters allowed
• minExclusive – lower bound for numeric values (>)
• minInclusive – lower bound for numeric values (>=)
• minLength – minimum number of characters allowed
• pattern – dictates acceptable sequence of characters
• totalDigits – maximum number of digits allowed
• whiteSpace – how white space is handled
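To make the facet idea concrete, the sketch below imitates four of the facets on plain strings in Python. It is not a schema validator and does not read .xsd files; check_facets and its keyword names are invented for illustration, and Python's regular-expression dialect differs in details from the one XML Schema uses for pattern:

```python
import re


def check_facets(value, *, enumeration=None, pattern=None,
                 min_length=None, max_length=None):
    """Return the names of the facets that `value` violates (empty if none)."""
    violations = []
    if enumeration is not None and value not in enumeration:
        violations.append("enumeration")
    if pattern is not None and re.fullmatch(pattern, value) is None:
        violations.append("pattern")
    if min_length is not None and len(value) < min_length:
        violations.append("minLength")
    if max_length is not None and len(value) > max_length:
        violations.append("maxLength")
    return violations


TEAMS = {"Ravens", "Browns", "Steelers", "Bengals"}
print(check_facets("Ravens", enumeration=TEAMS))
print(check_facets("Jets", enumeration=TEAMS))
print(check_facets("2121", pattern=r"[0-9]{5}", min_length=5))
```

Each facet narrows the base type a little further, which is the sense in which a restriction "derives" a new simple type.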


Complex Types

• Content
  • Simple (character based)
  • Elements
  • Mixed: character content and elements
  • Empty
• Can have attributes as well
• Order and structure of complex types dictated by a content model based on W3C Schema model group declarations (compositors)
  • xs:all
  • xs:choice
  • xs:sequence


Model Group Declarations

• xs:all requires that each element in the group occur at most once (exactly once by default); order is not important
<xs:all>
  <xs:element name="GIVEN" type="xs:string"/>
  <xs:element name="FAMILY" type="xs:string"/>
</xs:all>
  • minOccurs of elements is restricted to “0” or “1”; default is “1”
  • maxOccurs must be “1” (specifying it is optional)
• xs:choice specifies that ANY ONE element from the group must appear, or that between N and M elements from the group appear in any order
<xs:choice>
  <xs:element name="COMPOSER" type="personType"/>
  <xs:element name="PRODUCER" type="personType"/>
</xs:choice>
  • minOccurs and maxOccurs on the choice tag indicate a selection of that many elements from the choice group, in any order


Model Group Declarations (cont)

• xs:sequence requires that each element appear in the order specified in the schema definition
<xs:sequence>
  <xs:element name="GIVEN" type="xs:string"/>
  <xs:element name="MIDDLE" type="xs:string"/>
  <xs:element name="FAMILY" type="xs:string"/>
</xs:sequence>
  • minOccurs and maxOccurs on elements dictate the number of appearances they make in the group
  • minOccurs and maxOccurs applied to the sequence tag apply to repetition of the entire group
• sequence and choice compositors can be nested within other compositors
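The difference between the three compositors comes down to how a list of child element names is compared with the declared names. A rough Python sketch, assuming every element keeps the default minOccurs="1" and maxOccurs="1" and ignoring nesting; the three function names are invented, and this is not how a real schema processor is implemented:

```python
from collections import Counter


def matches_sequence(children, expected):
    """xs:sequence: every declared name appears once, in the declared order."""
    return list(children) == list(expected)


def matches_all(children, expected):
    """xs:all: every declared name appears once, in any order."""
    return Counter(children) == Counter(expected)


def matches_choice(children, options):
    """xs:choice: exactly one child, taken from the declared options."""
    return len(children) == 1 and children[0] in options


names = ["GIVEN", "MIDDLE", "FAMILY"]
print(matches_sequence(["FAMILY", "MIDDLE", "GIVEN"], names))  # order matters
print(matches_all(["FAMILY", "MIDDLE", "GIVEN"], names))       # order ignored
print(matches_choice(["COMPOSER"], ["COMPOSER", "PRODUCER"]))
```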


Defining a Complex Type

• A complex element is a grouped set of other elements (simple and complex)
<xs:complexType name="typeName" mixed="true">
  <xs:sequence> <!-- or other grouping element -->
    <xs:element name="element1" type="elementType"/>
    <xs:element name="element2" type="elementType"/>
  </xs:sequence>
</xs:complexType>
<xs:element name="element-name" type="typeName"/>

<element-name>
  mixed content allows text
  <element1>…</element1>
  outside of
  <element2>…</element2>
  defined elements
</element-name>
• If the mixed attribute is removed (or set to false) then only elements are allowed

Other Complex Types

• Character-only elements
<xs:complexType name="empType">
  <xs:simpleContent>
    <xs:extension base="xs:string">
      <xs:attribute name="id" type="xs:integer"/>
    </xs:extension>
  </xs:simpleContent>
</xs:complexType>
<xs:element name="employee" type="empType"/>
<employee id="2343">John Doe</employee>
• Empty elements
<xs:complexType name="empType">
  <xs:attribute name="empID" type="xs:positiveInteger"/>
</xs:complexType>
<xs:element name="employee" type="empType"/>
<employee empID="2346"/>


Example Schema

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:mus="http://www.music.com/labels"
           targetNamespace="http://www.music.com/labels">

  <xs:element name="cd" type="mus:cdType"/>

  <xs:complexType name="cdType">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="author">
        <xs:complexType>
          <xs:simpleContent>
            <xs:extension base="xs:string">
              <xs:attribute name="source" type="xs:string" use="optional" default="band"/>
            </xs:extension>
          </xs:simpleContent>
        </xs:complexType>
      </xs:element>
      <xs:element name="track" type="mus:trackType" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="trackType">
    <xs:sequence>
      <xs:element name="songTitle" type="xs:string"/>
      <xs:element name="length" type="xs:float"/>
    </xs:sequence>
    <xs:attribute name="number" type="xs:integer" use="required"/>
  </xs:complexType>
</xs:schema>


Schema in XML Documents

• An XML document that conforms to a schema is an instance
• For validation against a schema, the document should provide “guidance” on where the schema document(s) are located, using the schemaLocation attribute from the XMLSchema-instance namespace
  • The attribute pairs schema namespaces (from the targetNamespace) with schema documents
  • xsi:schemaLocation="http://www.catalog.com/orders ord.xsd"
  • Can pair multiple namespaces and documents in a single attribute
  • The attribute provides guidance only; specific XML validators/processors may not use it
• Elements/attributes defined in a schema must be referenced in the namespace that is the targetNamespace of the defining schema document.


Schema and Instances

• Schema template
<?xml version="1.0" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.books.com/data">
  :
  <!-- content goes in here -->
  :
  :
</xs:schema>
• Access from XML instance
<?xml version="1.0" ?>
<!-- book is globally defined in schema at note.xsd -->
<bk:book xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.books.com/data note.xsd"
         xmlns:bk="http://www.books.com/data">
  :
  <!-- content goes in here -->
  :
  :
</bk:book>


Example XML

<mus:cd xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xmlns:mus="http://www.music.com/labels"

xsi:schemaLocation="http://www.music.com/labels music.xsd">

<title>Live at Raji’s</title>

<author source="person">The Dream Syndicate</author>

<track number="1">

<songTitle>Halloween</songTitle>

<length>3.34</length>

</track>

<track number="4">

<songTitle>Days of Wine and Roses</songTitle>

<length>5.25</length>

</track>

</mus:cd>
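Python's standard library has no XML Schema validator, so the sketch below only reads the instance; it does not check it against music.xsd. It assumes xml.etree.ElementTree and shows one consequence of the schema's form defaults: the global cd element is namespace-qualified, while the locally declared title, author and track elements are not. The helper name summarize is made up:

```python
import xml.etree.ElementTree as ET

DOC = """<mus:cd xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:mus="http://www.music.com/labels"
        xsi:schemaLocation="http://www.music.com/labels music.xsd">
  <title>Live at Raji's</title>
  <author source="person">The Dream Syndicate</author>
  <track number="1">
    <songTitle>Halloween</songTitle>
    <length>3.34</length>
  </track>
  <track number="4">
    <songTitle>Days of Wine and Roses</songTitle>
    <length>5.25</length>
  </track>
</mus:cd>"""


def summarize(xml_text):
    """Return the root's expanded name and (number, title, length) per track."""
    root = ET.fromstring(xml_text)
    tracks = [
        (int(t.get("number")), t.findtext("songTitle"), float(t.findtext("length")))
        for t in root.findall("track")
    ]
    return root.tag, tracks


print(summarize(DOC))
```

The int and float conversions are done by hand here; with a schema-aware processor the xs:integer and xs:float declarations would drive that typing.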


Validating Instances Against Schema

• Validation tools on the World Wide Web
  • DecisionSoft - http://tools.decisionsoft.com/schemaValidate
    • Load schema document and instance
    • Will return results through browser
  • Other online tools at OASIS Cover Pages web site
    • http://xml.coverpages.org/check-xml.html
• Local IDE and editor validation
  • Eclipse Web Tools project at http://www.eclipse.org
  • Oxygen XML editor at http://www.oxygenxml.com
  • Others


Other “Languages” for XML Validation

• Document Type Definitions (DTDs)
  • Simple and compact
  • Becoming “obsoleted” because of limitations compared to other technologies
  • Documents are NOT XML-based and require additional non-XML tools for processing
  • Limited general data typing
  • Difficult to support XML namespaces
• RELAX NG
  • Developed by an OASIS technical committee and later standardized as ISO/IEC 19757-2
  • Validation only, not for instance processing
  • No capability for inheritance
• Schematron
  • Rule based, not grammar based like others
  • Validation only and “anything is valid unless specifically prohibited”
  • Based on XPath – easy to learn and use


CSS Motivation

• Powerful and flexible way to specify the formatting of elements

• Separation of presentation from content (first step toward XML and XML style sheets)

• Share style sheets across multiple documents or entire web site

• Can specify style definitions that apply to specific elements or classes of elements

HTML/CSS Lecture Notes

• XHTML document template
• XHTML element structure
• Difference between inline and block level HTML tags
• CSS rule syntax in external style sheets
• Assigning rules to HTML markup tags
• Linking external style sheets to HTML/XHTML documents
• Basic CSS properties


Transforming XML

• XML should be a “pure data layer”
• Desire to transform original XML for numerous reasons:
  • Presentation
    • Add formatting to document based on specific device, software application, etc.
    • May require numerous presentations for a single XML document depending upon client
  • Communication
    • Transform XML data from a structure used by a source component into another understood by a destination component
• Methods
  • Application of CSS to XML
  • Extensible Stylesheet Language Transformations (XSLT)


Example – vendor.xml

• Original XML data document

<?xml version="1.0"?>
<vendor>
  <vendor_name>Crazy Marge’s Bed Emporium</vendor_name>
  <advertisement>
    <ad_sentence>
      We never have a sale because we have the lowest prices in town.
    </ad_sentence>
  </advertisement>
  <product>
    <product_name>SleepEazy Mattresses</product_name>
    <item>
      <price>$300</price>
      <product_desc>per set, any size</product_desc>
    </item>
  </product>
  :
</vendor>


Vendor.xml in Browser


Example vendor.css

vendor {
  border-width: 5px;
  border-style: ridge;
  background-color: #FFFFFF;
  width: 600px;
  text-align: center;
  padding: 25px;
}

vendor_name {
  font-weight: bold;
  font-size: x-large;
}
:
product_name {
  display: inline;
  font-weight: bold;
}

special {
  display: block;
  margin: 20px 0px;
  text-align: center;
  :
  border-style: outset;
  border-width: 5px;
}


Updated vendor.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="vendor.css"?>
<vendor>
  <vendor_name>Crazy Marge’s Bed Emporium</vendor_name>
  <advertisement>
  :
</vendor>


XSL Transformations (XSLT)

• Application of an XSL Style Sheet instead of CSS
<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  :
</xsl:stylesheet>
• Associate Style Sheet with XML Document
<?xml version="1.0"?>
<?xml-stylesheet href="stylesheet.xsl" type="text/xsl"?>
<root>
  :


XSL Style Sheets

• A series of templates to handle varying parts of the XML document

<xsl:template match="/">
  <xsl:for-each select="//vendor">
    <xsl:sort select="vendor_name"/>
    <h2>
      <xsl:value-of select="vendor_name"/>
    </h2>
  </xsl:for-each>
  <xsl:apply-templates/>
</xsl:template>
• Templates use XPath expressions and formatting
  • HTML tags
  • loops
  • conditionals
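For readers who have not used XSLT, the for-each / sort / value-of steps in the template can be mimicked in ordinary code. The Python sketch below is an imitation, not an XSLT processor (the standard library does not include one), and it leaves out the apply-templates step; the <vendors> wrapper, the second vendor and the helper name vendor_headings are all invented for the example, and Python's default string ordering may differ from an XSLT engine's sort:

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical input: a wrapper element holding two vendors.
DOC = """<vendors>
  <vendor><vendor_name>Sleepy Hollow Beds</vendor_name></vendor>
  <vendor><vendor_name>Crazy Marge's Bed Emporium</vendor_name></vendor>
</vendors>"""


def vendor_headings(xml_text):
    """One <h2> per vendor element, ordered by vendor_name."""
    root = ET.fromstring(xml_text)
    # root.iter("vendor") plays the role of select="//vendor".
    names = sorted(v.findtext("vendor_name", default="") for v in root.iter("vendor"))
    return [f"<h2>{escape(name, quote=False)}</h2>" for name in names]


print("\n".join(vendor_headings(DOC)))
```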

JavaScript Object Notation (JSON)


Overview

• Syntax for storing and exchanging text information
• Self-describing and “easy to understand”
• Based on JavaScript syntax
  • Standard ECMA-262 3rd Edition - December 1999
  • Format is the same as that used for creating JavaScript objects
• Completely language independent
  • Easily integrated into and parsed using JavaScript
  • Can be transported using AJAX
  • APIs exist for numerous other languages (C/C++, Java, ASP, Python, PHP, Ruby, …)


An Alternative to XML

• JSON is a native web format supported by most browser JavaScript engines, so there is no need for proprietary XML parsing engines
• Easier to read and write
  • More about the data than the “markup”
  • Edited with plain text editors
• Smaller document size
• Simpler
  • No CDATA
  • No comments
  • Only a few data types
• More flexible as there are no schemas by default


JSON Syntax

• Data is in name : value pairs (separated by a colon)
  • Names are enclosed in double quotes
• Values can be:
  • Integer or floating point numbers
  • Strings (enclosed in double quotes)
  • Boolean (true or false)
  • Arrays (enclosed in square brackets)
  • Objects (enclosed in curly brackets)
  • null
• Data is separated by commas
• Curly brackets hold objects
• Square brackets hold arrays
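To see the value types listed above in practice, the following sketch parses one object containing each of them and reports the Python type each becomes. The field names and values are invented for the example.

```python
import json

# One member per JSON value type; names and values are illustrative.
text = """
{
  "count": 3,
  "ratio": 0.5,
  "name": "Rich",
  "active": true,
  "tags": ["a", "b"],
  "address": {"zip": 20723},
  "spouse": null
}
"""

data = json.loads(text)
for key, value in data.items():
    # Show which native type each JSON value maps to
    print(key, type(value).__name__)
```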


Data Examples

• Name : Value pair
  • "firstname" : "Richard"
  • "age" : 30
• Objects
  • { "firstname" : "Richard", "lastname" : "Spiegel", "age" : 30 }
  • { "id" : 88421, "Experience" : 17, "Reviews" : [8, 5, 7, 4] }
• Arrays
  [
    { "firstname" : "Richard", "lastname" : "Spiegel", "age" : 30 },
    { "firstname" : "Brian", "lastname" : "Spiegel", "age" : 15 },
    { "firstname" : "Robert", "lastname" : "Evans", "age" : 40 }
  ]
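Once parsed, the object and array examples above become ordinary lists and dictionaries. A short sketch of navigating them in Python follows; the queries (filtering by last name, averaging the reviews) are illustrative choices, not part of the slides.

```python
import json

people = json.loads("""
[
  {"firstname": "Richard", "lastname": "Spiegel", "age": 30},
  {"firstname": "Brian", "lastname": "Spiegel", "age": 15},
  {"firstname": "Robert", "lastname": "Evans", "age": 40}
]
""")

# An array of objects becomes a list of dicts
spiegels = [p["firstname"] for p in people if p["lastname"] == "Spiegel"]
oldest = max(people, key=lambda p: p["age"])

# A nested array is reached through its name
employee = json.loads('{"id": 88421, "Experience": 17, "Reviews": [8, 5, 7, 4]}')
average_review = sum(employee["Reviews"]) / len(employee["Reviews"])

print(spiegels, oldest["firstname"], average_review)
```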


Example From JavaScript

<html>
<head>
  <title>JavaScript JSON Example</title>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <script type="text/javascript">
    var people = [
      {"firstname" : "Rich", "lastname" : "Spiegel", "age" : 30},
      {"firstname" : "Brian", "lastname" : "Spiegel", "age" : 15},
      {"firstname" : "Robert", "lastname" : "Evans", "age" : 40},
      {"firstname" : "Erika", "lastname" : "Smith", "age" : 28}
    ];
  </script>
</head>
<body>
  <table id="rTable" border="1">
    <thead>
      <tr><td>First Name</td><td>Last Name</td><td>Age</td></tr>
    </thead>
  </table>
  <script type="text/javascript">
    var table = document.getElementById('rTable');
    // Row 0 is the header row, so data rows start at index 1
    for (var i = 1; i < people.length + 1; i++) {
      var row = table.insertRow(i);
      row.insertCell(0).innerHTML = people[i-1].firstname;
      row.insertCell(1).innerHTML = people[i-1].lastname;
      row.insertCell(2).innerHTML = people[i-1].age;
    }
  </script>
</body>
</html>

First Name   Last Name   Age
Rich         Spiegel     30
Brian        Spiegel     15
Robert       Evans       40
Erika        Smith       28


Example from Java


import java.io.File;
import java.io.IOException;

import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.core.JsonGenerationException;
import com.fasterxml.jackson.databind.JsonMappingException;
import com.fasterxml.jackson.databind.ObjectMapper;

// Uses Jackson 2.0.2 libraries: core, databind and annotations
public class JacksonClassToJSON {
    public static void main(String[] args) {
        User user = new User();
        // ObjectMapper converts between Java objects and JSON
        ObjectMapper mapper = new ObjectMapper();
        try {
            // convert user object to json string, and save to a file
            mapper.writeValue(new File(System.getProperty("user.dir") + "/user.json"), user);
            // display to console
            System.out.println(mapper.writeValueAsString(user));
        }
        catch (JsonGenerationException e) { e.printStackTrace(); }
        catch (JsonMappingException e) { e.printStackTrace(); }
        catch (IOException e) { e.printStackTrace(); }
    }
}

// Goes in its own file (User.java), since each public class needs its own source file
public class User {
    @JsonProperty
    private int age = 29;

    @JsonProperty
    private String name = "rich";
}

user.json
{"age":29,"name":"rich"}


Example from PHP5

<?php
$rfsArray = array( "name" => "Rich", "age" => 30, "zipcode" => 20723 );
$jsonArray = json_encode($rfsArray);
echo $jsonArray;
?>

OUTPUT IS {"name":"Rich","age":30,"zipcode":20723}

<?php
$json_string = '{"id":1,"name":"rich","country":"usa","office":["apl","epp"]}';
$obj = json_decode($json_string);
echo $obj->name;       // displays rich
echo $obj->office[0];  // displays apl
foreach ($obj->office as $val)
    echo $val;         // displays each office as it loops
?>


JSON Validation

• Really?
  – One motivation for JSON was the flexibility of having no default schemas
  – Many are of the opinion that the lack of schemas is a “feature” of JSON
• Why?
  – Data shared across services may have specific characteristics or restrictions that need to be shared
  – Storing application configuration information that needs to maintain compatibility with future releases
  – Management of data stored in document-oriented databases
  – Desire to validate data content in automated test suites
• Options
  – IETF has a draft (version 3) of a JSON Schema Specification, but it's over a year old (http://tools.ietf.org/html/draft-zyp-json-schema-03)
  – Alternative proposal based on RELAX-NG (http://giftfile.org/depot/home/acarrico/json/json-rng.txt)
  – Regular JSON (JSONR) proposed at (http://laurentszyster.be/jsonr/)
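For the automated-test use case, even a hand-rolled check can catch missing fields and wrong types. The sketch below is not an implementation of the JSON Schema draft or of either alternative proposal; the rule format (field name mapped to a required Python type) and the names USER_RULES and validate are assumptions made for this example.

```python
import json

# Hypothetical rules: field name -> Python type required after parsing.
USER_RULES = {"name": str, "age": int}

def validate(text, rules):
    """Return a list of problems; an empty list means the data passed."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    problems = []
    for field, expected in rules.items():
        if field not in data:
            problems.append(f"missing field: {field}")
            continue
        value = data[field]
        # bool is a subclass of int in Python, so exclude it from int fields
        is_bool_as_int = expected is int and isinstance(value, bool)
        if not isinstance(value, expected) or is_bool_as_int:
            problems.append(f"wrong type for field: {field}")
    return problems

print(validate('{"name": "Rich", "age": 30}', USER_RULES))
print(validate('{"name": "Rich", "age": "30"}', USER_RULES))
```

A schema language does the same job declaratively, so the rules can be shared between services instead of being re-coded in each one.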


Questions?


Further Reading - XML

• Java and XML, McLaughlin
• XML and Java from Scratch, Chase
• Definitive XML Schema, Priscilla Walmsley, Prentice Hall PTR
• Core Web Programming, Hall and Brown, Chap 23
• Links
  • O’Reilly XML Resource Center
    • http://www.xml.com
  • XML.org
    • http://www.xml.org
  • Chapters 17 and 24 from the XML Bible
    • http://www.ibiblio.org/xml/books/bible2/chapters/
  • Tutorials at W3 Schools
    • http://www.w3schools.com/default.asp
  • TheScarms XML Tutorials
    • http://www.thescarms.com/XML/XMLTutorial.asp
    • http://www.thescarms.com/XML/SchemaTutorial.asp
  • Systinet Tutorial
    • http://www.zvon.org/xxl/XSLTutorial/Books/Book1/index.html
  • DecisionSoft
    • http://tools.decisionsoft.com/schemaValidate


Further Reading – XHTML/CSS

• Dynamic HTML, Goodman
• Core Web Programming, Hall, Brown
  • Chapters 1 through 5
• HTML Tag Links
  • http://www.w3schools.com/html/html_reference.asp
  • http://www.cosy.sbg.ac.at/~lendl/tags.html
  • http://www.davesite.com/webstation/html/taglist.shtml
  • http://werbach.com/barebones/barebones.html
  • http://web3.apl.jhu.edu/605.481/notes/XHTML_2006.pdf
• CSS Property Reference Links
  • http://www.htmldog.com/reference/cssproperties/
  • http://www.htmlhelp.com/reference/css/properties.html
  • http://www.w3schools.com/css/css_reference.asp
  • http://web3.apl.jhu.edu/605.481/notes/CSS_2006.pdf

Further Reading - JSON

• Overview and links to RFCs and language specific APIs
  • http://json.org
• Tutorial and examples
  • http://www.w3schools.com/json
  • PHP JSON Manual: http://php.net/manual/en/book.json.php
  • Various JSON Tutorials: http://www.roseindia.net/tutorials/json/
  • http://www.w3resource.com/JSON/introduction.php
• JSON APIs for Java
  • json-simple: http://code.google.com/p/json-simple/
  • Jackson JSON Processor: http://jackson.codehaus.org/
  • google-gson: http://code.google.com/p/google-gson/


Optional Reading: DTDs


DOCTYPE

• Specifies the location of the legacy Document Type Definition (DTD), which defines the syntax and structure of the elements in the document
• Common forms
  • <!DOCTYPE root-element [DTD Statements]>
  • <!DOCTYPE root-element SYSTEM URL>
  • <!DOCTYPE root-element PUBLIC FPI URL>
• root-element identifies the root element of the document
• If the DTD is external to the XML document, it is referenced by a SYSTEM or PUBLIC URL
  • SYSTEM URL refers to a private DTD on the local file system or HTTP server
  • PUBLIC URL refers to a DTD intended for public use


DOCTYPE (cont)

• Formal Public Identifier (FPI) has four parts
  • Connection of DTD to a formal standard
    • Dash (-) if defining yourself
    • Plus (+) if some non-standards body has approved the DTD
    • ISO if DTD approved by a formal standards committee
  • Group responsible for the DTD
  • Description and type of the document
  • Language used in DTD
• Examples
  <!DOCTYPE book SYSTEM "DTDs/CWP.dtd">
  <!DOCTYPE book SYSTEM "http://www.corewebprogramming.com/DTDs/CWP.dtd">
  <!DOCTYPE Book PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  <!DOCTYPE CWP PUBLIC "-//Prentice Hall//DTD Core Series 1.0//EN"
      "http://www.prenticehall.com/DTD/Core.dtd">
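The four FPI parts are separated by double slashes, so they can be pulled apart with a simple split. The sketch below handles only the four-field form described on this slide; the FPI class, its field names and parse_fpi are invented for the example.

```python
from typing import NamedTuple

class FPI(NamedTuple):
    # Field names are illustrative labels for the four parts listed above
    registration: str   # "-", "+" or an ISO designation
    owner: str          # group responsible for the DTD
    description: str    # description and type of the document
    language: str       # language used in the DTD

def parse_fpi(fpi: str) -> FPI:
    parts = fpi.split("//")
    if len(parts) != 4:
        raise ValueError(f"expected four '//'-separated parts, got {len(parts)}")
    return FPI(*parts)

xhtml = parse_fpi("-//W3C//DTD XHTML 1.0 Transitional//EN")
print(xhtml)
```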


DTDs - Defining Elements

• <!ELEMENT name definition/type>
  <!ELEMENT daylily (cultivar, award*, bloom, cost)+>
  <!ELEMENT cultivar (#PCDATA)>
  <!ELEMENT id (#PCDATA | catalog_id)*>
• Types
  • ANY        Any well-formed XML data
  • EMPTY      Element cannot contain any text or child elements
  • PCDATA     Character data only (should not contain markup)
  • elements   List of legal child elements (no character data)
  • mixed      May contain character data and/or child elements (cannot constrain order and number of child elements)


DTDs - Defining Elements, cont.

• Cardinality
  • [none]   Default (one and only one instance)
  • ?        0, 1
  • *        0, 1, …, N
  • +        1, 2, …, N
• List Operators
  • ,        Sequence (in order)
  • |        Choice (one of several)


DTDs - Grouping Elements

• Set of elements can be grouped within parentheses
  • (Elem1?, Elem2?)+
    • Elem1 can occur 0 or 1 times followed by 0 or 1 occurrences of Elem2
    • The group (sequence) must occur 1 or more times
  • OR
  • ((Elem1, Elem2) | Elem3)*
    • Either the group of Elem1, Elem2 is present (in order) or Elem3 is present, 0 or more times
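Content models built from ?, *, +, commas and | behave like regular expressions over element names, which suggests a quick way to experiment with them. The sketch below is not a DTD validator: model_to_regex and children_match are hypothetical helpers, and #PCDATA, ANY and EMPTY are not handled.

```python
import re

# Matches an element name inside a content model
NAME = re.compile(r"[A-Za-z_][\w.-]*")

def model_to_regex(model):
    # Each element name becomes a group that matches "name;"
    body = NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + ";)", model)
    # A comma means "followed by", which is plain concatenation in a regex;
    # |, ?, * and + already mean the same thing in both notations
    body = body.replace(",", "").replace(" ", "")
    return re.compile(body)

def children_match(model, children):
    # Encode the child sequence as "name;name;..." and require a full match
    sequence = "".join(name + ";" for name in children)
    return model_to_regex(model).fullmatch(sequence) is not None

print(children_match("((Elem1, Elem2) | Elem3)*", ["Elem1", "Elem2", "Elem3"]))
print(children_match("((Elem1, Elem2) | Elem3)*", ["Elem1", "Elem3"]))
```

The second call fails because Elem1 may only appear as the first half of the (Elem1, Elem2) sequence.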


DTD - Element Example

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE Person [
  <!ELEMENT Person ( (Mr|Ms|Miss)?, FirstName, MiddleName*, LastName, (Jr|Sr)? )>
  <!ELEMENT FirstName (#PCDATA)>
  <!ELEMENT MiddleName (#PCDATA)>
  <!ELEMENT LastName (#PCDATA)>
  <!ELEMENT Mr EMPTY>
  <!ELEMENT Ms EMPTY>
  ...
  <!ELEMENT Sr EMPTY>
]>
<Person>
  <Mr/>
  <FirstName>Lawrence</FirstName>
  <LastName>Brown</LastName>
</Person>


DTDs - Defining Attributes

• <!ATTLIST element attrName type modifier>
• Modifiers
  • #IMPLIED – attribute can remain unspecified
  • #REQUIRED – attribute must be present or document is invalid
  • #FIXED – value is set and can never be changed
  • "value" – default value applied if the attribute is left unspecified; applies to enumerations
• Types
  • CDATA – character data (plain text)
  • Enumeration
    • Attribute (value1|value2|value3) [Modifier]
  • ID, IDREF, NMTOKEN, NMTOKENS, ENTITY, ENTITIES, NOTATION
• Examples
  <!ELEMENT Customer (#PCDATA)>
  <!ATTLIST Customer id CDATA #IMPLIED>

  <!ELEMENT Product (#PCDATA)>
  <!ATTLIST Product
      cost CDATA #FIXED "200"
      id   CDATA #REQUIRED>
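A parser that reads the internal DTD subset can supply declared attribute values on its own. The sketch below uses Python's low-level expat binding, which by default reports attributes derived from ATTLIST declarations alongside those written in the document. Expat does not validate, so #REQUIRED is not enforced here; the id value "p-17", the element content and the helper read_attributes are made up for the example.

```python
from xml.parsers import expat

DOC = """<?xml version="1.0"?>
<!DOCTYPE Product [
  <!ELEMENT Product (#PCDATA)>
  <!ATTLIST Product
      cost CDATA #FIXED "200"
      id   CDATA #REQUIRED>
]>
<Product id="p-17">Widget</Product>
"""

def read_attributes(xml_text):
    """Map each element name to the attributes the parser reports for it."""
    seen = {}

    def on_start(name, attrs):
        seen[name] = dict(attrs)

    parser = expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return seen

attributes = read_attributes(DOC)
print(attributes)
```

The document never writes cost, yet the parser reports it with the fixed value from the declaration.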


DTDs - Defining Entities

<!ENTITY name "replacement text">

Examples
  <!ENTITY amp "&#38;#38;">
  <!ENTITY copyright "2001, Prentice Hall">

Inside XML document
  <book>
    <title>Core Web Prog.,&copyright;</title>
  </book>
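Entity replacement happens at parse time, so an application only ever sees the expanded text. A minimal sketch with Python's ElementTree, assuming the copyright entity is declared in the document's internal DTD subset:

```python
import xml.etree.ElementTree as ET

# The internal subset declares the entity used in the title below
DOC = """<?xml version="1.0"?>
<!DOCTYPE book [
  <!ENTITY copyright "2001, Prentice Hall">
]>
<book>
  <title>Core Web Prog.,&copyright;</title>
</book>
"""

# The parser substitutes the replacement text for &copyright;
title = ET.fromstring(DOC).findtext("title")
print(title)
```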