XML for E-commerce II

Helena Ahonen-Myka

XML processing model

XML processor is used to read XML documents and provide access to their content and structure

the XML processor does its work on behalf of an application

the specification defines which information the processor should provide to the application

Parsing

input: an XML document

basic task: is the document well-formed?

validating parsers additionally: is the document valid?

Parsing

parsers produce data structures, which other tools and applications can use

two kinds of APIs: tree-based and event-based

Tree-based API

compiles an XML document into an internal tree structure

allows an application to navigate the tree

Document Object Model (DOM) is a tree-based API for XML and HTML documents
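
A minimal sketch of tree-based access, assuming the JAXP DOM parser that ships with the JDK (the class name DomOutline and the sample document are illustrative only). The whole document is first compiled into a tree, and the application then navigates it:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomOutline {
    // Parse the whole document into an in-memory tree, then walk it.
    static String outline(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            walk(doc.getDocumentElement(), 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    // Depth-first navigation: append each element name, indented by its depth.
    private static void walk(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(node.getNodeName()).append('\n');
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        System.out.print(outline("<doc><para>Hello, world!</para></doc>"));
    }
}
```

Because the tree stays in memory, the application can revisit any node in any order, at the cost of holding the full document.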

Event-based API

reports parsing events (such as start and end of elements) directly to the application through callbacks

the application implements handlers to deal with the different events

Simple API for XML (SAX)

Example

<?xml version="1.0"?>
<doc>
  <para>Hello, world!</para>
</doc>

Events:

start document
start element: doc
start element: para
characters: Hello, world!
end element: para
end element: doc
end document

Example (cont.)

an application handles these events just as it would handle events from a graphical user interface (mouse clicks, etc) as the events occur

no need to cache the entire document in memory or secondary storage
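
The event sequence can be reproduced with a short sketch. It assumes the SAX parser bundled with the JDK (obtained through JAXP rather than a specific Xerces class) and uses DefaultHandler, a convenience class that implements the handler interfaces with empty methods. The class name SaxEventTrace is illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventTrace {
    // Parse the document and return one line of text per reported event.
    static String trace(String xml) {
        StringBuilder log = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() {
                log.append("start document\n");
            }
            @Override public void endDocument() {
                log.append("end document\n");
            }
            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes atts) {
                log.append("start element: ").append(qName).append('\n');
            }
            @Override public void endElement(String uri, String localName, String qName) {
                log.append("end element: ").append(qName).append('\n');
            }
            @Override public void characters(char[] ch, int start, int length) {
                log.append("characters: ").append(ch, start, length).append('\n');
            }
        };
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.print(trace("<doc><para>Hello, world!</para></doc>"));
    }
}
```

Each callback runs while the parser is still reading, so nothing except the log itself is kept in memory.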

Tree-based vs. event-based

tree-based APIs are useful for a wide range of applications, but they may need a lot of resources (if the document is large)

some applications may need to build their own tree structures, and it is very inefficient to build a parse tree only to map it to another tree

Tree-based vs. event-based (cont.)

an event-based API provides simpler, lower-level access to an XML document

as the document is processed sequentially, one can parse documents much larger than the available system memory

applications can construct their own data structures in their own callback event handlers
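
As a sketch of the last point, the handler below builds its own small data structure (a sorted map from element name to occurrence count) directly from the callbacks, without any intermediate parse tree. The class name ElementCounter is illustrative, and the JDK's bundled SAX parser is assumed.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ElementCounter extends DefaultHandler {
    // The application's own structure: element name -> number of occurrences.
    private final Map<String, Integer> counts = new TreeMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        counts.merge(qName, 1, Integer::sum);
    }

    // Parse the document and return the counts in the map's text form.
    static String count(String xml) {
        ElementCounter counter = new ElementCounter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), counter);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return counter.counts.toString();
    }

    public static void main(String[] args) {
        System.out.println(count("<doc><para>a</para><para>b</para></doc>"));
    }
}
```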

We need a parser...

Apache Xerces: http://xml.apache.org

IBM XML4J: http://alphaworks.ibm.com

XP: http://www.jclark.com/xml/xp

… many others

… and the SAX classes

http://www.megginson.com/SAX/

often the SAX classes come bundled with the parser distribution

some parsers only support SAX 1.0; the latest version is 2.0

Starting a SAX parser

import org.xml.sax.XMLReader;
import org.apache.xerces.parsers.SAXParser;

// uri: location (system identifier) of the XML document to parse
XMLReader parser = new SAXParser();
parser.parse(uri);

Content handlers

in order to let the application do something useful with XML data as it is being parsed, we must register handlers with the SAX parser

a handler is a set of callbacks: application code that is run at important events during the parsing of a document

Core handler interfaces in SAX

org.xml.sax.ContentHandler

org.xml.sax.ErrorHandler

org.xml.sax.DTDHandler

org.xml.sax.EntityResolver

Custom application classes

custom application classes that perform specific actions within the parsing process can implement each of the core interfaces

implementation classes can be registered with the parser with the methods setContentHandler(), etc.

Example: content handlers

class MyContentHandler implements ContentHandler {

  public void startDocument() throws SAXException {
    System.out.println("Parsing begins…");
  }

  public void endDocument() throws SAXException {
    System.out.println("...Parsing ends.");
  }

Element handlers

public void startElement(String namespaceURI, String localName,
                         String rawName, Attributes atts)
    throws SAXException {

  System.out.print("startElement: " + localName);
  if (!namespaceURI.equals("")) {
    System.out.println(" in namespace " + namespaceURI + " (" + rawName + ")");
  } else {
    System.out.println(" has no associated namespace");
  }

  // list every attribute of the element
  for (int i = 0; i < atts.getLength(); i++) {
    System.out.println(" Attribute: " + atts.getLocalName(i) + "=" + atts.getValue(i));
  }
}

endElement

public void endElement(String namespaceURI, String localName, String rawName)
    throws SAXException {
  System.out.println("endElement: " + localName + "\n");
}

Character data

public void characters(char[] ch, int start, int length)
    throws SAXException {
  // the third argument is the number of characters to read, not an end index
  String s = new String(ch, start, length);
  System.out.println("characters: " + s);
}

the parser may return all contiguous character data at once, or split the data up into multiple method invocations
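
Since the text of one element may arrive in several characters() calls, a handler usually buffers it until the end tag. The following is a minimal sketch of that idea, assuming text-only leaf elements (mixed content would need a stack of buffers). The class name TextCollector is illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder(); // text of the current element
    private final StringBuilder result = new StringBuilder(); // one "name=text" line per leaf

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0); // a new element starts: discard whitespace seen so far
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length); // may be called several times per text run
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String text = buffer.toString().trim();
        if (!text.isEmpty()) {
            result.append(qName).append('=').append(text).append('\n');
        }
        buffer.setLength(0);
    }

    static String collect(String xml) {
        TextCollector collector = new TextCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), collector);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return collector.result.toString();
    }

    public static void main(String[] args) {
        // An entity reference is a typical place where parsers split character data.
        System.out.print(collect("<doc><para>Fish &amp; chips</para></doc>"));
    }
}
```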

Processing instructions

XML documents may contain processing instructions (PIs)

a processing instruction tells an application to perform some specific task

form: <?target instructions?>

Handlers for PIs

public void processingInstruction(String target, String data)
    throws SAXException {
  System.out.println("PI: Target:" + target + " and Data:" + data);
}

Application could receive instructions and set variables or execute methods to perform application-specific processing
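
A small sketch of a handler that sets a variable from a processing instruction. The target name render-mode and the class name RenderModeReader are hypothetical, chosen only for illustration; the JDK's bundled SAX parser is assumed.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class RenderModeReader extends DefaultHandler {
    private String mode = "default"; // used when the document carries no instruction

    @Override
    public void processingInstruction(String target, String data) {
        // React only to the (hypothetical) target this application understands.
        if ("render-mode".equals(target)) {
            mode = data;
        }
    }

    static String modeOf(String xml) {
        RenderModeReader reader = new RenderModeReader();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), reader);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return reader.mode;
    }

    public static void main(String[] args) {
        System.out.println(modeOf("<?render-mode compact?><doc/>"));
    }
}
```

Instructions addressed to other targets are simply ignored, which is how several applications can share one document.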

Validation

some parsers are validating, some non-validating

some parsers can do both

SAX method to turn validation on:

parser.setFeature("http://xml.org/sax/features/validation", true);
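
A sketch of validation in use, assuming the JDK's bundled parser (which can validate against a DTD) and an ErrorHandler that counts the reported validity errors. The class name ValidationDemo and the sample DTD are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

class ValidationDemo {
    static final String DTD =
        "<!DOCTYPE doc [<!ELEMENT doc (para+)><!ELEMENT para (#PCDATA)>]>";

    // Parse with validation turned on and return the number of validity errors.
    static int countErrors(String xml) {
        final int[] errors = {0};
        try {
            XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            parser.setFeature("http://xml.org/sax/features/validation", true);
            parser.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { errors[0]++; } // validity error
                public void fatalError(SAXParseException e) throws SAXException {
                    throw e; // not well-formed: parsing cannot continue
                }
            });
            parser.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return errors[0];
    }

    public static void main(String[] args) {
        System.out.println(countErrors(DTD + "<doc><para>Hello</para></doc>"));
        System.out.println(countErrors(DTD + "<doc><note>Hello</note></doc>"));
    }
}
```

Validity errors are recoverable, so the parser keeps going after reporting them; only well-formedness errors stop the parse.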

Ignorable whitespace

a validating parser can decide which whitespace can be ignored

for a non-validating parser, all whitespace is just characters

content handler:

public void ignorableWhitespace(char[] ch, int start, int length) { … }
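
The difference can be observed with a sketch that counts whitespace reported through each callback. It assumes the JDK's bundled parser; the class name WhitespaceDemo and the sample document are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class WhitespaceDemo {
    static final String BODY = "<doc>\n  <para>Hi</para>\n</doc>";
    static final String WITH_DTD =
        "<!DOCTYPE doc [<!ELEMENT doc (para)><!ELEMENT para (#PCDATA)>]>" + BODY;

    // Report how many whitespace characters arrive through each callback.
    static String classify(String xml, boolean validate) {
        final int[] counts = {0, 0}; // [0] ignorableWhitespace(), [1] characters()
        DefaultHandler handler = new DefaultHandler() {
            @Override public void ignorableWhitespace(char[] ch, int start, int length) {
                counts[0] += length;
            }
            @Override public void characters(char[] ch, int start, int length) {
                for (int i = start; i < start + length; i++) {
                    if (Character.isWhitespace(ch[i])) {
                        counts[1]++;
                    }
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(validate);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return "ignorable=" + counts[0] + " characters=" + counts[1];
    }

    public static void main(String[] args) {
        System.out.println(classify(BODY, false));    // no DTD, not validating
        System.out.println(classify(WITH_DTD, true)); // DTD says doc has element content
    }
}
```

With the DTD available, the validating parser knows that doc may contain only elements, so the line breaks and indentation inside it carry no data.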

XML Schema

DTDs have drawbacks:
– They can only define the element structure and attributes
– They cannot define any database-like constraints for elements:
• Value (min, max, etc.)
• Type (integer, string, etc.)
– DTDs are not written in XML and thus cannot be processed with the same tools as XML documents, XSL(T), etc.

XML Schema:
– Is written in XML
– Avoids most of the DTD drawbacks

XML Schema

XML Schema Part 1: Structures:
– Element structure definition as with DTD: elements, attributes, also enhanced ways to control structures

XML Schema Part 2: Datatypes:
– Primitive datatypes (string, boolean, float, etc.)
– Derived datatypes from primitive datatypes (time, recurringDate)
– Constraining facets for each datatype (minLength, maxLength, pattern, precision, etc.)

Information about Schemas:
– http://www.w3c.org/XML/Schema/

Complex and simple types

complex types: allow elements in their content and may have attributes

simple types: cannot have element content and cannot have attributes

Reminder: DTD declarations

<!ELEMENT name (fname+, lname)>
<!ELEMENT address (name, street, ((city, state, zipcode) | (zipcode, city)))>
<!ELEMENT contact (address, phone*, email?)>
<!ELEMENT contact2 (address | phone | email)*>

Example: USAddress type

<xsd:complexType name="USAddress">
  <xsd:sequence>
    <xsd:element name="name" type="xsd:string" />
    <xsd:element name="street" type="xsd:string" />
    <xsd:element name="city" type="xsd:string" />
    <xsd:element name="state" type="xsd:string" />
    <xsd:element name="zip" type="xsd:decimal" />
  </xsd:sequence>
  <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US" />
</xsd:complexType>

Example: PurchaseOrderType

<xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
    <xsd:element name="shipTo" type="USAddress" />
    <xsd:element name="billTo" type="USAddress" />
    <xsd:element ref="comment" minOccurs="0" />
    <xsd:element name="items" type="Items" />
  </xsd:sequence>
  <xsd:attribute name="orderDate" type="xsd:date" />
</xsd:complexType>

Notes

element declarations for shipTo and billTo associate different element names with the same complex type

attribute declarations must reference simple types

element comment declared elsewhere in the schema (here reference only)

… continues

an element is optional if minOccurs = 0

maximum number of times an element may appear: maxOccurs

attributes may appear once or not at all

the use attribute is used in an attribute declaration to indicate whether the attribute is required or optional; a fixed or default value is given with the fixed or default attribute
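
A sketch of checking instances against the USAddress type, assuming the javax.xml.validation API of the JDK and the attribute syntax of the final XML Schema Recommendation (fixed="US"). The class name AddressValidation, the global shipTo element declaration and the sample address are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class AddressValidation {
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='shipTo' type='USAddress'/>"
      + "<xsd:complexType name='USAddress'>"
      + "<xsd:sequence>"
      + "<xsd:element name='name' type='xsd:string'/>"
      + "<xsd:element name='street' type='xsd:string'/>"
      + "<xsd:element name='city' type='xsd:string'/>"
      + "<xsd:element name='state' type='xsd:string'/>"
      + "<xsd:element name='zip' type='xsd:decimal'/>"
      + "</xsd:sequence>"
      + "<xsd:attribute name='country' type='xsd:NMTOKEN' fixed='US'/>"
      + "</xsd:complexType>"
      + "</xsd:schema>";

    // Build a shipTo instance with the given country and zip values.
    static String address(String country, String zip) {
        return "<shipTo country='" + country + "'><name>Alice Smith</name>"
             + "<street>123 Maple Street</street><city>Mill Valley</city>"
             + "<state>CA</state><zip>" + zip + "</zip></shipTo>";
    }

    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the instance violates the schema or is not well-formed
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(address("US", "90952"))); // matches the type
        System.out.println(isValid(address("US", "ABC")));   // zip is not a decimal
        System.out.println(isValid(address("FI", "90952"))); // country is fixed to US
    }
}
```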

More examples

…
<items>
  <item partNum="872-AA">
    <productName>Lawnmower</productName>
    <quantity>1</quantity>
    <price>148.95</price>
    <comment>Confirm this is electric</comment>
  </item>
  <item partNum="926-AA">
    <productName>Baby Monitor</productName>
    <quantity>1</quantity>
    <price>39.98</price>
    <shipDate>1999-05-21</shipDate>
  </item>
</items>
…

<xsd:complexType name="Items">
  <xsd:sequence>
    <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="productName" type="xsd:string"/>
          <xsd:element name="quantity">
            <xsd:simpleType>
              <xsd:restriction base="xsd:positiveInteger">
                <xsd:maxExclusive value="100"/>
              </xsd:restriction>
            </xsd:simpleType>
          </xsd:element>
          <xsd:element name="price" type="xsd:decimal"/>
          <xsd:element ref="comment" minOccurs="0"/>
          <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
        </xsd:sequence>
        <xsd:attribute name="partNum" type="Sku"/>
      </xsd:complexType>
    </xsd:element>
  </xsd:sequence>
</xsd:complexType>

<xsd:simpleType name="Sku">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\d{3}-[A-Z]{2}"/>
  </xsd:restriction>
</xsd:simpleType>

Patterns

<xsd:simpleType name="Sku">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\d{3}-[A-Z]{2}"/>
  </xsd:restriction>
</xsd:simpleType>

"three digits followed by a hyphen followed by two upper-case ASCII letters"
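
The same pattern can be tried out with java.util.regex as a rough stand-in for a schema validator. This is only an approximation: a schema pattern always has to match the whole value, which corresponds to matches() here, and \d in XML Schema covers all Unicode decimal digits while Java's default \d covers ASCII digits only. The class name SkuCheck is illustrative.

```java
import java.util.regex.Pattern;

class SkuCheck {
    // Same expression as the pattern facet of the Sku type.
    private static final Pattern SKU = Pattern.compile("\\d{3}-[A-Z]{2}");

    // matches() requires the entire string to fit the pattern.
    static boolean isSku(String value) {
        return SKU.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSku("872-AA"));  // true
        System.out.println(isSku("872-aa"));  // false: lower-case letters
        System.out.println(isSku("1872-AA")); // false: four digits
    }
}
```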

Building content models

<xsd:sequence>: fixed order

<xsd:choice>: choice of one alternative

<xsd:group>: grouping (also named)

<xsd:all>: no order specified

Null values

a missing element may mean many things: unknown, not applicable…

an attribute can indicate explicitly that the element content is null

in schema: <xsd:element name="shipDate" type="xsd:date" nillable="true" />

in document: <shipDate xsi:nil="true"></shipDate>

Specifying uniqueness

XML Schema makes it possible to require that an attribute or element value be unique within a certain scope

unique element: first "select" a set of elements, then identify the attribute or element ("field"), relative to each selected element, whose value has to be unique within the scope of the set of selected elements

Defining keys and their references

keys and key references can also be defined:

<key name="pNumKey">
  <selector xpath="parts/part"/>
  <field xpath="@number"/>
</key>

<keyref name="dummy2" refer="pNumKey">
  <selector xpath="regions/zip/part"/>
  <field xpath="@number"/>
</keyref>
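
A sketch of how the key and its reference behave during validation, assuming the javax.xml.validation API of the JDK and the final Recommendation syntax, in which selector and field carry an xpath attribute. The surrounding report element, the Part type and the part numbers are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class KeyRefDemo {
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:complexType name='Part'>"
      + "<xsd:attribute name='number' type='xsd:string' use='required'/>"
      + "</xsd:complexType>"
      + "<xsd:element name='report'>"
      + "<xsd:complexType><xsd:sequence>"
      + "<xsd:element name='regions'><xsd:complexType><xsd:sequence>"
      + "<xsd:element name='zip' maxOccurs='unbounded'><xsd:complexType><xsd:sequence>"
      + "<xsd:element name='part' type='Part' maxOccurs='unbounded'/>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "<xsd:element name='parts'><xsd:complexType><xsd:sequence>"
      + "<xsd:element name='part' type='Part' maxOccurs='unbounded'/>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "</xsd:sequence></xsd:complexType>"
      // every parts/part must have a distinct number ...
      + "<xsd:key name='pNumKey'>"
      + "<xsd:selector xpath='parts/part'/><xsd:field xpath='@number'/>"
      + "</xsd:key>"
      // ... and every regions/zip/part must refer to one of those numbers
      + "<xsd:keyref name='dummy2' refer='pNumKey'>"
      + "<xsd:selector xpath='regions/zip/part'/><xsd:field xpath='@number'/>"
      + "</xsd:keyref>"
      + "</xsd:element>"
      + "</xsd:schema>";

    // Build a report in which one region refers to part `used`
    // and the catalogue defines parts `first` and `second`.
    static String report(String used, String first, String second) {
        return "<report><regions><zip><part number='" + used + "'/></zip></regions>"
             + "<parts><part number='" + first + "'/><part number='" + second + "'/></parts>"
             + "</report>";
    }

    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a constraint of the schema is violated
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(report("455", "455", "748"))); // reference resolves
        System.out.println(isValid(report("999", "455", "748"))); // dangling reference
        System.out.println(isValid(report("455", "455", "455"))); // duplicate key value
    }
}
```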

XML Query Languages

Currently:
– There is no recommendation/standard available, only drafts
– Different suggestions given in 1998, work in progress

XML Query Requirements:
– Requirements draft 16.8.2000
– Query language by the end of 2000

XML Query Data Model:
– Draft 11.5.2000

More on XML Query Languages:
– http://www.w3.org/XML/Query/

XML Query Languages

Required features of an XML query language:

– Support operations (selection, projection, aggregation, sorting, etc.) on all data types:
• Choose a part of the data based on content or structure
• Also operations on hierarchy and sequence of document structures

– Structural preservation and transformation:
• Preserve the relative hierarchy and sequence of input document structures in the query results
• Transform XML structures and create new XML structures

– Combination and joining:
• Combine related information from different parts of a given document or from multiple documents

XML Query Languages

Required features of an XML query language (cont'd):

– Closure property:
• The result of an XML document query is also an XML document (usually not valid but well-formed)
• The results of a query can be used as input to another query

Notions:
– HTML is layout-oriented; queries cannot be carried out efficiently
– XML is not layout-oriented but is based on representing structure; DTDs and structure information can be used in queries
– XML query languages are still under construction, but prototype languages exist (e.g., XML-QL, XQL, Lore…)

XML Query Languages

We want our query to collect elements from manufacturer documents (in temp.database.xml) listing manufacturer's name, year, models, vendors, price, etc. to create new <car> elements
– The results should list their make, model, vendor, rank, and price (in this order)

Lorel:

Select xml(car:(select X.vehicle.make, X.vehicle.model,
                       X.vehicle.vendor, X.manufacturer.rank,
                       X.vehicle.price
                from temp.database.xml X))

XML Query Languages

XML-QL:

WHERE
  <manufacturer>
    <mn_name>$mn</mn_name>
    <vehiclemodel>
      <model>
        <mo_name>$mon</mo_name>
        <rank>$r</rank>
      </model>
      <vehicle>
        <price>$y</price>
        <vendor>$v</vendor>
      </vehicle>
    </vehiclemodel>
  </manufacturer>
IN www.nhcs\temp.database.xml
CONSTRUCT
  <car>
    <make>$mn</make>
    <mo_name>$mon</mo_name>
    <vendor>$v</vendor>
    <rank>$r</rank>
    <price>$y</price>
  </car>
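
Neither prototype language can be run with standard Java tools, so the sketch below uses XPath and DOM from the JDK as a stand-in, not as an implementation of XML-QL or Lorel. It follows the element structure of the XML-QL pattern, takes the first model and vehicle of each vehiclemodel (a simplification of the pattern's bindings), and wraps the results in a cars element so that the output is a single well-formed document. The class name CarQuery, the cars wrapper and the sample data are assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CarQuery {
    // Output element name and the XPath that supplies its value,
    // relative to one <vehiclemodel> element; the order is the result order.
    private static final String[][] FIELDS = {
        {"make", "../mn_name"},
        {"mo_name", "model/mo_name"},
        {"vendor", "vehicle/vendor"},
        {"rank", "model/rank"},
        {"price", "vehicle/price"},
    };

    static final String SAMPLE =
        "<manufacturer><mn_name>Mercury</mn_name><vehiclemodel>"
      + "<model><mo_name>Sable LT</mo_name><rank>1.03</rank></model>"
      + "<vehicle><price>15000</price><vendor>Scott Thomason</vendor></vehicle>"
      + "</vehiclemodel></manufacturer>";

    // Select the data from the input document and construct new <car> elements.
    static String query(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document input = builder.parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            Document output = builder.newDocument();
            Element cars = output.createElement("cars");
            output.appendChild(cars);

            NodeList matches = (NodeList) xpath.evaluate(
                    "//manufacturer/vehiclemodel", input, XPathConstants.NODESET);
            for (int i = 0; i < matches.getLength(); i++) {
                Node match = matches.item(i);
                Element car = output.createElement("car");
                for (String[] field : FIELDS) {
                    Element e = output.createElement(field[0]);
                    e.setTextContent(xpath.evaluate(field[1], match));
                    car.appendChild(e);
                }
                cars.appendChild(car);
            }

            // Serialize the result tree; the result is itself an XML document.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(output), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("query failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(query(SAMPLE));
    }
}
```

Because the result is again well-formed XML, it could be parsed and queried once more, which is the closure property in miniature.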