Post on 27-Dec-2015

XML for E-commerce II

Helena Ahonen-Myka

XML processing model

an XML processor is used to read XML documents and provide access to their content and structure

the XML processor works on behalf of an application

the specification defines which information the processor should provide to the application

Parsing

input: an XML document

basic task: is the document well-formed?

validating parsers additionally: is the document valid?
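The well-formedness check can be sketched with the JAXP parser that ships with the JDK. This is a minimal illustration, not the parser setup used in these slides; the class name `WellFormedCheck` and the sample documents are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string parses as well-formed XML.
class WellFormedCheck {
    static boolean isWellFormed(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            // DefaultHandler ignores all content; its fatalError() rethrows the problem.
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // A well-formedness violation is reported as a fatal error.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isWellFormed("<doc><para>Hello, world!</para></doc>")); // true
        System.out.println(isWellFormed("<doc><para>Hello, world!</doc>"));        // false
    }
}
```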

Parsing

parsers produce data structures, which other tools and applications can use

two kinds of APIs: tree-based and event-based

Tree-based API

compiles an XML document into an internal tree structure

allows an application to navigate the tree

Document Object Model (DOM) is a tree-based API for XML and HTML documents
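A tree-based API can be sketched with the DOM implementation bundled with the JDK: the whole document is parsed into a tree first, and the application then navigates it. The class name `DomSketch` and the sample document are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example: build a DOM tree, then walk from the root to the first <para>.
class DomSketch {
    static String describe(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        Element root = doc.getDocumentElement();               // the <doc> element
        Node para = root.getElementsByTagName("para").item(0); // first <para> descendant
        return root.getTagName() + "/" + para.getNodeName() + ": " + para.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // prints "doc/para: Hello, world!"
        System.out.println(describe("<doc><para>Hello, world!</para></doc>"));
    }
}
```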

Event-based API

reports parsing events (such as start and end of elements) directly to the application through callbacks

the application implements handlers to deal with the different events

Simple API for XML (SAX)

Example

<?xml version="1.0"?>
<doc>
  <para>Hello, world!</para>
</doc>

Events:

start document
start element: doc
start element: para
characters: Hello, world!
end element: para
end element: doc
end document

Example (cont.)

an application handles these events as they occur, just as it would handle events from a graphical user interface (mouse clicks, etc.)

no need to cache the entire document in memory or secondary storage
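The event stream for such a document can be sketched by logging each callback. This assumes the JDK's bundled JAXP parser; the class name `SaxEventLog` is hypothetical. Because the default factory is not namespace-aware, the sketch reads the qualified name (`qName`) rather than the local name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: record SAX events in the order the parser reports them.
class SaxEventLog {
    static List<String> events(String xml) throws Exception {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() { log.add("start document"); }
            @Override public void endDocument() { log.add("end document"); }
            @Override public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("start element: " + qName);
            }
            @Override public void endElement(String uri, String localName, String qName) {
                log.add("end element: " + qName);
            }
            @Override public void characters(char[] ch, int start, int length) {
                String text = new String(ch, start, length);
                // Skip whitespace-only chunks (indentation between elements).
                if (!text.trim().isEmpty()) log.add("characters: " + text);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return log;
    }

    public static void main(String[] args) throws Exception {
        for (String event : events("<doc><para>Hello, world!</para></doc>")) {
            System.out.println(event);
        }
    }
}
```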

Tree-based vs. event-based

tree-based APIs are useful for a wide range of applications, but they may need a lot of resources (if the document is large)

some applications may need to build their own tree structures, and it is very inefficient to build a parse tree only to map it to another tree

Tree-based vs. event-based

an event-based API provides simpler, lower-level access to an XML document

as the document is processed sequentially, one can parse documents much larger than the available system memory

applications can construct their own data structures using their own callback event handlers
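As a sketch of building an application-specific data structure from callbacks, the handler below keeps only a tally of element names and never holds the document itself. The class name `ElementTally` and the sample input are hypothetical; the JDK's bundled parser is assumed.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: the only state kept is a map from element name to count.
class ElementTally extends DefaultHandler {
    final Map<String, Integer> counts = new TreeMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        counts.merge(qName, 1, Integer::sum);
    }

    static Map<String, Integer> tally(String xml) throws Exception {
        ElementTally handler = new ElementTally();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.counts;
    }

    public static void main(String[] args) throws Exception {
        // prints {item=2, items=1, note=1}
        System.out.println(tally("<items><item/><item/><note/></items>"));
    }
}
```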

We need a parser...

Apache Xerces: http://xml.apache.org

IBM XML4J: http://alphaworks.ibm.com

XP: http://www.jclark.com/xml/xp

… many others

… and the SAX classes

http://www.megginson.com/SAX/

often the SAX classes come bundled with the parser distribution

some parsers only support SAX 1.0; the latest version is 2.0

Starting a SAX parser

import org.xml.sax.XMLReader;

import org.apache.xerces.parsers.SAXParser;

XMLReader parser = new SAXParser();

parser.parse(uri);

Content handlers

In order to let the application do something useful with XML data as it is being parsed, we must register handlers with the SAX parser

a handler is a set of callbacks: application code can be run at important events during the parsing of a document

Core handler interfaces in SAX

org.xml.sax.ContentHandler

org.xml.sax.ErrorHandler

org.xml.sax.DTDHandler

org.xml.sax.EntityResolver

Custom application classes

custom application classes that perform specific actions within the parsing process can implement each of the core interfaces

implementation classes can be registered with the parser with the methods setContentHandler(), etc.
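Registration can be sketched as follows. The sketch obtains an `XMLReader` through the JDK's `SAXParserFactory` instead of instantiating a Xerces class directly; the class names `HandlerRegistration`, `CountingContentHandler` and `CollectingErrorHandler` are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class HandlerRegistration {
    // Content handler: counts start-element events.
    static class CountingContentHandler extends DefaultHandler {
        int elements;
        @Override public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    // Error handler: collects every reported problem.
    static class CollectingErrorHandler implements ErrorHandler {
        final List<String> problems = new ArrayList<>();
        @Override public void warning(SAXParseException e) { problems.add("warning: " + e.getMessage()); }
        @Override public void error(SAXParseException e) { problems.add("error: " + e.getMessage()); }
        @Override public void fatalError(SAXParseException e) throws SAXException {
            problems.add("fatal: " + e.getMessage());
            throw e; // parsing cannot continue after a fatal error
        }
    }

    static String parse(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        CountingContentHandler content = new CountingContentHandler();
        CollectingErrorHandler errors = new CollectingErrorHandler();
        reader.setContentHandler(content); // register the handlers with the parser
        reader.setErrorHandler(errors);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // already recorded by the error handler
        }
        return content.elements + " elements, " + errors.problems.size() + " problems";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<doc><para/></doc>")); // 2 elements, 0 problems
        System.out.println(parse("<doc><para></doc>"));  // malformed: at least one problem
    }
}
```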

Example: content handlers

class MyContentHandler implements ContentHandler {

public void startDocument() throws SAXException {
  System.out.println("Parsing begins…");
}

public void endDocument() throws SAXException {
  System.out.println("...Parsing ends.");
}

Element handlers

public void startElement(String namespaceURI, String localName, String rawName, Attributes atts) throws SAXException {
  System.out.print("startElement: " + localName);
  if (!namespaceURI.equals("")) {
    System.out.println(" in namespace " + namespaceURI + " (" + rawName + ")");
  } else {
    System.out.println(" has no associated namespace");
  }
  for (int i = 0; i < atts.getLength(); i++) {
    System.out.println(" Attribute: " + atts.getLocalName(i) + "=" + atts.getValue(i));
  }
}

endElement

public void endElement(String namespaceURI, String localName, String rawName) throws SAXException {
  System.out.println("endElement: " + localName + "\n");
}

Character data

public void characters(char[] ch, int start, int length) throws SAXException {
  // the third argument is the number of characters to read, not an end index
  String s = new String(ch, start, length);
  System.out.println("characters: " + s);
}

the parser may return all contiguous character data at once, or split the data up into multiple method invocations
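Because the text of one element may arrive in several `characters()` calls, a common approach is to append each chunk to a buffer and read the buffer only when the element ends. The following is a minimal sketch under that assumption; the class name `TextBuffering` is hypothetical and the handler only deals with leaf elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: collect the complete text of each leaf element.
class TextBuffering extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    final List<String> texts = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0); // start collecting text for the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length); // may be called several times per text node
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String text = buffer.toString().trim();
        if (!text.isEmpty()) texts.add(qName + "=" + text);
        buffer.setLength(0);
    }

    static List<String> collect(String xml) throws Exception {
        TextBuffering handler = new TextBuffering();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.texts;
    }

    public static void main(String[] args) throws Exception {
        // The entity reference is a typical place where a parser splits the text.
        System.out.println(collect("<doc><para>Hello, &amp; world!</para></doc>"));
    }
}
```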

Processing instructions

XML documents may contain processing instructions (PIs)

a processing instruction tells an application to perform some specific task

form: <?target instructions?>

Handlers for PIs

public void processingInstruction(String target, String data) throws SAXException {
  System.out.println("PI: Target:" + target + " and Data:" + data);
}

Application could receive instructions and set variables or execute methods to perform application-specific processing
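One way an application might act on processing instructions is to store them by target and consult them later. This is a sketch only: the class name `PiSketch` and the `render` instruction are invented for the example.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: remember each processing instruction by its target.
class PiSketch extends DefaultHandler {
    final Map<String, String> instructions = new LinkedHashMap<>();

    @Override
    public void processingInstruction(String target, String data) {
        instructions.put(target, data);
    }

    static Map<String, String> collect(String xml) throws Exception {
        PiSketch handler = new PiSketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.instructions;
    }

    public static void main(String[] args) throws Exception {
        // The XML declaration is not a processing instruction and is not reported here.
        System.out.println(collect("<?xml version='1.0'?><?render mode='draft'?><doc/>"));
    }
}
```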

Validation

some parsers are validating, some non-validating

some parsers can do both

SAX method to turn validation on:

parser.setFeature("http://xml.org/sax/features/validation", true);
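A sketch of DTD validation with this feature, assuming the JDK's bundled parser supports it (it is a validating parser). Validity problems are reported through `ErrorHandler.error()` and parsing continues. The class name `DtdValidationSketch` and the small DTD are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: count validity errors against an internal DTD subset.
class DtdValidationSketch {
    static final String DTD =
            "<!DOCTYPE doc [<!ELEMENT doc (para)><!ELEMENT para (#PCDATA)>]>";

    static int validityErrors(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature("http://xml.org/sax/features/validation", true);
        final int[] errors = {0};
        reader.setErrorHandler(new DefaultHandler() {
            @Override public void error(SAXParseException e) {
                errors[0]++; // validity errors are recoverable
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return errors[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validityErrors(DTD + "<doc><para>hi</para></doc>")); // 0
        System.out.println(validityErrors(DTD + "<doc><note/></doc>"));         // more than 0
    }
}
```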

Ignorable whitespace

a validating parser can decide which whitespace can be ignored

for a non-validating parser, all whitespace is just characters

content handler:

public void ignorableWhitespace(char[] ch, int start, int length) { … }
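The difference can be sketched with a handler that keeps ignorable whitespace apart from real character data. The sketch assumes a validating parser (here the JDK's bundled one) and a DTD that declares `doc` with element-only content; the class name `WhitespaceSketch` is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: separate ignorable whitespace from character data.
class WhitespaceSketch extends DefaultHandler {
    int ignorableChars;
    final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        ignorableChars += length;
    }

    static WhitespaceSketch parseValidating(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature("http://xml.org/sax/features/validation", true);
        WhitespaceSketch handler = new WhitespaceSketch();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE doc [<!ELEMENT doc (para)><!ELEMENT para (#PCDATA)>]>"
                + "<doc>\n  <para>hi</para>\n</doc>";
        WhitespaceSketch result = parseValidating(xml);
        System.out.println("text=" + result.text + ", ignorable chars=" + result.ignorableChars);
    }
}
```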

XML Schema

DTDs have drawbacks:
– They can only define the element structure and attributes
– They cannot define any database-like constraints for elements:
  • Value (min, max, etc.)
  • Type (integer, string, etc.)
– DTDs are not written in XML and thus cannot be processed with the same tools as XML documents, XSL(T), etc.

XML Schema:
– Is written in XML
– Avoids most of the DTD drawbacks

XML Schema

XML Schema Part 1: Structures:
– Element structure definition as with DTD: elements, attributes, also enhanced ways to control structures

XML Schema Part 2: Datatypes:
– Primitive datatypes (string, boolean, float, etc.)
– Datatypes derived from primitive datatypes (time, recurringDate)
– Constraining facets for each datatype (minLength, maxLength, pattern, precision, etc.)

Information about Schemas:
– http://www.w3c.org/XML/Schema/

Complex and simple types

complex types: allow elements in their content and may have attributes

simple types: cannot have element content and cannot have attributes

Reminder: DTD declarations

<!ELEMENT name (fname+, lname)>

<!ELEMENT address (name, street, ((city, state, zipcode) | (zipcode, city)))>

<!ELEMENT contact (address, phone*, email?)>

<!ELEMENT contact2 (address | phone | email)*>

Example: USAddress type

<xsd:complexType name="USAddress">
  <xsd:sequence>
    <xsd:element name="name" type="xsd:string" />
    <xsd:element name="street" type="xsd:string" />
    <xsd:element name="city" type="xsd:string" />
    <xsd:element name="state" type="xsd:string" />
    <xsd:element name="zip" type="xsd:decimal" />
  </xsd:sequence>
  <xsd:attribute name="country" type="xsd:NMTOKEN" use="fixed" value="US" />
</xsd:complexType>
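Validation against such a type can be sketched with the `javax.xml.validation` API in the JDK. Note the assumptions: that API implements the final XML Schema Recommendation, so the sketch writes the fixed attribute as `fixed='US'` rather than in the draft form shown on the slide; the `shipTo` element declaration, the class name `SchemaValidationSketch` and the sample address are illustrative.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical example: validate <shipTo> instances against a USAddress type.
class SchemaValidationSketch {
    static final String SCHEMA =
            "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
            + "<xsd:element name='shipTo' type='USAddress'/>"
            + "<xsd:complexType name='USAddress'>"
            + "<xsd:sequence>"
            + "<xsd:element name='name' type='xsd:string'/>"
            + "<xsd:element name='street' type='xsd:string'/>"
            + "<xsd:element name='city' type='xsd:string'/>"
            + "<xsd:element name='state' type='xsd:string'/>"
            + "<xsd:element name='zip' type='xsd:decimal'/>"
            + "</xsd:sequence>"
            + "<xsd:attribute name='country' type='xsd:NMTOKEN' fixed='US'/>"
            + "</xsd:complexType>"
            + "</xsd:schema>";

    static boolean isValid(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(SCHEMA)));
        try {
            // Without a custom ErrorHandler, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    static String address(String country, String zip) {
        return "<shipTo country='" + country + "'><name>Alice Smith</name>"
                + "<street>123 Maple Street</street><city>Mill Valley</city>"
                + "<state>CA</state><zip>" + zip + "</zip></shipTo>";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid(address("US", "90952"))); // true
        System.out.println(isValid(address("US", "ABC")));   // false: zip is not a decimal
        System.out.println(isValid(address("FI", "90952"))); // false: country is fixed to US
    }
}
```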

Example: PurchaseOrderType

<xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
    <xsd:element name="shipTo" type="USAddress" />
    <xsd:element name="billTo" type="USAddress" />
    <xsd:element ref="comment" minOccurs="0" />
    <xsd:element name="items" type="Items" />
  </xsd:sequence>
  <xsd:attribute name="orderDate" type="xsd:date" />
</xsd:complexType>

Notes

element declarations for shipTo and billTo associate different element names with the same complex type

attribute declarations must reference simple types

the element comment is declared elsewhere in the schema (here it is only referenced)

… continues

an element is optional if minOccurs = 0

the maximum number of times an element may appear: maxOccurs

attributes may appear once or not at all

the use attribute is used in an attribute declaration to indicate whether the attribute is required or optional, and if optional, whether the value is fixed or whether there is a default

More examples

…
<items>
  <item partNum="872-AA">
    <productName>Lawnmower</productName>
    <quantity>1</quantity>
    <price>148.95</price>
    <comment>Confirm this is electric</comment>
  </item>
  <item partNum="926-AA">
    <productName>Baby Monitor</productName>
    <quantity>1</quantity>
    <price>39.98</price>
    <shipDate>1999-05-21</shipDate>
  </item>
</items>
…

<xsd:complexType name="Items">
  <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
    <xsd:complexType>
      <xsd:element name="quantity">
        <xsd:simpleType base="xsd:positiveInteger">
          <xsd:maxExclusive value="100"/>
        </xsd:simpleType>
      </xsd:element>
      <xsd:element name="price" type="xsd:decimal"/>
      <xsd:element ref="comment" minOccurs="0"/>
      <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
      <xsd:attribute name="partNum" type="Sku"/>
    </xsd:complexType>
  </xsd:element>
</xsd:complexType>
<xsd:simpleType name="Sku">
  <xsd:pattern value="\d{3}-[A-Z]{2}"/>
</xsd:simpleType>

Patterns

<xsd:simpleType name="Sku">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\d{3}-[A-Z]{2}"/>
  </xsd:restriction>
</xsd:simpleType>

"three digits followed by a hyphen followed by two upper-case ASCII letters"
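The same pattern can be tried out with `java.util.regex`, as a rough stand-in for the schema facet. Two assumptions apply: a schema pattern always has to match the whole value, which `matches()` mirrors, and Java's `\d` matches only ASCII digits by default, whereas the schema regular expression language defines `\d` over all Unicode decimal digits. The class name `SkuPattern` is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical example: check part numbers against the Sku pattern.
class SkuPattern {
    static final Pattern SKU = Pattern.compile("\\d{3}-[A-Z]{2}");

    static boolean isSku(String value) {
        return SKU.matcher(value).matches(); // whole-string match, like a schema pattern
    }

    public static void main(String[] args) {
        System.out.println(isSku("872-AA"));  // true
        System.out.println(isSku("872-aa"));  // false: lower-case letters
        System.out.println(isSku("8720-AA")); // false: four digits
    }
}
```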

Building content models

<xsd:sequence>: fixed order

<xsd:choice>: choice of one (1) of the alternatives

<xsd:group>: grouping (also named)

<xsd:all>: no order specified

Null values

A missing element may mean many things: unknown, not applicable…

an attribute to indicate that the element content is null

in schema: <xsd:element name="shipDate" type="xsd:date" nullable="true" />

in document: <shipDate xsi:null="true"></shipDate>

Specifying uniqueness

XML Schema makes it possible to indicate that any attribute or element value must be unique within a certain scope

unique element: first "select" a set of elements, then identify the attribute or element ("field"), relative to each selected element, that has to be unique within the scope of the set of selected elements

Defining keys and their references

Keys and key references can also be defined:

<key name="pNumKey">
  <selector>parts/part</selector>
  <field>@number</field>
</key>

<keyref name="dummy2" refer="pNumKey">
  <selector>regions/zip/part</selector>
  <field>@number</field>
</keyref>
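A key and key reference of this shape can be checked with the JDK's schema validator. The sketch uses the syntax of the final XML Schema Recommendation, where `selector` and `field` carry an `xpath` attribute, rather than the draft element-content form shown above. The surrounding `root` element, the type names `Part` and `PartList`, the constraint name `pNumRef` and the class name `KeyRefSketch` are all assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical example: part numbers under regions/zip must refer to a key in parts.
class KeyRefSketch {
    static final String SCHEMA =
            "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
            + "<xsd:complexType name='Part'>"
            + "<xsd:attribute name='number' type='xsd:string' use='required'/>"
            + "</xsd:complexType>"
            + "<xsd:complexType name='PartList'><xsd:sequence>"
            + "<xsd:element name='part' type='Part' maxOccurs='unbounded'/>"
            + "</xsd:sequence></xsd:complexType>"
            + "<xsd:element name='root'>"
            + "<xsd:complexType><xsd:sequence>"
            + "<xsd:element name='regions'><xsd:complexType><xsd:sequence>"
            + "<xsd:element name='zip' type='PartList' maxOccurs='unbounded'/>"
            + "</xsd:sequence></xsd:complexType></xsd:element>"
            + "<xsd:element name='parts' type='PartList'/>"
            + "</xsd:sequence></xsd:complexType>"
            + "<xsd:key name='pNumKey'>"
            + "<xsd:selector xpath='parts/part'/><xsd:field xpath='@number'/>"
            + "</xsd:key>"
            + "<xsd:keyref name='pNumRef' refer='pNumKey'>"
            + "<xsd:selector xpath='regions/zip/part'/><xsd:field xpath='@number'/>"
            + "</xsd:keyref>"
            + "</xsd:element>"
            + "</xsd:schema>";

    static boolean isValid(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(SCHEMA)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // includes key and keyref violations
        }
    }

    // Builds a document with one referenced part number and two defined parts.
    static String doc(String referenced, String first, String second) {
        return "<root><regions><zip><part number='" + referenced + "'/></zip></regions>"
                + "<parts><part number='" + first + "'/><part number='" + second + "'/></parts></root>";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid(doc("872-AA", "872-AA", "926-AA"))); // true
        System.out.println(isValid(doc("000-ZZ", "872-AA", "926-AA"))); // false: dangling reference
        System.out.println(isValid(doc("872-AA", "872-AA", "872-AA"))); // false: duplicate key
    }
}
```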

XML Query Languages

Currently:
– There is no recommendation/standard available, only drafts
– Different suggestions given in 1998, work in progress

XML Query Requirements:
– Requirements draft 16.8.2000
– Query language until the end of 2000

XML Query Data Model:
– Draft 11.5.2000

More on XML Query Languages:
– http://www.w3.org/XML/Query/

XML Query Languages

Required features of an XML query language:
– Support operations (selection, projection, aggregation, sorting, etc.) on all data types:
  • Choose a part of the data based on content or structure
  • Also operations on hierarchy and sequence of document structures
– Structural preservation and transformation:
  • Preserve the relative hierarchy and sequence of input document structures in the query results
  • Transform XML structures and create new XML structures
– Combination and joining:
  • Combine related information from different parts of a given document or from multiple documents

XML Query Languages

Required features of an XML query language (cont'd):
– Closure property:
  • The result of an XML document query is also an XML document (usually not valid but well-formed)
  • The results of a query can be used as input to another query

Notions:
– HTML is layout-oriented, so queries cannot be carried out efficiently
– XML is not layout-oriented but is based on representing structure; DTDs and structure information can be used in queries
– XML query languages are still under construction, but prototype languages exist (e.g., XML-QL, XQL, Lore…)

XML Query Languages

We want our query to collect elements from manufacturer documents (in temp.database.xml) listing manufacturer's name, year, models, vendors, price, etc. to create new <car> elements
– The results should list their make, model, vendor, rank, and price (in this order)

Lorel:

Select xml(car:(select X.vehicle.make, X.vehicle.model,
                       X.vehicle.vendor, X.manufacturer.rank,
                       X.vehicle.price
                from temp.database.xml X))

XML Query Languages

XML-QL:

WHERE
  <manufacturer>
    <mn_name>$mn</mn_name>
    <vehiclemodel>
      <model>
        <mo_name>$mon</mo_name>
        <rank>$r</rank>
      </model>
      <vehicle>
        <price>$y</price>
        <vendor>$v</vendor>
      </vehicle>
    </vehiclemodel>
  </manufacturer>
IN www.nhcs\temp.database.xml
CONSTRUCT
  <car>
    <make>$mn</make>
    <mo_name>$mon</mo_name>
    <vendor>$v</vendor>
    <rank>$r</rank>
    <price>$y</price>
  </car>
