3. XML Processor APIs


Upload: thanh

Post on 20-Jan-2016


Page 1: 3. XML Processor APIs

SDPL 2011 3: XML APIs and SAX 1

3. XML Processor APIs

How can (Java) applications manipulate structured (XML) documents?
– An overview of XML processor interfaces

3.1 SAX: an event-based interface
3.2 DOM: an object-based interface
3.3 JAXP: Java API for XML Processing
3.4 StAX: Streaming API for XML

Page 2: 3. XML Processor APIs

Document Parser Interfaces

Each XML application contains a parser
– editors, browsers
– transformation/style engines, DB loaders, ...

XML parsers have become standard tools of application development frameworks
– Standard Java (since JDK 1.4) provides default parsers (built on Apache Xerces) via JAXP

(See, e.g., Leventhal, Lewis & Fuchs: Designing XML Internet Applications, Chapter 10, and D. Megginson: Events vs. Trees)

Page 3: 3. XML Processor APIs

Tasks of a Parser

Document instance decomposition
– into an "XML Information Set" of elements, attributes, text, processing instructions, entities, ...

Verification
– well-formedness checking
  » syntactical correctness of XML markup
– validation (against a DTD or Schema)

Access to contents of the DTD (if supported)
– SAX 2.0 Extensions provide information about declarations: element type names and their content model expressions
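
The two verification tasks above can be tried out with the JDK's built-in JAXP parser. The following is a minimal sketch, not a definitive implementation: the class ParserChecks and the method parsesCleanly are hypothetical names made up for this illustration, and the sample documents are assumptions. Only well-formedness is checked unless validation is switched on in the factory.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class for illustration; not part of any library.
final class ParserChecks {

    // Parses xml and returns true if the parser reported no error.
    // validate == false: only well-formedness is checked.
    // validate == true:  the document is also validated against its DTD.
    static boolean parsesCleanly(String xml, boolean validate) {
        DefaultHandler strict = new DefaultHandler() {
            // DefaultHandler silently ignores validity errors;
            // rethrow them so that they abort the parse.
            @Override
            public void error(SAXParseException e) throws SAXParseException {
                throw e;
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(validate);
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(xml)), strict);
            return true;
        } catch (SAXParseException e) {
            return false;   // not well-formed, or (when validating) not valid
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String dtd = "<!DOCTYPE A [<!ELEMENT A (B)> <!ELEMENT B (#PCDATA)>]>";
        // Mismatched tags: a well-formedness error.
        System.out.println(parsesCleanly("<A><B>x</A>", false));
        // Well-formed and valid against the internal DTD.
        System.out.println(parsesCleanly(dtd + "<A><B>x</B></A>", true));
        // Well-formed, but element C is not allowed by the DTD.
        System.out.println(parsesCleanly(dtd + "<A><C/></A>", true));
    }
}
```

Rethrowing from error() is a design choice of this sketch; an application could instead collect validity errors and let the parse continue.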

Page 4: 3. XML Processor APIs

Document Parser Interfaces

I: Event-based (streaming) interfaces
– Command line and ESIS interfaces
  » Element Structure Information Set, traditional interface to stand-alone SGML parsers
– Event call-back (or push) interfaces: SAX
– Pull interfaces: StAX

II: Tree-based (object model) interfaces
– W3C DOM Recommendation
– Java-specific object models: JAXB, JDOM, dom4j

Page 5: 3. XML Processor APIs

Command-line ESIS interface

[Figure: the application starts a stand-alone SGML/XML parser with a command-line call ("loose coupling"). The parser reads the document and returns an ESIS stream to the application.]

Input document:
<E i="1">Hi!</E>

ESIS stream:
(E
Ai CDATA 1
-Hi!
)E

Page 6: 3. XML Processor APIs

Event Call-Back (Push) Interfaces

Application implements a set of call-back methods for acting on parse events
– "Observer design pattern"
– parameters qualify events further, with
  » element type name
  » names and values of attributes
  » values of content strings, ...

Idea behind "SAX" (Simple API for XML)
– an industry standard API for XML parsers
– kind-of "Serial Access XML"
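
The call-back style described above can be sketched as follows, assuming the JDK's built-in JAXP parser. The class EventTrace and its trace method are hypothetical names made up for this illustration. The handler records each event it is notified of, together with the parameters that qualify it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class for illustration: records the parse events it receives.
final class EventTrace extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void startElement(String nsURI, String localName,
                             String rawName, Attributes atts) {
        // The parameters qualify the event: element name and attributes.
        events.add("startElement " + rawName + " i=" + atts.getValue("i"));
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        events.add("characters " + new String(ch, start, length));
    }

    @Override
    public void endElement(String nsURI, String localName, String rawName) {
        events.add("endElement " + rawName);
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    // Parses xml and returns the recorded events in call order.
    static List<String> trace(String xml) {
        EventTrace handler = new EventTrace();
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);   // register the observer
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        for (String event : trace("<?xml version='1.0'?><A i=\"1\">Hi!</A>")) {
            System.out.println(event);
        }
    }
}
```

The application never asks for the next event; it only hands control to parse() and reacts when the parser calls back.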

Page 7: 3. XML Processor APIs

Event Call-Back Application

[Figure: the application main routine calls XMLReader.parse(...). While reading the document, the parser invokes the application's call-back routines.]

Input document:
<?xml version='1.0'?>
<A i="1">Hi!</A>

Call-backs, in order:
startDocument()
startElement(...) with "A", [i="1"]
characters(...) with "Hi!"
endElement(...) with "A"

Page 8: 3. XML Processor APIs

Pull Parsing Interfaces

The parser provides document contents as a stream of events, which the application can iterate over
– the application actively "pulls" the events
  » in push processing, the application reacts to them
– often leads to simpler code
– makes it easier to cancel the parsing, or to parse multiple documents simultaneously

Idea behind "StAX" (Streaming API for XML)
– a standard Java API for XML parsers

Page 9: 3. XML Processor APIs

A Pull-Parsing Application

[Figure: the application repeatedly calls EventReader.nextEvent() on the parser API; each call returns the next event of the document.]

Input document:
<?xml version='1.0'?>
<A i="1">Hi!</A>

Events returned, in order:
StartDocument
StartElement "A", [i="1"]
Characters "Hi!"
EndElement "A"
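
The pull loop pictured above might look as follows with the StAX event API (javax.xml.stream) of the JDK. This is a minimal sketch under that assumption; the class PullTrace and the method pull are hypothetical names made up for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical class for illustration: pulls events one by one.
final class PullTrace {

    // Iterates over the events of xml and returns a description of each.
    static List<String> pull(String xml) {
        List<String> seen = new ArrayList<>();
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            // The application drives the loop and could stop at any point.
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartDocument()) {
                    seen.add("StartDocument");
                } else if (event.isStartElement()) {
                    StartElement start = event.asStartElement();
                    Attribute i = start.getAttributeByName(new QName("i"));
                    seen.add("StartElement " + start.getName().getLocalPart()
                            + " i=" + (i == null ? "" : i.getValue()));
                } else if (event.isCharacters()) {
                    seen.add("Characters " + event.asCharacters().getData());
                } else if (event.isEndElement()) {
                    seen.add("EndElement "
                            + event.asEndElement().getName().getLocalPart());
                } else if (event.isEndDocument()) {
                    seen.add("EndDocument");
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        for (String event : pull("<?xml version='1.0'?><A i=\"1\">Hi!</A>")) {
            System.out.println(event);
        }
    }
}
```

Because the loop belongs to the application, cancelling the parse is just a matter of leaving the loop, and two readers can be advanced alternately to process two documents side by side.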

Page 10: 3. XML Processor APIs

Tree-based / Object Model Interfaces

Application interacts with
– a parser object, which builds ...
– a document object consisting of document, elements, attributes, text, ...

Abstraction level higher than in event-based interfaces; more powerful access
– to descendants, following siblings, ...

Drawback: higher memory consumption
– therefore used mainly in client applications (to implement document manipulation by user)
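
As a minimal sketch of the tree-based style, assuming the W3C DOM implementation reachable through JAXP in the JDK: a parser object builds the document object, which the application then reads and modifies. The class DomSketch and the sample document are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical class for illustration.
final class DomSketch {

    // The parser object (DocumentBuilder) builds the in-memory document.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<A i=\"1\">Hi!</A>");
        Element a = doc.getDocumentElement();

        // Access: element name, attribute value, text child.
        System.out.println(a.getTagName() + " i=" + a.getAttribute("i")
                + " text=" + a.getFirstChild().getNodeValue());

        // Modify: the whole tree stays in memory and can be changed.
        a.setAttribute("i", "2");
        a.appendChild(doc.createElement("B"));
        System.out.println("children of A: " + a.getChildNodes().getLength());
    }
}
```

The whole document is held in memory at once, which is what makes random access and modification possible, and also what causes the higher memory consumption.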

Page 11: 3. XML Processor APIs

DOM-Based Application Conceptually

[Figure: the application asks a parser object to parse the document <A i="1">Hi!</A>. The parser builds an in-memory document representation: a Document node with element A, its attribute i=1, and the text "Hi!". The application then accesses and modifies this representation.]

Page 12: 3. XML Processor APIs

3.1 The SAX Event Callback API

A de-facto industry standard
– Developed by members of the xml-dev mailing list
– Version 1.0 in May 1998, Version 2.0 in May 2000
– Not a parser, but a common interface for different parsers (like, say, JDBC is a common interface to various RDBs)

Supported directly by major XML parsers
– many Java based, and free; examples: Apache Xerces, Oracle's XML Parser for Java, MSXML (in IE 5+), James Clark's XP

Page 13: 3. XML Processor APIs

SAX 2.0 Interfaces

Co-operation of an application and a parser is specified in terms of interfaces (i.e., collections of methods)

My classification of SAX interfaces:
– Application-to-parser interfaces
  » to use the parser
– Parser-to-application (or call-back) interfaces
  » to act on various parsing events
– Auxiliary interfaces
  » to manipulate parser-provided information

Page 14: 3. XML Processor APIs

Application-to-Parser Interfaces

Implemented by the parser (or a SAX driver):
– XMLReader
  » methods to invoke the parser, and to register objects that implement call-back interfaces
– XMLFilter (extends XMLReader)
  » interface to connect XMLReaders in a row as a sequence of filters
  » obtains events from an XMLReader and passes them further (possibly modified)
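
A filter of the kind described above can be sketched by extending org.xml.sax.helpers.XMLFilterImpl, the standard helper class that passes all events through unchanged. The class UpperCaseFilter and the choice of upper-casing element names are assumptions made only for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter for illustration: upper-cases element names
// before passing the events on to the next handler.
final class UpperCaseFilter extends XMLFilterImpl {

    UpperCaseFilter(XMLReader parent) {
        super(parent);   // the XMLReader this filter obtains events from
    }

    @Override
    public void startElement(String nsURI, String localName,
                             String rawName, Attributes atts)
            throws SAXException {
        super.startElement(nsURI, localName.toUpperCase(Locale.ROOT),
                rawName.toUpperCase(Locale.ROOT), atts);
    }

    @Override
    public void endElement(String nsURI, String localName, String rawName)
            throws SAXException {
        super.endElement(nsURI, localName.toUpperCase(Locale.ROOT),
                rawName.toUpperCase(Locale.ROOT));
    }

    // Parses xml through the filter and returns the element names
    // as seen by the application's handler.
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        DefaultHandler collector = new DefaultHandler() {
            @Override
            public void startElement(String nsURI, String localName,
                                     String rawName, Attributes atts) {
                names.add(rawName);
            }
        };
        try {
            XMLReader parser =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            XMLReader filter = new UpperCaseFilter(parser);
            filter.setContentHandler(collector);
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<db><person/></db>"));
    }
}
```

Since an XMLFilter is itself an XMLReader, the application uses it exactly like a parser, and several filters can be chained by passing one as the parent of the next.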

Page 15: 3. XML Processor APIs

Call-Back Interfaces

Implemented by the application to act on parse events
(A DefaultHandler quietly ignores most of them)
– ContentHandler
  » methods to process document parsing events
– DTDHandler
  » methods to receive notification of unparsed external entities and notations declared in the DTD
– ErrorHandler
  » methods for handling parsing errors and warnings
– EntityResolver
  » methods for customised processing of external entity references
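
Of the call-back interfaces listed above, ErrorHandler is perhaps the easiest to try in isolation. The following is a minimal sketch, assuming the JDK's validating JAXP parser; the class CollectingErrorHandler, its message format, and the sample documents are made up for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical handler for illustration: collects warnings and
// (recoverable) errors, and aborts only on fatal errors.
final class CollectingErrorHandler implements ErrorHandler {
    private final List<String> problems = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        problems.add(describe("warning", e));
    }

    @Override
    public void error(SAXParseException e) {
        // Validity errors are recoverable: note them and continue parsing.
        problems.add(describe("error", e));
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        throw e;   // well-formedness errors: give up
    }

    private static String describe(String kind, SAXParseException e) {
        return kind + " at line " + e.getLineNumber() + ": " + e.getMessage();
    }

    // Validates xml against its own DTD and returns the reported problems.
    static List<String> validate(String xml) {
        CollectingErrorHandler errors = new CollectingErrorHandler();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setErrorHandler(errors);   // register the call-back object
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors.problems;
    }

    public static void main(String[] args) {
        String dtd = "<!DOCTYPE A [<!ELEMENT A (B)> <!ELEMENT B (#PCDATA)>]>";
        for (String problem : validate(dtd + "<A><C/></A>")) {
            System.out.println(problem);
        }
    }
}
```

The other call-back interfaces are registered in the same way, through the corresponding set methods of XMLReader.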

Page 16: 3. XML Processor APIs

SAX 2.0: Auxiliary Interfaces

Attributes
– methods to access a list of attributes, e.g.:
    int getLength()
    String getValue(String attrName)

Locator
– methods for locating the origin of parse events: getSystemId(), getPublicId(), getLineNumber(), getColumnNumber()
– for example, to report application-specific semantic errors
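
The two auxiliary interfaces can be combined to report a semantic error with its position. The sketch below assumes the JDK's built-in parser, which supplies a Locator; the class IdNumChecker and the rule that idnum must be numeric are assumptions made only for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration: reports non-numeric idnum
// attributes together with the line on which they occur.
final class IdNumChecker extends DefaultHandler {
    private Locator locator;   // supplied by the parser, if it supports one
    private final List<String> problems = new ArrayList<>();

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startElement(String nsURI, String localName,
                             String rawName, Attributes atts) {
        // Walk through the attribute list by index.
        for (int i = 0; i < atts.getLength(); i++) {
            String name = atts.getQName(i);
            String value = atts.getValue(i);
            if (name.equals("idnum") && !value.matches("\\d+")) {
                int line = (locator != null) ? locator.getLineNumber() : -1;
                problems.add("line " + line + ": idnum \"" + value
                        + "\" is not a number");
            }
        }
    }

    // Parses xml and returns the semantic problems found.
    static List<String> check(String xml) {
        IdNumChecker checker = new IdNumChecker();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), checker);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return checker.problems;
    }

    public static void main(String[] args) {
        String xml = "<db>\n<person idnum=\"1234\"/>\n"
                + "<person idnum=\"12x4\"/>\n</db>";
        for (String problem : check(xml)) {
            System.out.println(problem);
        }
    }
}
```

The document is well-formed, so the parser itself reports nothing; the Locator is what lets the application attach a position to its own error message.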

Page 17: 3. XML Processor APIs

The ContentHandler Interface

Information of general document events. (See API documentation for a complete list):

setDocumentLocator(Locator locator)
– to receive a locator for the origin of SAX document events

startDocument(); endDocument()
– notify the beginning/end of a document

startElement(String nsURI, String localName, String rawName, Attributes atts);
– nsURI and localName are for namespace support

endElement( … ); as above but without attributes

Page 18: 3. XML Processor APIs

Namespaces in SAX: Example

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/TR/xhtml1/strict">
  <xsl:template match="/">
    <html>
      <xsl:value-of select="//total"/>
    </html>
  </xsl:template>
</xsl:stylesheet>

startElement for xsl:template would pass the following parameters:
– nsURI = http://www.w3.org/1999/XSL/Transform
– localName = template, rawName = xsl:template

Page 19: 3. XML Processor APIs

Namespaces: Example (2)

<xsl:stylesheet version="1.0" ...
    xmlns="http://www.w3.org/TR/xhtml1/strict">
  <xsl:template match="/">
    <html> ... </html>
  </xsl:template>
</xsl:stylesheet>

endElement for html would give
– nsURI = http://www.w3.org/TR/xhtml1/strict (as default namespace for element names without a prefix), localName = html, rawName = html
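
The parameter values given in the two namespace examples can be observed with a namespace-aware parser. The sketch below assumes the JDK's JAXP parser, where namespace support has to be switched on in the factory; the class NamespaceTrace and its output format are made up for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class for illustration: lists the namespace-related
// parameters of every startElement event.
final class NamespaceTrace {

    // The stylesheet from the example, as a Java string.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
        + " xmlns=\"http://www.w3.org/TR/xhtml1/strict\">"
        + "<xsl:template match=\"/\">"
        + "<html><xsl:value-of select=\"//total\"/></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Returns one entry "{nsURI}localName (rawName)" per start tag.
    static List<String> trace(String xml) {
        List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String nsURI, String localName,
                                     String rawName, Attributes atts) {
                out.add("{" + nsURI + "}" + localName + " (" + rawName + ")");
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // JAXP parsers are not namespace-aware by default.
            factory.setNamespaceAware(true);
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : trace(STYLESHEET)) {
            System.out.println(line);
        }
    }
}
```

Without setNamespaceAware(true), a JAXP parser is not required to fill in nsURI and localName, which is why code written for such a parser compares rawName instead.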

Page 20: 3. XML Processor APIs

ContentHandler interface (cont.)

characters(char ch[], int start, int length)
– notification of character data

ignorableWhitespace(char ch[], int start, int length)
– notification of ignorable whitespace in element content

Example:
<!DOCTYPE A [<!ELEMENT A (B)>
             <!ELEMENT B (#PCDATA)> ]>
<A>
  <B> </B></A>

[Figure: callouts mark the whitespace between <A> and <B> as "Ignorable whitespace" and the content of <B> as "Text content".]
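
The difference between the two call-backs can be observed with a validating parser, which knows from the DTD that A has element content. This is a minimal sketch, assuming the JDK's JAXP parser; the class WhitespaceDemo, its report format, and the sample document (with "Hi!" as the content of B) are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration: keeps text content and
// ignorable whitespace apart.
final class WhitespaceDemo extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private int ignorableChars = 0;

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);    // text content
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        ignorableChars += length;          // whitespace in element content
    }

    // Parses xml with validation on and summarises what was received.
    static String report(String xml) {
        WhitespaceDemo handler = new WhitespaceDemo();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // The DTD must be taken into account for the parser to know
            // which whitespace is ignorable.
            factory.setValidating(true);
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return "text=\"" + handler.text + "\" ignorable="
                + handler.ignorableChars;
    }

    public static void main(String[] args) {
        String xml =
            "<!DOCTYPE A [<!ELEMENT A (B)> <!ELEMENT B (#PCDATA)>]>\n"
            + "<A>\n  <B>Hi!</B>\n</A>";
        System.out.println(report(xml));
    }
}
```

An application that is only interested in text content can therefore leave ignorableWhitespace() unimplemented, as DefaultHandler does.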

Page 21: 3. XML Processor APIs

SAX Processing Example (1/9)

Input: XML representation of a personnel database:

<?xml version="1.0" encoding="ISO-8859-1"?>
<db>
  <person idnum="1234">
    <last>Kilpeläinen</last><first>Pekka</first></person>
  <person idnum="5678">
    <last>Möttönen</last><first>Matti</first></person>
  <person idnum="9012">
    <last>Möttönen</last><first>Maija</first></person>
  <person idnum="3456">
    <last>Römppänen</last><first>Maija</first></person>
</db>

Page 22: 3. XML Processor APIs

SAX Processing Example (2/9)

Task: Format the document as a list like this:

Pekka Kilpeläinen (1234)
Matti Möttönen (5678)
Maija Möttönen (9012)
Maija Römppänen (3456)

Event-based processing strategy:
– at the start of person, record the idnum (e.g., 1234)
– record starts and ends of last and first to store their contents (e.g., "Kilpeläinen" and "Pekka")
– at the end of a person, output the collected data

Page 23: 3. XML Processor APIs

SAX Processing Example (3/9)

Application: First import relevant interfaces & classes:

    import org.xml.sax.XMLReader;
    import org.xml.sax.Attributes;
    import org.xml.sax.ContentHandler;

    // Default (no-op) implementation of
    // interface ContentHandler:
    import org.xml.sax.helpers.DefaultHandler;

    // JAXP to instantiate a parser:
    import javax.xml.parsers.*;

Page 24: 3. XML Processor APIs

SAX Processing Example (4/9)

Implement relevant call-back methods:

    public class SAXDBApp extends DefaultHandler {

        // Flags to remember element context:
        private boolean InFirst = false, InLast = false;

        // Storage for element contents and
        // attribute values:
        private String FirstName, LastName, IdNum;

Page 25: 3. XML Processor APIs

SAX Processing Example (5/9)

Call-back methods:
– record the start of first and last elements, and the idnum attribute of a person:

        public void startElement(
                String nsURI, String localName,
                String rawName, Attributes atts) {
            if (rawName.equals("person"))
                IdNum = atts.getValue("idnum");
            if (rawName.equals("first"))
                InFirst = true;
            if (rawName.equals("last"))
                InLast = true;
        } // startElement

Page 26: 3. XML Processor APIs

SAX Processing Example (6/9)

Call-back methods continue:
– Record the text content of elements first and last:

        public void characters(
                char buf[], int start, int length) {
            if (InFirst) FirstName =
                new String(buf, start, length);
            if (InLast) LastName =
                new String(buf, start, length);
        } // characters
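
A SAX parser is allowed to deliver the text of one element in several characters() calls, for example around an entity or character reference. Assigning the chunk to a variable, as above, then keeps only the last chunk. A more defensive variant, sketched here under the assumption of the JDK's built-in parser, appends the chunks to a StringBuilder and takes the result at the end of the element. The class TextCollector and its collect method are hypothetical names made up for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration: collects the complete text
// content of every element with a given name.
final class TextCollector extends DefaultHandler {
    private final String target;
    private final List<String> values = new ArrayList<>();
    private StringBuilder current;   // non-null only inside a target element

    TextCollector(String target) {
        this.target = target;
    }

    @Override
    public void startElement(String nsURI, String localName,
                             String rawName, Attributes atts) {
        if (rawName.equals(target)) current = new StringBuilder();
    }

    @Override
    public void characters(char[] buf, int start, int length) {
        // Append instead of assign: there may be several chunks.
        if (current != null) current.append(buf, start, length);
    }

    @Override
    public void endElement(String nsURI, String localName, String rawName) {
        if (rawName.equals(target) && current != null) {
            values.add(current.toString());
            current = null;
        }
    }

    // Returns the text content of each elementName element in xml.
    static List<String> collect(String xml, String elementName) {
        TextCollector handler = new TextCollector(elementName);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.values;
    }

    public static void main(String[] args) {
        String xml = "<db><last>Kilpel&#228;inen</last>"
                + "<last>M&amp;M</last></db>";
        System.out.println(collect(xml, "last"));
    }
}
```

The result is the same however the parser happens to split the text, which the single-assignment version cannot guarantee.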

Page 27: 3. XML Processor APIs

SAX Processing Example (7/9)

At the end of person, output the collected data:

        public void endElement(String nsURI,
                String localName, String qName) {
            if (qName.equals("person"))
                System.out.println(FirstName + " " +
                    LastName + " (" + IdNum + ")");

            // Update context flags. Each flag is cleared at the end
            // of its own element, so that text following </first>
            // cannot overwrite FirstName:
            if (qName.equals("first")) InFirst = false;
            if (qName.equals("last")) InLast = false;
        } // endElement()

Page 28: 3. XML Processor APIs

SAX Processing Example (8/9)

Application main method:

        public static void main(String args[]) {
            // Instantiate an XMLReader (from JAXP
            // SAXParserFactory):
            SAXParserFactory spf =
                SAXParserFactory.newInstance();
            try {
                SAXParser saxParser = spf.newSAXParser();
                XMLReader xmlReader =
                    saxParser.getXMLReader();

Page 29: 3. XML Processor APIs

SAX Processing Example (9/9)

Main method continues:

                // Instantiate and pass a new
                // ContentHandler to xmlReader:
                ContentHandler handler = new SAXDBApp();
                xmlReader.setContentHandler(handler);
                // Parse each document named on the command line:
                for (int i = 0; i < args.length; i++) {
                    xmlReader.parse(args[i]);
                }
            } catch (Exception e) {
                System.err.println(e.getMessage());
                System.exit(1);
            }
        } // main
    } // class SAXDBApp

Page 30: 3. XML Processor APIs

SAX: Summary

A low-level parser interface for XML documents

Reports document parsing events through method call-backs
– therefore efficient: does not create an in-memory representation of the document
– therefore often used on servers and on resource-limited devices (palm-tops), and to process LARGE documents
– No limit on document size!