comparing java xml parsers ver1 0

Upload: javatest

Post on 07-Apr-2018


  • 8/4/2019 Comparing Java XML Parsers Ver1 0

    1/23

    COMPARING JAVA XML PARSERS

    PRESENTED BY

    SASANKA SEKHAR BANERJEE


    COMPARING JAVA XML PARSERS

    During this presentation, we will discuss the following:

    Need for XML

    Brief overview of XML
    Different methods of parsing XML
    DOM [Document Object Model]
    SAX [Simple API for XML]
    JAXP [Java API for XML Processing]
    JAXB [Java Architecture for XML Binding]
    StAX [Streaming API for XML]
    XPath
    Choosing the right parser


    COMPARING JAVA XML PARSERS NEED FOR XML

    Applications essentially consist of two parts: functionality described by the code, and the data that is manipulated by the code.

    The in-memory storage and management of data is a key part of any programming language and environment.

    Within a single application, the programmer is free to decide how the data is stored and represented.

    Problem - An application must exchange data with another application. It can use an intermediary storage medium, such as a database.

    But what if the data is to be exchanged directly between two applications, or the applications cannot access the same database?

    In this case, the data must be encoded in some particular format as it is produced.

    This has often resulted in the creation of application-specific data formats.

    These formats can be text-based, such as HTML for encoding how to display the encapsulated data, or binary, such as those used for sending remote procedure calls.

    Problem - In either case, there tends to be a lack of flexibility in the data representation, causing problems when versions change or when data needs to be exchanged between disparate applications, frequently from different vendors.


    COMPARING JAVA XML PARSERS XML USAGE

    XML was developed to address these issues. XML is written in plain text, uses self-describing elements and provides a data encoding format that is:

    Generic
    Simple
    Flexible
    Extensible
    Portable

    XML offers a method of putting structured data in a text file. Structured data conforms to a particular format; examples are spreadsheets, address books, configuration parameters, and financial transactions.

    This plain text data provides a software- and hardware-independent way of storing data, making it easier to create data that different applications can share.

    Exchanging data as XML greatly reduces the complexity of data exchange, since the data can be read by different, otherwise incompatible applications.

    When upgrading to a new system, large volumes of data must be converted, and incompatible data is often lost. XML data is stored in plain text format, which makes it easier to expand or upgrade to new systems without losing data.

    With XML, data can be available to all kinds of "reading machines" (handheld computers, voice machines, news feeds, etc.).


    COMPARING JAVA XML PARSERS OVERVIEW OF XML

    An XML document consists of elements; each element has a start tag, content and an end tag.

    An XML document must have exactly one root element, i.e. one tag which encloses the remaining tags.

    XML is case-sensitive, and an XML document is required to be well-formed.

    The following conditions need to be satisfied for a document to be well-formed:

    An XML document starts with a prolog.
    Every tag has a closing tag.
    All tags are properly nested.

    An XML document is valid if it is well-formed, contains a link to an XML schema, and is valid according to that schema.

    The following is a valid, well-formed XML file:

    Lars Test
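
    The well-formedness rules above can be checked with the JDK's own parser. The following is a minimal sketch, not part of the original slides; the class name WellFormedCheck and the person/name elements are hypothetical and used only for illustration. It treats any SAXException raised while parsing as "not well-formed".

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {

    /** Returns true if the text parses as well-formed XML, false otherwise. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical element names, used only for illustration.
        System.out.println(isWellFormed("<?xml version=\"1.0\"?><person><name>Lars</name></person>"));
        System.out.println(isWellFormed("<person><name>Lars</person></name>")); // badly nested tags
        System.out.println(isWellFormed("<person/><person/>"));                 // two root elements
        System.out.println(isWellFormed("<Person></person>"));                  // tag names differ in case
    }
}
```

    Only the first document in main satisfies all of the rules; the other three each break one of them.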


    COMPARING JAVA XML PARSERS PARSING XML

    Java contains several methods to access XML. The following is a short overview of the available methods.

    Document Object Model or DOM

    Defines a mechanism for accessing and manipulating well-formed XML.
    Using the DOM API, the XML document is transformed into a tree structure in memory.
    The application then navigates the tree to read the document.
    If the document is large, it can place a strain on system resources.

    Simple API for XML or SAX

    Defines XML parsing methods.
    Event-based parser: the SAX parser streams a series of events while it reads the document.
    These events are forwarded to event handlers, which also provide access to the data of the document.
    Consumes very little memory, since the XML does not have to be loaded into memory all at once.
    The application must implement an event handler for every event it needs to process.

    SAX offers no DOM-style element tree, so the application must itself keep track of the parser's position in the document hierarchy.

    The application logic gets harder to write as the document gets bigger and more complicated.

    Although the entire document need not be loaded into memory, a SAX parser still has to parse the whole document, just as a DOM parser does.

    It lacks built-in document navigation of the kind that XPath provides.

    In addition, because parsing happens in a single pass, random access to the document is not supported.


    COMPARING JAVA XML PARSERS PARSING XML

    Java API for XML Processing or JAXP

    It provides a common interface for creating and using SAX and DOM parsers in Java.

    It does not implement a parser itself, but defines the minimum behavior that a parser must support.

    The actual parser implementation extends these abstract classes and provides concrete classes.

    It uses the Factory pattern to create a concrete parser instance, on which the parse methods are then called.

    The DocumentBuilderFactory class is used for DOM parsing and SAXParserFactory is used for SAX parsing.

    Traversing the DOM using JAXP:
    Instantiate a factory class.
    Using the factory class, instantiate the provider class.
    Using the provider class created in the previous step, perform the XML processing/parsing.

    DocumentBuilderFactory factoryBuilder = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factoryBuilder.newDocumentBuilder();
    Document doc = builder.parse(fileName);
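
    Once the Document has been obtained, the application walks the tree itself. The following is a minimal sketch of such a traversal, assuming the XML arrives as a string rather than a file; the class name DomTraversal and the helper elementNames are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomTraversal {

    /** Parses the XML into a DOM tree and returns all element names in document order. */
    static List<String> elementNames(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            List<String> names = new ArrayList<>();
            collect(doc.getDocumentElement(), names);
            return names;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Depth-first walk: record the node if it is an element, then visit its children.
    private static void collect(Node node, List<String> names) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            names.add(node.getNodeName());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, names);
        }
    }

    public static void main(String[] args) {
        // Hypothetical document, used only for illustration.
        System.out.println(elementNames("<a><b/><c><d/></c></a>"));
    }
}
```

    Because the whole tree is in memory, the same Document can be walked any number of times and in any order, which is the random access that streaming parsers do not offer.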


    COMPARING JAVA XML PARSERS PARSING XML

    SAX Parsing using JAXP

    In the case of the DOM parser, responsibility was passed to the actual parser to parse the XML document and return the DOM Document object.

    For SAX, the approach is the opposite. We call the parse method and pass it a handler object; this handler receives notifications about the parsing progress, errors encountered and so on.

    SAXParserFactory factorySAX = SAXParserFactory.newInstance();
    SAXParser sax = factorySAX.newSAXParser();
    DefaultHandler handler = new XMLParser();
    sax.parse(inputStream, handler);

    The main differences from DOM parsing lie in the parse call: first, it does not return a Document object; second, we must supply a DefaultHandler-derived class. If the application needs an in-memory model of the document, the handler class has to build it itself.
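
    The slides do not show the body of the handler class. The following is a minimal sketch of what a DefaultHandler subclass might look like; SaxPathHandler is a hypothetical class, not the XMLParser class used above. It keeps a stack of open elements, which is how a SAX application tracks its own position in the document hierarchy.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxPathHandler extends DefaultHandler {

    private final Deque<String> open = new ArrayDeque<>(); // elements currently open, root first
    private final List<String> paths = new ArrayList<>();  // one path per element seen

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // The parser pushes this event to us; we record where we are in the hierarchy.
        open.addLast(qName);
        paths.add("/" + String.join("/", open));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.removeLast();
    }

    /** Parses the XML with SAX and returns the path of every element, in document order. */
    static List<String> paths(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser parser = factory.newSAXParser();
            SaxPathHandler handler = new SaxPathHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.paths;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical document, used only for illustration.
        System.out.println(paths("<a><b/><c><d/></c></a>"));
    }
}
```

    Only the callbacks the application cares about are overridden here; DefaultHandler supplies empty implementations for the rest.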


    COMPARING JAVA XML PARSERS PARSING XML

    Java Architecture for XML Binding or JAXB

    DOM is a useful API for building and transforming XML documents in memory. Unfortunately, DOM is somewhat slow and resource hungry. To address these problems, the Java Architecture for XML Binding (JAXB) was developed.

    JAXB provides a mechanism that simplifies the creation and maintenance of XML-enabled Java applications. It does this by using an XML schema compiler (only DTDs and a subset of XML schemas and namespaces at the time of this writing) that translates XML DTDs into one or more Java classes, thereby removing the burden from the developer of writing complex parsing code.

    The generated classes handle all the details of XML parsing and formatting, including code to perform error and validity checking of incoming and outgoing XML documents, which ensures that only valid, error-free XML is accepted.

    Because the code has been generated for a specific schema, the generated classes are more efficient than those in a generic SAX or DOM parser. Most important, a JAXB parser often requires a much smaller footprint in memory than a generic parser.

    Classes created with JAXB do not include tree-manipulation capability, which is one factor that contributes to the small memory footprint of a JAXB object tree.


    COMPARING JAVA XML PARSERS PARSING XML

    JAXB contains two main components:
    The binding compiler, which binds a given XML schema to a set of generated Java classes.
    The binding runtime framework, which provides unmarshalling, marshalling, and validation functionality.

    Unmarshalling an XML document

    Unmarshalling is the process of converting an XML document into a corresponding set of Java objects.

    The first step is to create a JAXBContext object, which is the starting point for marshalling, unmarshalling, and validation operations.

    JAXBContext jaxbContext = JAXBContext.newInstance("com.xmlparsers.jaxb.xsd.marketerprofile");

    To unmarshal an XML document, create an Unmarshaller from the context:

    Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();

    The unmarshaller returns the unmarshalled object:

    CreateCustomerProfileResponse profileElement = (CreateCustomerProfileResponse)
        unmarshaller.unmarshal(new File("src/com/xmlparsers/jaxb/xsd/CIMMarketerProfile.xml"));

    String marketerProfile = profileElement.getCustomerProfileId();


    COMPARING JAVA XML PARSERS PARSING XML

    Marshalling an XML document

    Marshalling involves transforming Java objects into XML format.

    // Build the object tree from the generated classes
    MessageType msgType = new MessageType();
    msgType.setCode("0");
    msgType.setText("Successful");

    MessagesType msgTypes = new MessagesType();
    msgTypes.setResultCode("OK");
    msgTypes.getMessageType().add(msgType);

    CreateCustomerProfileResponse marketerProfile = new CreateCustomerProfileResponse();
    marketerProfile.getMessagesType().add(msgTypes);
    marketerProfile.setCustomerProfileId("21345678");

    // Marshal the object tree to System.out as indented XML
    JAXBContext context = JAXBContext.newInstance(CreateCustomerProfileResponse.class);
    Marshaller m = context.createMarshaller();
    m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
    m.marshal(marketerProfile, System.out);


    COMPARING JAVA XML PARSERS PARSING XML

    Use JAXB when you want to:
    Access data in memory, but do not need tree manipulation capabilities
    Process only data that is valid
    Convert data to different types
    Generate classes based on a DTD or XML schema
    Build object representations of XML data

    Use JAXP when you want to:
    Have flexibility with regard to the way you access the data, either serially with SAX or randomly in memory with DOM
    Use your same processing code with documents based on different DTDs
    Parse documents that are not necessarily valid
    Apply XSLT transformations
    Insert or remove components from an in-memory XML tree


    COMPARING JAVA XML PARSERS PARSING XML

    Streaming API For XML or StAX

    Traditionally, XML APIs are either:
    Tree based - the entire document is read into memory as a tree structure for random access by the calling application.
    Event based - the application registers to receive events as entities are encountered within the source document.

    Tree-based APIs are less efficient with respect to memory usage. Where memory is a concern, a streaming API is preferred; it uses much less memory since it doesn't have to hold the entire document in memory simultaneously.

    It can process the document in small pieces, which often makes it faster as well.

    SAX is one such event-based streaming API; it pushes data into the application.

    Push parsers feed the content of the document to the application as soon as they see it, whether the application is ready to receive that data or not.

    StAX was designed as a middle ground between these two opposites. The programmatic entry point is a cursor that represents a point within the document. The application moves the cursor forward, 'pulling' the information from the parser as it needs it.


    COMPARING JAVA XML PARSERS PARSING XML

    Pull API has the following advantages:

    Pull APIs are a more comfortable alternative for streaming processing of XML.

    A Pull API is based around the more familiar Iterator design pattern rather than the less well-known Observer design pattern.

    In a Pull API, the client program asks the parser for the next piece of information rather than the parser telling the client program when the next datum is available.

    In a Pull API the client program drives the parser, whereas in a Push API the parser drives the client.

    Why StAX?

    StAX shares with SAX the ability to read arbitrarily large documents.

    However, in StAX the application is in control rather than the parser.

    The application tells the parser when it wants to receive the next data chunk, rather than the parser telling the application.

    StAX goes beyond SAX by allowing programs both to read existing XML documents and to create new ones.

    Unlike SAX, StAX is a bidirectional API: it supports both reading and writing.


    COMPARING JAVA XML PARSERS PARSING XML

    Reading XML with StAX:

    XMLStreamReader is the key interface in StAX.

    This interface represents a cursor that's moved across an XML document from beginning to end.

    At any given time, this cursor points at one event: text node, start-tag, comment, etc.

    The cursor always moves forward, never backward, and normally only moves one item at a time.

    Methods like getName and getText can be invoked to retrieve information.

    A typical StAX program begins by using the XMLInputFactory class to load an implementation-dependent instance of XMLStreamReader.

    InputStream in = new FileInputStream(new File("src/com/xmlparsers/jaxb/xsd/CIMMarketerProfile.xml"));
    XMLInputFactory factory = XMLInputFactory.newInstance();
    XMLStreamReader staxParser = factory.createXMLStreamReader(in);


    COMPARING JAVA XML PARSERS PARSING XML

    // Pull events one at a time until the end of the document is reached
    while (staxParser.hasNext()) {
        int event = staxParser.next();
        if (event == XMLStreamConstants.END_DOCUMENT) {
            staxParser.close();
            break;
        }
        if (event == XMLStreamConstants.START_ELEMENT) {
            System.out.println(staxParser.getLocalName());
        }
    }

    The advantage of StAX parsing over SAX parsing is that a parse event may be skipped simply by invoking the next() method again.

    For example, if the parse event is of type START_ELEMENT, a developer may decide whether to obtain the event information or to move on to the next event:

    if (event == XMLStreamConstants.START_ELEMENT) {
        System.out.println(staxParser.getLocalName());
    }
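
    To make the idea of skipping events concrete, the following is a minimal sketch in which the application pulls events, ignores everything except the one element it is interested in, and stops as soon as that element is found. The class name StaxPull and the helper firstText are hypothetical; the sample document in main is an assumption modelled on the element names used elsewhere in this presentation.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxPull {

    /** Returns the text of the first element with the given name, or null if there is none. */
    static String firstText(String xml, String elementName) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // Events we are not interested in are skipped by simply calling next() again.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && elementName.equals(reader.getLocalName())) {
                        // getElementText() reads the content of a text-only element.
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample document, used only for illustration.
        String xml = "<createCustomerProfileResponse>"
                + "<messages><resultCode>Ok</resultCode></messages>"
                + "<customerProfileId>1103042</customerProfileId>"
                + "</createCustomerProfileResponse>";
        System.out.println(firstText(xml, "customerProfileId"));
    }
}
```

    Because the application drives the loop, it can return as soon as it has what it needs; the rest of the document is never read.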


    COMPARING JAVA XML PARSERS PARSING XML

    Writing with StAX

    // An XMLStreamWriter is obtained from an XMLOutputFactory
    XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
    XMLStreamWriter writer = outputFactory.createXMLStreamWriter(System.out);

    // Start the document with the writeStartDocument() method
    writer.writeStartDocument("UTF-8", "1.0");
    writer.writeComment("Testing with StAX ");

    // Output the start of the root element using the writeStartElement() method
    writer.writeStartElement("createCustomerProfileResponse");
    writer.writeNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");

    writer.writeStartElement("messages");
    writer.writeStartElement("resultCode");
    writer.writeCharacters("Ok");
    writer.writeEndElement(); // resultCode

    writer.writeStartElement("message");
    writer.writeStartElement("code");
    writer.writeCharacters("I00001");
    writer.writeEndElement(); // code

    writer.writeStartElement("text");
    writer.writeCharacters("Successful");
    writer.writeEndElement(); // text

    writer.writeEndElement(); // message
    writer.writeEndElement(); // messages

    writer.writeStartElement("customerProfileId");
    writer.writeCharacters("1103042");
    writer.writeEndElement(); // customerProfileId

    writer.writeEndElement(); // createCustomerProfileResponse
    writer.writeEndDocument();
    writer.flush();
    writer.close();


    COMPARING JAVA XML PARSERS PARSING XML

    XPATH

    XPath (XML Path Language) is an expression language for addressing parts of an XML document or navigating within it.

    XPath is really helpful for parsing XML-based configuration or properties files.

    XPath uses path expressions to select nodes or node-sets in an XML document.

    These path expressions look very much like URLs and traditional file system paths.

    XPath also supports several functions for string manipulation, comparison and other tasks.

    XML documents are treated as trees of nodes, and the root of the tree is called the document or root node.

    XPath defines seven kinds of nodes: element, attribute, text, namespace, processing-instruction, comment, and root nodes.


    COMPARING JAVA XML PARSERS PARSING XML

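    XPATH

    Let us consider the following XML sample:

    <createCustomerProfileResponse>
      <messages>
        <resultCode>Ok</resultCode>
        <message>
          <code>I00001</code>
          <text>Successful.</text>
        </message>
      </messages>
      <customerProfileId>1103042</customerProfileId>
    </createCustomerProfileResponse>

    The root element is <createCustomerProfileResponse>.
    <messages> and <customerProfileId> are its two child elements.
    The <resultCode> node is a child of the <messages> element.
    The resultCode value Ok is a text node.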


    COMPARING JAVA XML PARSERS PARSING XML

    XPATH Path Expression syntax

    Expression   Description
    nodename     Selects all child elements of the current node that have the given name
    /            Selects from the root node
    //           Selects nodes in the document from the current node that match the selection, no matter where they are
    .            Selects the current node
    ..           Selects the parent of the current node
    @            Selects attributes
    *            Matches any element node
    @*           Matches any attribute node
    node()       Matches any node of any kind
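
    A few of these expressions can be tried out with the javax.xml.xpath API. The following is a minimal sketch; the class name XPathDemo and its helpers are hypothetical, and the SAMPLE document is an assumption shaped like a createCustomerProfileResponse message (the exact element nesting is assumed for illustration).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class XPathDemo {

    // Assumed sample document, used only for illustration.
    static final String SAMPLE =
            "<createCustomerProfileResponse>"
            + "<messages><resultCode>Ok</resultCode>"
            + "<message><code>I00001</code><text>Successful.</text></message></messages>"
            + "<customerProfileId>1103042</customerProfileId>"
            + "</createCustomerProfileResponse>";

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Evaluates the expression and returns its result as a string. */
    static String text(String xml, String expression) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, parse(xml));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Evaluates the expression as a node-set and returns how many nodes it selects. */
    static int count(String xml, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, parse(xml), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(text(SAMPLE, "/createCustomerProfileResponse/customerProfileId")); // path from the root
        System.out.println(text(SAMPLE, "//resultCode"));    // anywhere in the document
        System.out.println(text(SAMPLE, "name(//code/..)")); // parent of <code>
        System.out.println(count(SAMPLE, "/*/*"));           // children of the root element
        System.out.println(count(SAMPLE, "//@*"));           // all attributes
    }
}
```

    When a node-set is evaluated as a string, XPath returns the string value of the first selected node, which is why the text helper is enough for single-valued lookups.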


    COMPARING JAVA XML PARSERS PARSING XML

    XPATH Reading XML

    // Read the XML file into a string buffer
    InputStream resultStream = new FileInputStream(new File("src/com/xmlparsers/jaxb/xsd/CIMMarketerProfile.xml"));

    java.io.BufferedReader aReader = new java.io.BufferedReader(new java.io.InputStreamReader(resultStream, "UTF-8"));

    StringBuffer aResponse = new StringBuffer();

    String aLine = aReader.readLine();
    while (aLine != null) {
        aResponse.append(aLine);
        aLine = aReader.readLine();
    }
    resultStream.close();

    // Strip a leading byte order mark (U+FEFF) before handing the text to the parser
    if (aResponse.length() > 0 && (int) aResponse.charAt(0) == 0xFEFF) {
        aResponse.deleteCharAt(0);
    }


    COMPARING JAVA XML PARSERS PARSING XML

    XPATH Reading XML

    javax.xml.parsers.DocumentBuilder docBuilder =
        javax.xml.parsers.DocumentBuilderFactory.newInstance().newDocumentBuilder();

    java.io.StringReader stringReader = new java.io.StringReader(aResponse.toString());
    org.w3c.dom.Document doc = docBuilder.parse(new org.xml.sax.InputSource(stringReader));

    javax.xml.xpath.XPath xpath = javax.xml.xpath.XPathFactory.newInstance().newXPath();
    String customerProfileId = xpath.evaluate("/*/customerProfileId/text()", doc);