![Page 1: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/1.jpg)
1
Processing XML with Java
CS 236607, Spring 2008/9
![Page 2: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/2.jpg)
Parsers What is a parser?
Parser
Formal grammar
Input AnalyzedData
The structure(s) of the input, according to the atomic elements
and their relationships (as described in the grammar)
![Page 3: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/3.jpg)
3
XML-Parsing Standards
We will consider two parsing methods that implement W3C standards for accessing XML
DOM convert XML into a tree of objects “random access” protocol
SAX “serial access” protocol event-driven parsing
![Page 4: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/4.jpg)
4
XML Examples
![Page 5: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/5.jpg)
<?xml version="1.0"?>
<!DOCTYPE countries SYSTEM "world.dtd">
<countries>
<country continent="&as;">
<name>Israel</name>
<population year="2001">6,199,008</population>
<city capital="yes"><name>Jerusalem</name></city>
<city><name>Ashdod</name></city>
</country>
<country continent="&eu;">
<name>France</name>
<population year="2004">60,424,213</population>
</country>
</countries>
world.xmlworld.xmlroot element
validating DTD file
reference to an entity
![Page 6: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/6.jpg)
countries
country
Asia
continent
Israel
name
year
2001
6,199,008
population
city
capital
yes
name
Jerusalem
country
Europe
continent
France
nameyear
2004
60,424,213
population
city
capital
no
name
Ashdod
XML Tree Modelelement
element
attribute simple content
![Page 7: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/7.jpg)
<!ELEMENT countries (country*)> <!ELEMENT country (name,population?,city*)> <!ATTLIST country continent CDATA #REQUIRED> <!ELEMENT name (#PCDATA)> <!ELEMENT city (name)><!ATTLIST city capital (yes|no) "no"> <!ELEMENT population (#PCDATA)> <!ATTLIST population year CDATA #IMPLIED> <!ENTITY eu "Europe"> <!ENTITY as "Asia"><!ENTITY af "Africa"><!ENTITY am "America"><!ENTITY au "Australia">
world.dtdworld.dtd
default value
As opposed to
required
parsedNot parsed
![Page 8: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/8.jpg)
<?xml version="1.0"?>
<forsale date="12/2/03"
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<book>
<title> <xhtml:em>DBI:</xhtml:em>
<![CDATA[Where I Learned <xhtml>.]]>
</title>
<comment
xmlns="http://www.cs.huji.ac.il/~dbi/comments">
<par>My <xhtml:b> favorite </xhtml:b> book!</par>
</comment>
</book>
</forsale>
sales.xmlsales.xml
“xhtml” namespace declaration
default namespace declaration
namespace overriding
(non-parsed) character data
Namespaces
![Page 9: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/9.jpg)
<?xml version="1.0"?>
<forsale date="12/2/03"
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<book>
<title> <xhtml:h1> DBI </xhtml:h1>
<![CDATA[Where I Learned <xhtml>.]]>
</title>
<comment
xmlns="http://www.cs.huji.ac.il/~dbi/comments">
<par>My <xhtml:b> favorite </xhtml:b> book!</par>
</comment>
</book>
</forsale>
sales.xmlsales.xml
Namespace: “http://www.w3.org/1999/xhtml”
Local name: “h1”
Qualified name: “xhtml:h1”
![Page 10: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/10.jpg)
<?xml version="1.0"?>
<forsale date="12/2/03"
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<book>
<title> <xhtml:h1>DBI</xhtml:h1>
<![CDATA[Where I Learned <xhtml>.]]>
</title>
<comment
xmlns="http://www.cs.huji.ac.il/~dbi/comments">
<par>My <xhtml:b> favorite </xhtml:b> book!</par>
</comment>
</book>
</forsale>
sales.xmlsales.xml
Namespace: “”
Local name: “title”
Qualified name: “title”
![Page 11: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/11.jpg)
11
DOM – Document Object Model
![Page 12: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/12.jpg)
12
DOM Parser
DOM = Document Object Model Parser creates a tree object out of the document User accesses data by traversing the tree
The tree and its traversal conform to a W3C standard The API allows for constructing, accessing and
manipulating the structure and content of XML documents
![Page 13: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/13.jpg)
<?xml version="1.0"?>
<!DOCTYPE countries SYSTEM "world.dtd">
<countries>
<country continent="&as;">
<name>Israel</name>
<population year="2001">6,199,008</population>
<city capital="yes"><name>Jerusalem</name></city>
<city><name>Ashdod</name></city>
</country>
<country continent="&eu;">
<name>France</name>
<population year="2004">60,424,213</population>
</country>
</countries>
![Page 14: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/14.jpg)
The DOM TreeDocument
countries
country
Asia
continent
Israel
name
year
2001
6,199,008
population
city
capital
yes
name
Jerusalem
country
Europe
continent
France
nameyear
2004
60,424,213
population
city
capital
no
name
Ashdod
![Page 15: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/15.jpg)
15
Using a DOM Tree
DOM Parser DOM TreeXML File
API
Application
in memory
![Page 16: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/16.jpg)
16
![Page 17: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/17.jpg)
17
Creating a DOM Tree A DOM tree is generated by a DocumentBuilder
The builder is generated by a factory, in order to be
implementation independent
The factory is chosen according to the system
configuration
DocumentBuilderFactory factory =
DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("world.xml");
![Page 18: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/18.jpg)
18
Configuring the Factory
The methods of the document-builder factory enable you to configure the properties of the document building
For example factory.setValidating(true) factory.setIgnoringComments(false)
Read more about DocumentBuilderFactory Class, DocumentBuilder Class
![Page 19: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/19.jpg)
19
The Node Interface The nodes of the DOM tree include
a special root (denoted document) The Document interface retrieved by builder.parse(…) actually
extends the Node Interface element nodes text nodes and CDATA sections attributes comments and more ...
Every node in the DOM tree implements the Node interface
![Page 20: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/20.jpg)
Interfaces in a DOM TreeDocumentFragment
Document
CharacterDataText
Comment
CDATASection
Attr
Element
DocumentType
Notation
Entity
EntityReference
ProcessingInstruction
Node
NodeList
NamedNodeMap
DocumentType
Figure as appears in : “The XML Companion” - Neil Bradley
![Page 21: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/21.jpg)
21
Interfaces in the DOM Tree
Document
Document Type Element
Attribute Element ElementAttribute Text
ElementText Entity Reference TextText
TextComment
![Page 22: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/22.jpg)
22
Node Navigation
Every node has a specific location in tree
Node interface specifies methods for tree navigation Node getFirstChild();
Node getLastChild();
Node getNextSibling();
Node getPreviousSibling();
Node getParentNode();
NodeList getChildNodes();
NamedNodeMap getAttributes()
![Page 23: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/23.jpg)
23
Node Navigation (cont)
getFirstChild()
getPreviousSibling()
getChildNodes()
getNextSibling()
getLastChild()
getParentNode()
![Page 24: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/24.jpg)
24
Node Properties
Every node has a type a name a value attributes
The roles of these properties differ according to the node types
Nodes of different types implement different interfaces (that extend Node)
![Page 25: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/25.jpg)
25
InterfacenodeNamenodeValueattributes
Attrname of attributevalue of attributenull
CDATASection"#cdata-section"content of the Sectionnull
Comment"#comment"content of the commentnull
Document"#document"nullnull
DocumentFragment"#document-fragment"nullnull
DocumentTypedoc-type namenullnull
Elementtag namenullNodeMap
Entityentity namenullnull
EntityReferencename of entity referencednullnull
Notationnotation namenullnull
ProcessingInstructiontargetentire contentnull
Text"#text"content of the text nodenull
Names, Values and Attributes
![Page 26: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/26.jpg)
26
Node Types - getNodeType()
ELEMENT_NODE = 1
ATTRIBUTE_NODE = 2
TEXT_NODE = 3
CDATA_SECTION_NODE = 4
ENTITY_REFERENCE_NODE = 5
ENTITY_NODE = 6
PROCESSING_INSTRUCTION_NODE = 7
COMMENT_NODE = 8
DOCUMENT_NODE = 9
DOCUMENT_TYPE_NODE = 10
DOCUMENT_FRAGMENT_NODE = 11
NOTATION_NODE = 12
if (myNode.getNodeType() == Node.ELEMENT_NODE) { //process node …}
Read more about Node Interface
![Page 27: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/27.jpg)
import org.w3c.dom.*;
import javax.xml.parsers.*;
public class EchoWithDom {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory factory =
DocumentBuilderFactory.newInstance();
factory.setIgnoringElementContentWhitespace(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(“world.xml");
new EchoWithDom().echo(doc);
}
![Page 28: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/28.jpg)
private void echo(Node n) {
print(n);
if (n.getNodeType() == Node.ELEMENT_NODE) {
NamedNodeMap atts = n.getAttributes();
++depth;
for (int i = 0; i < atts.getLength(); i++) echo(atts.item(i));
--depth; }
depth++;
for (Node child = n.getFirstChild(); child != null;
child = child.getNextSibling()) echo(child);
depth--;
}
![Page 29: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/29.jpg)
private int depth = 0;
private String[] NODE_TYPES = {
"", "ELEMENT", "ATTRIBUTE", "TEXT", "CDATA",
"ENTITY_REF", "ENTITY", "PROCESSING_INST",
"COMMENT", "DOCUMENT", "DOCUMENT_TYPE",
"DOCUMENT_FRAG", "NOTATION" };
private void print(Node n) {
for (int i = 0; i < depth; i++) System.out.print(" ");
System.out.print(NODE_TYPES[n.getNodeType()] + ":");
System.out.print("Name: "+ n.getNodeName());
System.out.print(" Value: "+ n.getNodeValue()+"\n");
}}
![Page 30: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/30.jpg)
30
public class WorldParser {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory factory =
DocumentBuilderFactory.newInstance();
factory.setIgnoringElementContentWhitespace(true);
DocumentBuilder builder =
factory.newDocumentBuilder();
Document doc = builder.parse("world.xml");
printCities(doc);
}
Another Example
![Page 31: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/31.jpg)
31
public static void printCities(Document doc) {
NodeList cities = doc.getElementsByTagName("city");
for(int i=0; i<cities.getLength(); ++i) {
printCity((Element)cities.item(i));
}
}
public static void printCity(Element city) {
Node nameNode =
city.getElementsByTagName("name").item(0);
String cName = nameNode.getFirstChild().getNodeValue();
System.out.println("Found City: " + cName);
}
Another Example (cont)
Searches within descendents
![Page 32: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/32.jpg)
32
Normalizing the DOM Tree
Normalizing a DOM Tree has two effects: Combine adjacent textual nodes Eliminate empty textual nodes
To normalize, apply the normalize() method to the document element
Created by node
manipulation…
![Page 33: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/33.jpg)
33
Node Manipulation
Children of a node in a DOM tree can be manipulated - added, edited, deleted, moved, copied, etc.
To constructs new nodes, use the methods of Document createElement, createAttribute, createTextNode,
createCDATASection etc. To manipulate a node, use the methods of Node
appendChild, insertBefore, removeChild, replaceChild, setNodeValue, cloneNode(boolean deep) etc.
![Page 34: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/34.jpg)
34
Node Manipulation (cont)
Old
New
replaceChild
cloneNode
deep = 'false'
deep = 'true'
Figure as appears in “The XML Companion” - Neil Bradley
![Page 35: 1 Processing XML with Java CS 236607, Spring 2008/9](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d585503460f94a381b7/html5/thumbnails/35.jpg)
35
Resources used for this presentation
The Hebrew University of Jerusalem – CS Faculty.
An Introduction to XML and Web Technologies – Course’s Literature.