
Post on 21-Dec-2015








No More Pain for XML’s Gain

XJ: Facilitating XML Processing in Java

Matthew Harren, Mukund Raghavachari, Oded Shmueli, Michael Burke, Rajesh Bordawekar, Igor Pechtchanski, Vivek Sarkar

Itay Maman
236826 Seminar lecture, 15 June 2005


The basic premise

• XML is getting increasingly popular
• XML manipulation is now a common programming task
• The lead question:
  – Do modern OO languages sufficiently support XML?


Introduction: Schema file (file: technioncatalog.xsd)

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="course" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="points" type="xs:int"/>
              <xs:element name="number" type="xs:int"/>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="teacher" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>


Introduction: XML document (file: short.xml)

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <course>
    <points>3</points>
    <number>234319</number>
    <name>Programming Languages</name>
    <teacher>Ron Pinter</teacher>
  </course>
  <course>
    <points>3</points>
    <number>234141</number>
    <name>Combinatorics for CS</name>
    <teacher>Ran El-Yaniv</teacher>
  </course>
</catalog>

Desired output:
“Combinatorics for CS (234141) by Ran El-Yaniv, 3 credit points”


Introduction: The XJ program

import java.io.*;
import technioncatalog.*;

public class Demo1 {
    public static void main(String[] args) throws Throwable {
        // Load the document as a typed catalog object
        catalog cat = new catalog(new File("short.xml"));
        // Typed XPath query: select the second <course> element
        catalog.course c = cat [| /course[2] |];
        printCourse(c);
    }

    private static void printCourse(catalog.course c) {
        String name = c [| /name |];
        String teacher = c [| /teacher |];
        int points = c [| /points |];
        int id = c [| /number |];
        System.out.println(name + " (" + id + ") by " + teacher
            + ", " + points + " credit points");
    }
}


Traditional XML processing (DOM, XPath APIs)

import java.io.File;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;

public static void main(String[] args) throws Throwable {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbf.newDocumentBuilder();
    Document doc = db.parse(new File("short.xml"));
    XPath xp = XPathFactory.newInstance().newXPath();
    // The query is an unchecked string; the result is an untyped node list
    NodeList nodes = (NodeList) xp.evaluate("//course", doc, XPathConstants.NODESET);
    printCourse(nodes.item(1));
}

XPath is a plain string. It may be:
• Syntactically incorrect
• Incompatible with the document

The types of the XML objects (Node, Document) do not reflect the schema
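The two string-XPath failure modes listed above can be reproduced with the standard javax.xml.xpath API. The following is a minimal sketch, not part of the original slides: the class name StringXPathPitfalls, the helper count, and the inlined one-course document are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class StringXPathPitfalls {
    // Hypothetical one-course document, inlined so the sketch is self-contained.
    static final String XML =
        "<catalog><course><points>3</points><number>234319</number>"
        + "<name>Programming Languages</name><teacher>Ron Pinter</teacher>"
        + "</course></catalog>";

    /** Number of nodes the query selects, or -1 if the query is rejected at run time. */
    static int count(String query) {
        XPath xp = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xp.evaluate(
                query, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            // The Java compiler never looked inside the query string.
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//course"));   // well-formed and compatible
        System.out.println(count("//course["));  // syntactically incorrect
        System.out.println(count("//cuorse"));   // misspelled: selects nothing, silently
    }
}
```

Both faulty queries compile as ordinary Java. The first only fails once it is evaluated, and the second never fails at all; it just returns an empty result.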


Traditional XML processing (DOM APIs)

private static void printCourse(Node n) {
    NodeList nodes = n.getChildNodes();
    System.out.println(nodes.item(5).getTextContent()
        + " (" + nodes.item(3).getTextContent()
        + ") by " + nodes.item(7).getTextContent()
        + ", " + nodes.item(1).getTextContent() + " credit points");
}

Assumption: Four child nodes must exist
Assumption: 3rd child is the course number
Assumption: 2nd child has no child elements

• These assumptions will not hold if the schema is changed
  – => run-time errors
  – problems remain, even if we identify nodes by name
• Possible Schema changes:
  – Allowing a new optional <students> sub-element
  – Changing the order of the sub-elements

What about reading the numeric value of an element?
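To make the "problems remain, even if we identify nodes by name" point concrete, here is a minimal sketch of a name-based variant of printCourse in plain Java DOM. The class NamedDomAccess and its helpers childText, describe and describeCourse are hypothetical names, and the document is inlined rather than read from short.xml.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamedDomAccess {
    // Same data as short.xml, inlined (with whitespace) so the sketch is self-contained.
    static final String XML =
        "<catalog>\n"
        + " <course>\n  <points>3</points>\n  <number>234319</number>\n"
        + "  <name>Programming Languages</name>\n  <teacher>Ron Pinter</teacher>\n </course>\n"
        + " <course>\n  <points>3</points>\n  <number>234141</number>\n"
        + "  <name>Combinatorics for CS</name>\n  <teacher>Ran El-Yaniv</teacher>\n </course>\n"
        + "</catalog>";

    /** Text of the first descendant element called name; fails at run time if absent. */
    static String childText(Element parent, String name) {
        NodeList matches = parent.getElementsByTagName(name);
        if (matches.getLength() == 0) {
            throw new IllegalStateException("missing <" + name + "> element");
        }
        return matches.item(0).getTextContent().trim();
    }

    static String describe(Element course) {
        // Element names are still unchecked strings, and numeric values
        // still have to be parsed by hand from text content.
        int points = Integer.parseInt(childText(course, "points"));
        int number = Integer.parseInt(childText(course, "number"));
        return childText(course, "name") + " (" + number + ") by "
            + childText(course, "teacher") + ", " + points + " credit points";
    }

    static String describeCourse(int index) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return describe((Element) doc.getElementsByTagName("course").item(index));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeCourse(1));
    }
}
```

Looking children up by name removes the dependence on position and whitespace nodes, but a misspelled or renamed element is still discovered only when the code runs.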


No easy solution

• Similar problems occur when:
  1. XML elements are created by the program
  2. Other libraries are used for reading/writing XML documents
     – Such as: Xalan, SAX
  3. The developer wraps several complex operations within a single function/method/class
• These are inherent problems of the language


Shaping the future

• What XML-related facilities do we want?
  – Typed XML objects
  – Seamless translation of a Schema/DTD into a Java type
  – Two composition techniques
    • XML notation
    • Java’s object creation syntax
  – Two decomposition techniques
    • Typed XPath
    • Typed, named methods/fields
  – XPath expressions as first-class-values


Has the future arrived yet?

• Significant effort has gone into integrating XML into modern programming languages
  – XJ
  – Scala
  – Cω
  – XTatic
  – …
• We will overview the constructs offered by XJ
  – A super-set of Java
  – Available at:


XJ’s Type system



• Hierarchy of classes
  – A common root class: XMLObject
  – Automatic import: package*
• Genericity: Sequence<T>, XMLCursor<T>
  – XMLCursor<T> is a Sequence<T> iterator


Integration with Schema

• The rationale:
  1. An OO program is a collection of class definitions
  2. A Schema file is a collection of type definitions
• => let’s integrate these definitions
• Any Schema type is also an XJ type
  – The XJ compiler generates a “logical class” for each such type
  – Schema file == package name
  – Using a schema == import schema_file_name;


import technioncatalog.*;

public class Demo2 {
    public static void main(String[] args) throws Throwable {
        String x = "Algorithms 1";
        int y = 234247;
        catalog cat = buildCatalog(new catalog.course(
            <course>
                <points>3</points>
                <number>{y}</number>
                <name>{x}</name>
                <teacher>Shlomo Moran</teacher>
            </course>));
    }

    private static catalog buildCatalog(catalog.course c) {
        return new catalog(<catalog>{c}</catalog>);
    }
}

XML literal in XJ code
• Invalid XML content triggers a compile-time error
• Resulting elements are typed!
• Curly braces allow “escaping” back into XJ


An ill-typed program

...
// Wrong <course> element
course c = new course(<course>
    <teacher>Shlomo Moran</teacher></course>);
buildCatalog(c);

XMLObject x = new course.teacher(
    <teacher>Shlomo Moran</teacher>);
// An XMLObject cannot be passed as a course element
buildCatalog(x);
...

private static catalog buildCatalog(catalog.course c) {
    return new catalog(<catalog>{c}</catalog>);
}


Embedding XPath Queries in XJ

• Syntax: XmlValue [| XPathQuery |]
• Requires: a context-provider
  – An XML element over which the XPath query is invoked
  – (see the cat variable in the sample)
• Escaping: use a ‘$’ prefix

course doSomething(catalog cat, int courseNum) {
    return cat [| /course[./number = $courseNum] |];
}


XPath Semantics

• Problem: resulting type is sometimes not so clear
• Two options
  – Sequence<T>
    • If the compiler determines that all result elements are of type T
  – Sequence<XMLObject>
    • (Otherwise)
• Automatic conversion from a singleton sequence
• Static check of XPath queries
  – If result is always empty => compile-time error
  – (The compiler cannot catch all cases)


Implicit coercions

• An atomic XML value can be seamlessly converted into a corresponding Java value
  – xsd:double => double
  – xsd:boolean => boolean
  – xsd:string => java.lang.String
  – …
• This reduces the verbosity of XML-related code:

import technioncatalog.*;
import technioncatalog.catalog.*;

public static String getTeacher(course c) {
    return c [| /teacher |];
}

Sequence<teacher> ► teacher ► String


Updates: Assignment to Query Result

• An XPath expression returns a reference to an existing element
  – (No copying is involved)
  – Consistent with Java’s semantics for objects
• Thus, it can be assigned to
  – An XPath expression is a legal lvalue

public static void changePoint(catalog.course c, int p) {
    c [| /points |] = p;
}

• Bulk assignment
  – Occurs when the XPath expression denotes a sequence
  – Bulk assignment operator := allows multiple assignments
  – Double the credit points of each course:

cat [| //points |] *:= 2;
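For comparison, the same "double the credit points of each course" update written against the plain DOM and XPath APIs might look like the sketch below. It is an illustration only: the class BulkUpdateWithDom, its helpers, and the two-course inlined document (3 and 4 points) are assumptions, not part of XJ or of the original slides.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BulkUpdateWithDom {
    // Hypothetical document with two courses worth 3 and 4 points.
    static final String XML =
        "<catalog><course><points>3</points></course>"
        + "<course><points>4</points></course></catalog>";

    /** Multiplies the text content of every <points> element in place. */
    static void scalePoints(Document doc, int factor) throws XPathExpressionException {
        XPath xp = XPathFactory.newInstance().newXPath();
        NodeList hits = (NodeList) xp.evaluate("//points", doc, XPathConstants.NODESET);
        for (int i = 0; i < hits.getLength(); i++) {
            Node p = hits.item(i);
            // Manual string -> int -> string round trip for each element.
            int old = Integer.parseInt(p.getTextContent().trim());
            p.setTextContent(Integer.toString(old * factor));
        }
    }

    /** Parses the sample document, doubles all points, and returns the new values. */
    static List<Integer> doubledPoints() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            scalePoints(doc, 2);
            NodeList points = doc.getElementsByTagName("points");
            List<Integer> result = new ArrayList<>();
            for (int i = 0; i < points.getLength(); i++) {
                result.add(Integer.parseInt(points.item(i).getTextContent()));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(doubledPoints());
    }
}
```

The explicit loop, the cast of the query result, and the text-to-number conversions are the ceremony that the one-line XJ bulk assignment hides.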


Tree structure update

• Class XMLObject also defines methods, such as:
  – insertAfter()
  – insertBefore()
  – insertAsFirst()
  – detach()

public static void addCourse(catalog cat) {
    course c = new course(
        <course>
            <points>4</points>
            <number>234111</number>
            <name>Introduction to CS</name>
            <teacher>Roy Friedman</teacher>
        </course>);
    cat.insertAsLast(c);
}

Which object is being modified?


Problems: Type Consistency

• Definitions
  1. An XML update operation, u, is a mapping over XML values
     • u: T1 -> T2
  2. An update is consistent if T1 = T2
• Ideally, a compile-time error should be triggered for each inconsistent update in the program
• Unfortunately, this cannot be promised
• The solution: Additional run-time check

Can you think of an example?

Why do we want the two types to be equal?


Problems: Covariant subtyping (1/2)

• Covariance: change of type in signature is in the same direction as that of the inheritance

class X { }
class A { public void m(X x) { } }

class X1 extends X { }
class A1 extends A { public void m(X1 x) { } }
...
A a = new A1();
a.m(new X());

A1.m() is “spoiled”: Requires only X1 objects

Which method should be invoked: A.m() or A1.m()?

• Java favors type-safety: A method with covariant arguments is considered to be an overloading rather than overriding
  – Same approach is taken by C++, C#
• But, covariance is allowed for arrays
  – Array assignments may fail at run-time
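Both claims can be checked in plain Java. The sketch below reuses the slide's classes X, X1, A and A1 (with m changed to return a String so the chosen method is observable); the class CovarianceDemo and its helper methods are hypothetical additions.

```java
class X { }
class X1 extends X { }

class A {
    String m(X x) { return "A.m(X)"; }
}

class A1 extends A {
    // Covariant parameter type: Java treats this as an overload, not an override.
    String m(X1 x) { return "A1.m(X1)"; }
}

class CovarianceDemo {
    /** Which method runs for a.m(new X()) when a refers to an A1? */
    static String dispatch() {
        A a = new A1();
        return a.m(new X());
    }

    /** Calls through an A1-typed reference with an X1 argument select the overload. */
    static String dispatchOnSubclassReference() {
        A1 a1 = new A1();
        return a1.m(new X1());
    }

    /** Arrays are covariant, so this compiles but the store is checked at run time. */
    static boolean arrayStoreFails() {
        X[] xs = new X1[1];
        try {
            xs[0] = new X();
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(dispatch());
        System.out.println(dispatchOnSubclassReference());
        System.out.println(arrayStoreFails());
    }
}
```

Since A1.m(X1) does not override A.m(X), the call through the A-typed reference is bound to A.m(X), which accepts any X and is therefore safe. The array case shows the alternative design: covariance is accepted statically and paid for with a run-time check.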


Problems: Covariant subtyping (2/2)

(Now let us get back to our technioncatalog schema…)

• A <course> value is also spoiled
  – It requires unique children: <points>, <name>, etc.
• But, it also has an unspoiled super-class: XMLObject
  – All updates to XMLObject are legal at compile-time
• The following code compiles successfully:

public static void trick(course c) {
    XMLObject x = c;
    points p = new points(<points>4</points>);
    x.appendAsLast(p);   // Run-time error is here !!
}
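A rough analogue of such a run-time check can be sketched in plain Java with the javax.xml.validation API: the untyped DOM accepts a second <points> child, and only validating the tree against the schema reveals the inconsistency. This illustrates the idea and is not XJ's actual mechanism; the class RuntimeSchemaCheck, its helpers, and the inlined schema and document are assumptions.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class RuntimeSchemaCheck {
    // The technioncatalog schema, inlined so the sketch is self-contained.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='catalog'><xs:complexType><xs:sequence>"
        + "<xs:element name='course' maxOccurs='unbounded'><xs:complexType><xs:sequence>"
        + "<xs:element name='points' type='xs:int'/>"
        + "<xs:element name='number' type='xs:int'/>"
        + "<xs:element name='name' type='xs:string'/>"
        + "<xs:element name='teacher' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    static final String XML =
        "<catalog><course><points>3</points><number>234319</number>"
        + "<name>Programming Languages</name><teacher>Ron Pinter</teacher>"
        + "</course></catalog>";

    /** True if the DOM tree currently conforms to the schema. */
    static boolean conforms(Document doc) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new DOMSource(doc));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    /** Returns {conforms before the update, conforms after the update}. */
    static boolean[] checkTrick() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed when validating a DOM tree
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            boolean before = conforms(doc);

            // Same move as trick(): append a second <points> through an untyped API.
            Element course = (Element) doc.getElementsByTagName("course").item(0);
            Element extra = doc.createElementNS(null, "points");
            extra.setTextContent("4");
            course.appendChild(extra);

            return new boolean[] { before, conforms(doc) };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = checkTrick();
        System.out.println("before: " + r[0] + ", after: " + r[1]);
    }
}
```

In this sketch the check is run by hand after the update; the slides describe XJ as inserting the equivalent check itself, at the point of the update.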


Shaping the future (revisited)

• Language constructs seen so far
  – Typed XML objects
  – Seamless translation of a Schema/DTD into a Java type
  – Two composition techniques
    • XML notation
    • Java’s object creation syntax
  – Two decomposition techniques
    • Typed XPath
    • Typed, named methods/fields
  – XPath expressions as first-class-values


XPath expressions as first-class values

• What is a first-class-value?
  – A value that can be used “naturally” in the program
    • Passed as an argument
    • Stored in a variable/field
    • Returned from a method
    • Created
• In XJ, XPath expressions do not meet these conditions
  – The main obstacle: The XPath part of the expression cannot be separated from its context provider


XPath expressions as first-class values (cont’d)

• Let’s speculate on XPath as an FCV…
• (Following code IS NOT a legal XJ program)

private static Sequence<teacher> teachers;

static Sequence<teacher> find(XPath<catalog,teacher> q) {
    catalog c = new catalog(new File("file1.xml"));
    return q.evaluate(c);
}

static void main(String[] args) {
    Sequence<teacher> all = find(<catalog>[| //teacher |]);
    Sequence<teacher> few = find(
        <catalog>[| //number/234319/../../teacher |]);
}


XPath expressions as first-class values (cont’d)

• Operators on XPath values
  – Composition
  – Conjunction
  – Disjunction
• These operators will allow the developer to easily create a rich array of safe XPath values
• The compiler must keep track of the type of each such value
  – Basically an XPath value is a function T -> R, where both T,R are subclasses of XMLObject
  – When two XPath values are composed, the result type is deduced from the types of the operands
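The "XPath value as a function T -> R" view can be approximated with Java generics. The sketch below is speculative, like the slides' own example: TypedPath, then, or, and the Catalog and Course classes are hypothetical names, and plain Java objects stand in for XMLObject subclasses. It shows how the result type of a composition falls out of the operand types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

/** A path from a context of type T to a sequence of results of type R. */
final class TypedPath<T, R> {
    private final Function<T, List<R>> step;

    TypedPath(Function<T, List<R>> step) { this.step = step; }

    List<R> evaluate(T context) { return step.apply(context); }

    /** Composition: T -> R followed by R -> S gives T -> S. */
    <S> TypedPath<T, S> then(TypedPath<R, S> next) {
        return new TypedPath<T, S>(ctx -> {
            List<S> out = new ArrayList<>();
            for (R r : evaluate(ctx)) out.addAll(next.evaluate(r));
            return out;
        });
    }

    /** Disjunction: results of this path followed by results of the other. */
    TypedPath<T, R> or(TypedPath<T, R> other) {
        return new TypedPath<T, R>(ctx -> {
            List<R> all = new ArrayList<>(evaluate(ctx));
            all.addAll(other.evaluate(ctx));
            return all;
        });
    }
}

class TypedPathDemo {
    // Plain stand-ins for the schema-derived types (assumption).
    static final class Course {
        final String name, teacher;
        Course(String name, String teacher) { this.name = name; this.teacher = teacher; }
    }
    static final class Catalog {
        final List<Course> courses;
        Catalog(List<Course> courses) { this.courses = courses; }
    }

    static final TypedPath<Catalog, Course> COURSES = new TypedPath<>(c -> c.courses);
    static final TypedPath<Course, String> TEACHER =
        new TypedPath<>(c -> Collections.singletonList(c.teacher));
    static final TypedPath<Course, String> NAME =
        new TypedPath<>(c -> Collections.singletonList(c.name));

    static final Catalog SAMPLE = new Catalog(Arrays.asList(
        new Course("Programming Languages", "Ron Pinter"),
        new Course("Combinatorics for CS", "Ran El-Yaniv")));

    /** The path value is built first and applied to a context later. */
    static List<String> teachers() {
        TypedPath<Catalog, String> allTeachers = COURSES.then(TEACHER);
        return allTeachers.evaluate(SAMPLE);
    }

    static List<String> namesOrTeachers() {
        return COURSES.then(NAME.or(TEACHER)).evaluate(SAMPLE);
    }

    public static void main(String[] args) {
        System.out.println(teachers());
        System.out.println(namesOrTeachers());
    }
}
```

An ill-typed composition such as TEACHER.then(COURSES) is rejected by the Java compiler, which is the kind of tracking the last bullet asks for. What this sketch cannot do is check a path against a schema, since the paths are ordinary lambdas rather than XPath text.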


Scala: Composition of XML elements

• In Scala, types can be defined in a DTD file
  – A DTD can be translated into Scala classes via the dtd2scala utility
• Scala offers two options for composition of XML elements:
  – Using XML notation (similar to XJ)
  – Using case-class construction notation:

import Data._;       // import generated definitions
import scala.xml._;  // for creating PCDATA nodes

object Main with Application {
  val x = course(
    teacher(Text("Ran El-Yaniv")),
    points(Text("3")),
    name(Text("Combinatorics for CS")),
    number(Text("234141")));
  Console.println(x);
}


Typed, named methods/fields

• Usually, values aggregated by a Java object are accessed by fields/methods
  – Can we access XML sub-elements this way?
  – (Following code IS NOT a legal XJ program)

import technioncatalog.*;

void printTeachers(catalog cat) {
    for(int i = 0; i <; ++i) {
        catalog.course c =[i];
        System.out.println(c.teacher);
    }
}


Typed, named methods/fields (cont’d)

• Some of the difficulties:
  – Sub-elements are not always named
  – Schema supports optional types: <xsd:choice>
    • How can Java express an “optional” field?
• Observation: Java’s typing mechanisms cannot capture the wealth of Schema/DTD types
  – Missing features: virtual fields, inheritance without polymorphism
  – Other features can be found in Functional languages
    • E.g.: Variant types, immutability, structural conformance
    • But, their popularity lags behind



• XJ is a Java extension that has built in support for XML

– Type safety: Many things are checked at compile time

– Ease of use

• OO languages are not powerful enough (in terms of typing)

– Some type information is lost in the transition Schema -> Java


-The End-
