No More Pain for XML’s Gain
XJ: Facilitating XML Processing in Java
Matthew Harren, Mukund Raghavachari, Oded Shmueli, Michael Burke, Rajesh Bordawekar, Igor Pechtchanski, Vivek Sarkar
Itay Maman, 236826 Seminar lecture, 15 June 2005
The basic premise
• XML is getting increasingly popular
• XML manipulation is now a common programming task
• The lead question:
  – Do modern OO languages sufficiently support XML?
Introduction: Schema file (file: technioncatalog.xsd)

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="course" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="points" type="xs:int"/>
              <xs:element name="number" type="xs:int"/>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="teacher" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Introduction: XML document (file: short.xml)

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <course>
    <points>3</points>
    <number>234319</number>
    <name>Programming Languages</name>
    <teacher>Ron Pinter</teacher>
  </course>
  <course>
    <points>3</points>
    <number>234141</number>
    <name>Combinatorics for CS</name>
    <teacher>Ran El-Yaniv</teacher>
  </course>
</catalog>

Desired output:
“Combinatorics for CS (234141) by Ran El-Yaniv, 3 credit points”
Introduction: The XJ program

import java.io.*;
import technioncatalog.*;

public class Demo1 {
  public static void main(String[] args) throws Throwable {
    // Load the document as a typed catalog value
    catalog cat = new catalog(new File("short.xml"));
    // Select the second <course> element with an embedded XPath query
    catalog.course c = cat [| /course[2] |];
    printCourse(c);
  }

  private static void printCourse(catalog.course c) {
    String name = c [| /name |];
    String teacher = c [| /teacher |];
    int points = c [| /points |];
    int id = c [| /number |];
    System.out.println(name + " (" + id + ") by " + teacher + ", " + points + " credit points");
  }
}
Traditional XML processing (DOM, XPath APIs)

import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;

public static void main(String[] args) throws Throwable {
  DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
  DocumentBuilder db = dbf.newDocumentBuilder();
  Document doc = db.parse(new java.io.File("short.xml"));
  XPath xp = XPathFactory.newInstance().newXPath();
  // Cast to the public NodeList interface rather than an implementation class
  NodeList nodes = (NodeList) xp.evaluate("//course", doc, XPathConstants.NODESET);
  printCourse(nodes.item(1));
}

• The XPath is a plain string. It may be:
  – Syntactically incorrect
  – Incompatible with the document
• The types of the XML objects (Node, Document) do not reflect the schema
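Both failure modes of a string-typed XPath can be reproduced with the standard javax.xml.xpath API. The following is only a sketch: the class name StringXPathPitfalls, its helper methods, and the inline sample document are illustrative assumptions and not part of the original slides.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class (not from the slides).
final class StringXPathPitfalls {
    static final String XML =
        "<catalog><course><points>3</points><number>234141</number>"
        + "<name>Combinatorics for CS</name><teacher>Ran El-Yaniv</teacher></course></catalog>";

    /** True if the XPath string is rejected; this is discovered only at run time. */
    static boolean isMalformed(String expr) {
        try {
            XPathFactory.newInstance().newXPath().compile(expr);
            return false;
        } catch (XPathExpressionException e) {
            return true;
        }
    }

    /** Number of nodes selected by expr; a misspelled element name silently selects nothing. */
    static int count(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xp.evaluate(expr, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("malformed \"//course[\": " + isMalformed("//course["));
        System.out.println("matches for //course: " + count("//course"));
        System.out.println("matches for //cuorse: " + count("//cuorse"));
    }
}
```

Neither mistake is visible to the Java compiler: the unbalanced bracket surfaces as an exception, and the misspelled name as an empty result.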
Traditional XML processing (DOM APIs)

private static void printCourse(Node n) {
  // Indices 1, 3, 5, 7 skip the whitespace text nodes between the elements
  NodeList nodes = n.getChildNodes();
  System.out.println(nodes.item(5).getTextContent()
      + " (" + nodes.item(3).getTextContent()
      + ") by " + nodes.item(7).getTextContent()
      + ", " + nodes.item(1).getTextContent() + " credit points");
}

Assumption: Four child nodes must exist
Assumption: 3rd child is the course number
Assumption: 2nd child has no child elements
• These assumptions will not hold if the schema is changed
  – => run-time errors
  – Problems remain, even if we identify nodes by name
• Possible Schema changes:
  – Allowing a new optional <students> sub-element
  – Changing the order of the sub-elements
What about reading the numeric value of an element?
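A by-name variant of printCourse, written against the standard DOM API, illustrates why the problems remain. This is a sketch under assumptions: the class ByNameLookup and its helpers childText and childInt are hypothetical names introduced here for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class (not from the slides).
final class ByNameLookup {
    static Element parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Text of the first descendant element called name; the name is an unchecked string. */
    static String childText(Element parent, String name) {
        NodeList matches = parent.getElementsByTagName(name);
        if (matches.getLength() == 0) {
            throw new IllegalArgumentException("no <" + name + "> under <" + parent.getTagName() + ">");
        }
        return matches.item(0).getTextContent().trim();
    }

    /** The numeric value is parsed by hand; the schema's xs:int is invisible to the compiler. */
    static int childInt(Element parent, String name) {
        return Integer.parseInt(childText(parent, name));
    }

    static String describe(Element course) {
        return childText(course, "name") + " (" + childInt(course, "number") + ") by "
                + childText(course, "teacher") + ", " + childInt(course, "points") + " credit points";
    }

    public static void main(String[] args) {
        Element course = parseRoot("<course> <points>3</points> <number>234141</number>"
                + " <name>Combinatorics for CS</name> <teacher>Ran El-Yaniv</teacher> </course>");
        System.out.println(describe(course));
    }
}
```

Reordering the sub-elements no longer breaks the code, but a misspelled name such as "techer" still compiles and fails only at run time, and every numeric read needs an explicit parse.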
No easy solution
• Similar problems occur when:
  1. XML elements are created by the program
  2. Other libraries are used for reading/writing XML documents
     – Such as: Xalan, SAX
  3. The developer wraps several complex operations within a single function/method/class
• These are inherent problems of the language
Shaping the future
• What XML-related facilities do we want?
  – Typed XML objects
  – Seamless translation of a Schema/DTD into a Java type
  – Two composition techniques
    • XML notation
    • Java’s object creation syntax
  – Two decomposition techniques
    • Typed XPath
    • Typed, named methods/fields
  – XPath expressions as first-class values
Has the future arrived yet?
• Significant effort has gone into integrating XML into modern programming languages
  – XJ
  – Scala
  – Cω
  – XTatic
  – …
• We will overview the constructs offered by XJ
  – A superset of Java
  – Available at: http://www.research.ibm.com/xj
XJ’s Type system
• Hierarchy of classes
  – A common root class: XMLObject
  – Automatic import: package com.ibm.xj.*
• Genericity: Sequence<T>, XMLCursor<T>
  – XMLCursor<T> is a Sequence<T> iterator
Integration with Schema
• The rationale:
  1. An OO program is a collection of class definitions
  2. A Schema file is a collection of type definitions
• => let’s integrate these definitions
• Every Schema type is also an XJ type
  – The XJ compiler generates a “logical class” for each such type
  – Schema file == package name
  – Using a schema == import schema_file_name;
XML literal in XJ code

import technioncatalog.*;

public class Demo2 {
  public static void main(String[] args) throws Throwable {
    String x = "Algorithms 1";
    int y = 234247;
    catalog cat = buildCatalog(new catalog.course(
        <course>
          <points>3</points>
          <number>{y}</number>
          <name>{x}</name>
          <teacher>Shlomo Moran</teacher>
        </course>));
  }

  private static catalog buildCatalog(catalog.course c) {
    return new catalog(<catalog>{c}</catalog>);
  }
}

• Invalid XML content triggers a compile-time error
• Resulting elements are typed!
• Curly braces allow “escaping” back into XJ
An ill-typed program

...
// Wrong <course> element: the sub-elements required by the schema are missing
course c = new course(<course> <teacher>Shlomo Moran</teacher></course>);
buildCatalog(c);

XMLObject x = new course.teacher(<teacher>Shlomo Moran</teacher>);
// An XMLObject cannot be passed as a course element
buildCatalog(x);
...

private static catalog buildCatalog(catalog.course c) {
  return new catalog(<catalog>{c}</catalog>);
}
Embedding XPath Queries in XJ
• Syntax: XmlValue [| XPathQuery |]
• Requires: a context provider
  – An XML element over which the XPath query is invoked
  – (see the cat variable in the sample)
• Escaping: use a ‘$’ prefix

course doSomething(catalog cat, int courseNum) {
  return cat [| /course[./number = $courseNum] |];
}
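For comparison, standard Java binds an XPath variable such as $courseNum through javax.xml.xpath.XPathVariableResolver. The sketch below is an illustration only: the class XPathVariableDemo, the method courseName, and the inline document are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class (not from the slides).
final class XPathVariableDemo {
    static final String XML =
        "<catalog>"
        + "<course><points>3</points><number>234319</number>"
        + "<name>Programming Languages</name><teacher>Ron Pinter</teacher></course>"
        + "<course><points>3</points><number>234141</number>"
        + "<name>Combinatorics for CS</name><teacher>Ran El-Yaniv</teacher></course>"
        + "</catalog>";

    /** Name of the course with the given number, or "" if no course matches. */
    static String courseName(int courseNum) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xp = XPathFactory.newInstance().newXPath();
            // The variable is bound by name at run time; nothing ties it to the Java
            // parameter, and its type is not checked against the query.
            xp.setXPathVariableResolver(name ->
                    "courseNum".equals(name.getLocalPart()) ? Double.valueOf(courseNum) : null);
            return xp.evaluate("/catalog/course[number = $courseNum]/name", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(courseName(234141));
    }
}
```

In XJ the `$` prefix refers directly to the Java variable in scope, so the resolver plumbing and the untyped string result disappear.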
XPath Semantics
• Problem: the resulting type is sometimes not so clear
• Two options
  – Sequence<T>
    • If the compiler determines that all result elements are of type T
  – Sequence<XMLObject>
    • (Otherwise)
• Automatic conversion from a singleton sequence
• Static check of XPath queries
  – If the result is always empty => compile-time error
  – (The compiler cannot catch all cases)
Implicit coercions
• An atomic XML value can be seamlessly converted into a corresponding Java value
  – xsd:double => double
  – xsd:boolean => boolean
  – xsd:string => java.lang.String
  – …
• This reduces the verbosity of XML-related code:

import technioncatalog.*;
import technioncatalog.catalog.*;

public static String getTeacher(course c) {
  return c [| /teacher |];
}

Sequence<teacher> ► teacher ► String
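To see what the implicit coercions save, here is the same kind of read in standard Java, where the caller must name the result type and cast. It is a sketch only: the class ExplicitCoercions and its methods are hypothetical, and the inline <course> document is an assumption for the example.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class (not from the slides).
final class ExplicitCoercions {
    static final String COURSE =
        "<course><points>3</points><number>234141</number>"
        + "<name>Combinatorics for CS</name><teacher>Ran El-Yaniv</teacher></course>";

    /** Evaluates expr and returns an untyped Object; the caller must know what to cast to. */
    static Object eval(String expr, QName resultType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(COURSE)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc, resultType);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static int points() {
        // XPath numbers are doubles, so an xs:int needs a second, manual conversion.
        return ((Double) eval("/course/points", XPathConstants.NUMBER)).intValue();
    }

    static String teacher() {
        return (String) eval("/course/teacher", XPathConstants.STRING);
    }

    static boolean hasTeacher() {
        return (Boolean) eval("boolean(/course/teacher)", XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) {
        System.out.println(teacher() + ", " + points() + " credit points");
    }
}
```

Each cast here is unchecked by the compiler, whereas the XJ chain Sequence<teacher> ► teacher ► String is derived from the schema.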
Updates: Assignment to Query Result
• An XPath expression returns a reference to an existing element
  – (No copying is involved)
  – Consistent with Java’s semantics for objects
• Thus, it can be assigned to
  – An XPath expression is a legal lvalue

public static void changePoint(catalog.course c, int p) {
  c [| /points |] = p;
}

• Bulk assignment
  – Occurs when the XPath expression denotes a sequence
  – Bulk assignment operator := allows multiple assignments
  – Double the credit points of each course:

cat [| //points |] *:= 2;
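The bulk assignment above corresponds to an explicit loop in standard Java. A minimal sketch, assuming a hypothetical class BulkUpdate and an inline document supplied by the caller:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class (not from the slides).
final class BulkUpdate {
    /** Doubles every <points> value in the document and returns the updated values. */
    static List<Integer> doubledPoints(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList found = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//points", doc, XPathConstants.NODESET);

            // Copy the selected nodes first, then update them one by one.
            List<Node> points = new ArrayList<>();
            for (int i = 0; i < found.getLength(); i++) {
                points.add(found.item(i));
            }
            for (Node p : points) {
                int value = Integer.parseInt(p.getTextContent().trim());
                p.setTextContent(Integer.toString(value * 2));
            }

            // The nodes are references into the document, so the tree itself has changed.
            List<Integer> result = new ArrayList<>();
            for (Node p : points) {
                result.add(Integer.valueOf(p.getTextContent().trim()));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(doubledPoints(
            "<catalog><course><points>3</points></course>"
            + "<course><points>4</points></course></catalog>"));
    }
}
```

As in XJ, the query result refers to existing nodes rather than copies, but the parse, multiply, and write-back steps are all manual and untyped.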
Tree structure update
• Class XMLObject also defines methods, such as:
  – insertAfter()
  – insertBefore()
  – insertAsFirst()
  – detach()

public static void addCourse(catalog cat) {
  course c = new course(
      <course>
        <points>4</points>
        <number>234111</number>
        <name>Introduction to CS</name>
        <teacher>Roy Friedman</teacher>
      </course>);
  cat.insertAsLast(c);
}

Which object is being modified?
Problems: Type Consistency
• Definitions
  1. An XML update operation, u, is a mapping over XML values
     • u: T1 -> T2
  2. An update is consistent if T1 = T2
• Ideally, a compile-time error should be triggered for each inconsistent update in the program
• Unfortunately, this cannot be promised
• The solution: Additional run-time check
Can you think of an example?
Why do we want the two types to be equal?
Problems: Covariant subtyping (1/2)
• Covariance: change of type in signature is in the same direction as that of the inheritance

class X { }
class A { public void m(X x) { } }

class X1 extends X { }
class A1 extends A { public void m(X1 x) { } }
...
A a = new A1();
a.m(new X());

A1.m() is “spoiled”: it requires only X1 objects
Which method should be invoked: A.m() or A1.m()?
• Java favors type-safety: a method with covariant arguments is considered to be an overload rather than an override
  – Same approach is taken by C++, C#
• But, covariance is allowed for arrays
  – Array assignments may fail at run-time
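Both points can be checked with a small standalone Java program. The wrapper class CovarianceDemo and its methods are hypothetical names added for illustration; X, X1, A and A1 follow the slide, with m() returning a string so the chosen method is observable.

```java
// Hypothetical demo class (not from the slides).
final class CovarianceDemo {
    static class X { }
    static class X1 extends X { }

    static class A {
        String m(X x) { return "A.m"; }
    }

    static class A1 extends A {
        // Narrower parameter type: this overloads A.m(X), it does not override it.
        String m(X1 x) { return "A1.m"; }
    }

    /** The call from the slide: static type A, dynamic type A1, argument of type X. */
    static String callThroughA() {
        A a = new A1();
        return a.m(new X());
    }

    /** With static type A1 and an X1 argument, the overload in A1 is selected. */
    static String callThroughA1() {
        A1 a1 = new A1();
        return a1.m(new X1());
    }

    /** Arrays are covariant, so this store compiles but is rejected at run time. */
    static boolean arrayStoreFails() {
        Object[] objects = new String[1];
        try {
            objects[0] = Integer.valueOf(42);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(callThroughA());
        System.out.println(callThroughA1());
        System.out.println(arrayStoreFails());
    }
}
```

The call through A resolves to A.m(), so A1.m() never receives a plain X; the array case shows the alternative design, where covariance is accepted and the check moves to run time.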
Problems: Covariant subtyping (2/2)
(Now let us get back to our technioncatalog schema…)
• A <course> value is also spoiled
  – It requires unique children: <points>, <name>, etc.
• But, it also has an unspoiled super-class: XMLObject
  – All updates to XMLObject are legal at compile-time
• The following code compiles successfully:

public static void trick(course c) {
  XMLObject x = c;
  points p = new points(<points>4</points>);
  x.appendAsLast(p);  // Run-time error is here!
}
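A comparable run-time check can be imitated in standard Java by re-validating the tree against the schema after an update, using javax.xml.validation. This is a sketch under assumptions, not XJ's actual mechanism: the class RuntimeSchemaCheck, its methods, and the reduced single-course schema are introduced here for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class (not from the slides).
final class RuntimeSchemaCheck {
    // Reduced schema (assumption): a single <course> as the root element.
    static final String COURSE_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='course'><xs:complexType><xs:sequence>"
        + "<xs:element name='points' type='xs:int'/>"
        + "<xs:element name='number' type='xs:int'/>"
        + "<xs:element name='name' type='xs:string'/>"
        + "<xs:element name='teacher' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    static final String COURSE_XML =
        "<course><points>3</points><number>234141</number>"
        + "<name>Combinatorics for CS</name><teacher>Ran El-Yaniv</teacher></course>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // schema validation of a DOM tree needs namespace-aware nodes
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static Schema schema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(COURSE_XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    /** The run-time check: true if the current tree still conforms to the schema. */
    static boolean isValid(Document doc) {
        try {
            schema().newValidator().validate(new DOMSource(doc));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Mirrors trick(): appends an extra <points> child, which the compiler cannot reject. */
    static void appendPoints(Document doc, int value) {
        Element extra = doc.createElementNS(null, "points");
        extra.setTextContent(Integer.toString(value));
        doc.getDocumentElement().appendChild(extra);
    }

    public static void main(String[] args) {
        Document doc = parse(COURSE_XML);
        System.out.println("before update: " + isValid(doc));
        appendPoints(doc, 4);
        System.out.println("after update: " + isValid(doc));
    }
}
```

The update itself is accepted by the Java compiler, just as trick() is accepted by the XJ compiler; only the validation step notices that the course now has a second <points> child.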
Shaping the future (revisited)
• Language constructs seen so far
  – Typed XML objects
  – Seamless translation of a Schema/DTD into a Java type
  – Two composition techniques
    • XML notation
    • Java’s object creation syntax
  – Two decomposition techniques
    • Typed XPath
    • Typed, named methods/fields
  – XPath expressions as first-class values
XPath expressions as first-class values
• What is a first-class value?
  – A value that can be used “naturally” in the program
    • Passed as an argument
    • Stored in a variable/field
    • Returned from a method
    • Created
• In XJ, XPath expressions do not meet these conditions
  – The main obstacle: The XPath part of the expression cannot be separated from its context provider
XPath expressions as first-class values (cont’d)
• Let’s speculate on XPath as an FCV…
• (Following code IS NOT a legal XJ program)

private static Sequence<teacher> teachers;

static Sequence<teacher> find(XPath<catalog,teacher> q) {
  catalog c = new catalog(new File("file1.xml"));
  return q.evaluate(c);
}

static void main(String[] args) {
  Sequence<teacher> all = find(<catalog>[| //teacher |]);
  Sequence<teacher> few = find(
      <catalog>[| //course[number = 234319]/teacher |]);
}
XPath expressions as first-class values (cont’d)
• Operators on XPath values
  – Composition
  – Conjunction
  – Disjunction
• These operators will allow the developer to easily create a rich array of safe XPath values
• The compiler must keep track of the type of each such value
  – Basically an XPath value is a function T -> R, where both T, R are subclasses of XMLObject
  – When two XPath values are composed, the result type is deduced from the types of the operands
import Data._; // import generated definitionsimport scala.xml._; // for creating PCDATA nodes
object Main with Application { val x = course(teacher(Text("Ran El-Yaniv")), points(Text("3")), name(Text("Combinatorics for CS")), number(Text("234141"))); Console.println(x); }
Scala: Composition of XML elements
• In Scala, types can be defined in a DTD file– A DTD can be translated into Scala classes via the
dtd2scala utility
• Scala offers two options for composition of XML elements:
– Using XML notation (similar to XJ)– Using case-class construction notation:
Typed, named methods/fields
• Usually, values aggregated by a Java object are accessed by fields/methods
  – Can we access XML sub-elements this way?
  – (Following code IS NOT a legal XJ program)

import technioncatalog.*;

void printTeachers(catalog cat) {
  for (int i = 0; i < cat.courses.length; ++i) {
    catalog.course c = cat.courses[i];
    System.out.println(c.teacher);
  }
}
Typed, named methods/fields (cont’d)
• Some of the difficulties:
  – Sub-elements are not always named
  – Schema supports optional types: <xsd:choice>
    • How can Java express an “optional” field?
• Observation: Java’s typing mechanisms cannot capture the wealth of Schema/DTD types
  – Missing features: virtual fields, inheritance without polymorphism
  – Other features can be found in functional languages
    • E.g.: Variant types, immutability, structural conformance
    • But, their popularity lags behind
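One partial answer available in later Java versions is java.util.Optional (added in Java 8, after this talk). The sketch below hand-writes the kind of classes a schema translation might produce; it is an assumption for illustration, not something XJ generates. The optional students value follows the hypothetical <students> sub-element mentioned earlier, and records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical demo class (not from the slides).
final class TypedAccessors {
    /** A course with typed, named components; students models an optional sub-element. */
    record Course(int number, String name, String teacher, Optional<Integer> students) { }

    record Catalog(List<Course> courses) { }

    /** The printTeachers() idea: sub-elements reached through named, typed accessors. */
    static List<String> teachers(Catalog cat) {
        List<String> out = new ArrayList<>();
        for (Course c : cat.courses()) {
            out.add(c.teacher());
        }
        return out;
    }

    static Catalog sample() {
        return new Catalog(List.of(
            new Course(234319, "Programming Languages", "Ron Pinter", Optional.empty()),
            new Course(234141, "Combinatorics for CS", "Ran El-Yaniv", Optional.of(120))));
    }

    public static void main(String[] args) {
        for (Course c : sample().courses()) {
            String size = c.students().map(n -> n + " students").orElse("size unknown");
            System.out.println(c.teacher() + ": " + size);
        }
    }
}
```

This covers an element that may be absent, but not the other difficulties listed above: unnamed sub-elements and a choice between alternatives still have no direct counterpart in plain fields.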
Summary
• XJ is a Java extension that has built-in support for XML
  – Type safety: Many things are checked at compile time
  – Ease of use
• OO languages are not powerful enough (in terms of typing)
  – Some type information is lost in the transition Schema -> Java
-The End-