IBM Research
© 2005 IBM Corporation
Adventures in Formal Methods at W3C:The XQuery 1.0 and XPath 2.0 Formal Semantics
Jerome Simeon, IBM
2 Scalable XML Infrastructure | © 2005 IBM Corporation
A Formal Semantics is Controversial Famous computer scientist #1:
<< The XQuery Formal Semantics is unreadable, and you should just use english. >>
Famous computer scientist #2:
<< I couldn’t understand how XQuery works until I read its formal semantics >>
3 Scalable XML Infrastructure | © 2005 IBM Corporation
A Formal Semantics is Useful XQuery 1.0 requirements: A language and an algebra
– Relational algebra was a success: simple core design, basis for optimization, countless research advances over 30 years of history
Support for Static Analysis over XQuery 1.0:– Static Typing Feature calculate type of expressions at compile time
– Soundness theorem: every XML data processed by an expression will have the type infered (or a subtype)
Deeper understanding of XQuery 1.0 semantics (translation effect):– Numerous bugs in english specification found (notably related to types)
– Exposes unnecessary complexity and opportunities for consolidation
Build the language on solid foundations to facilitate future evolutions
4 Scalable XML Infrastructure | © 2005 IBM Corporation
XQuery 1.0 and XPath 2.0 Formal Semantics
Formalization using inference rules notations
Notations developed for specifying programming languages
5 Scalable XML Infrastructure | © 2005 IBM Corporation
XQuery Processing Model
Formalized & normative:– Normalization of expressions into
a smaller “core” grammar (SQ5)
– Static typing (SQ6)
Formalized & non-normative:– Dynamic evalution (DQ1)
Formalized & informative:– Schema import (SI1)
6 Scalable XML Infrastructure | © 2005 IBM Corporation
English vs. Formal Specification
7 Scalable XML Infrastructure | © 2005 IBM Corporation
Formal Specification: One Example
Read:– To statically type a variable reference (e.g., $foo:x)
– Resolve the foo:x name to it expanded name (e.g., {http://example.com}:x)
– Lookup the type (Type) bound for that expanded name in the static environment (statEnv)
– if the rule cannot be applied (e.g., variable not in the environment), the inference fails and an error is raised.
8 Scalable XML Infrastructure | © 2005 IBM Corporation
Formal Specification as an Implementation Guide
9 Scalable XML Infrastructure | © 2005 IBM Corporation
Lessons Learned A formal specification is a large investment:
– 500+ pages for XQuery 1.0 and XPath 2.0
– Benefits must be clearly understood and resources are needed
Important to know what not to formalize:– In XQuery: Namespace resolution, validate expression (see XML Schema)…
Audience is not everyone:– Reader must invest in understanding the notations
– Useful for the working group, for developers, for academics…
– Not so useful for most end-users
Benefits are numerous, during the specification, and after:– E.g., fuels growing research activities around XQuery