A Tool Kit for Implementing XML Schema Naming and Design Rules
OASIS Symposium: The Meaning of Interoperability
May 9, 2006
Josh Lubell
National Institute of Standards and Technology
Manufacturing Systems Integration Division
XML Exchange Schemas are Bridges
But Bridges Must Be Designed Properly
A Solution: Naming and Design Rules
• Encode XML schema best practices
• Enforce a particular modeling methodology
• Ensure common naming conventions
  • Use of camel case
  • Allowable acronyms
  • …
• But NDRs can be difficult to apply
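The naming conventions listed above (camel case, allowable acronyms) are the kind of rule that can be checked mechanically. The following is a minimal Python sketch, not taken from the talk: the function names, the message wording, and the acronym allow-list are all hypothetical, and a real NDR would define its own list and its own rules for upper versus lower camel case.

```python
import re

# Hypothetical allow-list; a real NDR would publish its own approved acronyms.
ALLOWED_ACRONYMS = {"ID", "URI", "UUID"}

def split_camel(name):
    """Split a camel-case name into word, acronym and digit tokens."""
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]*|[a-z]+|[0-9]+", name)

def check_name(name, upper_first=True):
    """Return a list of problems found in one element, attribute or type name."""
    problems = []
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9]*", name):
        problems.append("name must contain only letters and digits")
    elif name[0].isupper() != upper_first:
        expected = "upper" if upper_first else "lower"
        problems.append(f"first letter must be {expected} case")
    # Any run of two or more capitals is treated as an acronym (an assumption).
    for token in split_camel(name):
        if len(token) > 1 and token.isupper() and token not in ALLOWED_ACRONYMS:
            problems.append(f"acronym {token} is not on the allow-list")
    return problems
```

For example, `check_name("BuyerPartyID")` reports no problems under these assumptions, while `check_name("BuyerXMLCode")` flags `XML` because it is missing from the illustrative allow-list.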
Barriers to NDR Usefulness
• Proliferation
  • How do I decide which NDR set to adopt?
  • Should I develop my own NDR?
• Lack of structure
  • NDR documents usually in proprietary word processor formats
  • Inhibits rule reuse
  • Limited versioning and traceability
• Ambiguity
  • Rules written in English rather than computer-interpretable language
  • NDR enforcement not automatic
Schematron as an NDR Implementation Method
• Advantages
  • XML-native (based on XPath)
  • Rule-based
  • Can test for co-occurrence constraints
  • User-configurable diagnostic messages
  • ISO standard
• Disadvantage
  • Less versatile than a general purpose programming language
Example from Universal Business Language NDR
[ELD1] Each UBL:DocumentSchema MUST identify one and only one global element declaration that defines the document ccts:AggregateBusinessInformationEntity being conveyed in the Schema expression. That global element MUST include an xsd:annotation child element which MUST further contain an xsd:documentation child element that declares “This element MUST be conveyed as the root element in any instance document based on this Schema expression.”
Implementation Observations
Callouts on the text of rule [ELD1]:
• Rule label
• Namespace dependence (marked twice)
• Subrule 1, context 1
• Subrule 2, context 2
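One plausible reading of [ELD1] splits it into two checks, each with its own context: the schema as a whole, and each global element declaration. The sketch below is a Python stand-in for that logic, not the Schematron implementation the talk describes. It assumes every top-level xsd:element counts as a candidate and does not test whether the element defines the document ccts:AggregateBusinessInformationEntity. The function name `check_eld1` and the message wording are illustrative.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
NS = {"xsd": XSD}
REQUIRED_TEXT = ("This element MUST be conveyed as the root element in any "
                 "instance document based on this Schema expression.")

def check_eld1(schema_text):
    """Return diagnostic messages for a simplified reading of rule ELD1."""
    root = ET.fromstring(schema_text)
    messages = []
    # Context 1: the schema root. Subrule 1: exactly one global element.
    global_elements = root.findall("xsd:element", NS)
    if len(global_elements) != 1:
        messages.append("ELD1: expected exactly one global element "
                        f"declaration, found {len(global_elements)}")
    # Context 2: each global element. Subrule 2: required documentation text.
    for element in global_elements:
        docs = element.findall("xsd:annotation/xsd:documentation", NS)
        # Collapse whitespace so line breaks in the schema do not matter.
        texts = [" ".join("".join(d.itertext()).split()) for d in docs]
        if not any(REQUIRED_TEXT in text for text in texts):
            messages.append(f"ELD1: element '{element.get('name')}' lacks "
                            "the required xsd:documentation text")
    return messages
```

Even in this reduced form the rule needs two contexts and a hard-coded namespace, which matches the observation that a seemingly simple rule is not trivial to implement.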
UBL Lessons Learned
• Implementation non-trivial even for a seemingly simple rule
• Some rules require a general purpose programming language for implementation
  • [GNR1] UBL XML element, attribute and type names MUST be in the English language, using the primary English spellings provided in the Oxford English Dictionary.
  • [GNR7] UBL XML element, attribute and type names MUST be in singular form unless the concept itself is plural.
• Some rules cannot be implemented at all
  • [NMS6] UBL published namespaces MUST never be changed.
  • [VER10] UBL Schema and schema module minor version changes MUST not break semantic compatibility with prior versions.
• MUST versus SHOULD versus MAY
  • More on MAY later…
Dept. of Navy (DON) NDR Case Study
• 128 rules
• Based on UBL NDR
• Why choose the DON NDR?
  • Help developers write better schemas for Federal government applications
  • Gain insight into best practices for NDR development (particularly reuse of existing NDRs)
  • Publicly available
  • A Navy standard
DON NDR Testability (using Schematron)
Issue: Use of MAY
• A rule saying that something MAY occur, strictly speaking, will always pass
• But this may not be the rule creator’s intent
• Example: [CTD8] Code and ID ccts:BBIE Property complex types MAY use the xsd:choice element to reference global elements defined in standardized ID Scheme or Code List Schema modules.
• Approaches
  • Consider rule as guidance only (don’t implement)
  • Interpret MAY as discouragement, e.g. “warning: referencing global element using xsd:choice”
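The two approaches to MAY can be expressed as a policy switch in whatever tool runs the rules. The Python sketch below is illustrative only: the mapping of MUST to "error" and SHOULD to "warning" is an assumption rather than something the slides state, and the function name `classify` and the `may_policy` parameter are hypothetical.

```python
import re

def classify(requirement, may_policy="guidance"):
    """Classify a rule by the strongest requirement keyword in its text.

    may_policy selects how a MAY-only rule is treated:
    "guidance" (do not implement it) or "warning" (read MAY as discouragement).
    """
    keywords = set(re.findall(r"\b(MUST|SHOULD|MAY)\b", requirement))
    if "MUST" in keywords:
        return "error"      # assumed mapping, not stated in the source
    if "SHOULD" in keywords:
        return "warning"    # assumed mapping, not stated in the source
    if "MAY" in keywords:
        return may_policy
    return "guidance"       # no keyword at all: nothing to enforce
```

Under these assumptions, the text of [CTD8] classifies as "guidance" by default and as "warning" when `may_policy="warning"` is passed.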
Issue: Requirement for External Resources
• [GNR1] UBL XML element, attribute and type names MUST be in the English language, using the primary English spellings provided in the Oxford English Dictionary.
• Implementation requires access to electronic OED
• And the DON adaptation of this rule has additional requirements:
  • [GNR1] XML element, attribute, and type names MUST be in the English language, using the Oxford English Dictionary for Writers and Editors (Latest Ed.). Where both American and English spellings of the same word are provided, the American spelling MUST be used.
• Electronic OED must be fully up to date
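To show why GNR1 depends on an external resource, here is a small Python sketch of the check with the dictionary replaced by an in-memory word set. Everything in it is a placeholder: `DICTIONARY` and `AMERICAN` are hypothetical stand-ins for an electronic OED and its spelling variants, and the function name and messages are illustrative.

```python
import re

# Hypothetical stand-in for an electronic dictionary lookup.
DICTIONARY = {"buyer", "party", "code", "color", "colour", "catalog", "catalogue"}
# Hypothetical table of British spellings that have an American variant.
AMERICAN = {"colour": "color", "catalogue": "catalog"}

def check_spelling(name):
    """Check each word of a camel-case name against the stand-in dictionary."""
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]*|[a-z]+", name)
    messages = []
    for word in (w.lower() for w in words):
        if word not in DICTIONARY:
            messages.append(f"{name}: '{word}' not found in dictionary")
        elif word in AMERICAN:
            # DON variant of the rule: the American spelling is required.
            messages.append(f"{name}: use American spelling "
                            f"'{AMERICAN[word]}' instead of '{word}'")
    return messages
```

The logic is short, but its results are only as good as the word list behind it, which is the point of the slide: the rule cannot be enforced without an up-to-date electronic dictionary.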
Issue: Rule Proliferation
• Illustrated by UBL rule GNR1 versus DON rule GNR1
• DON rule same as UBL rule, but with added constraints
  • American spelling favored
  • Latest OED edition required
• But no explicit relationship specified in DON NDR!
• Both rules have same ID, even though they are different rules
• Improved traceability and reusability would reduce the confusion
Issue: Ambiguous Terminology
• More rigor needed in NDR definitions
• Example: “xsd:SchemaExpression”
  • Not defined in W3C XML Schema recommendation
  • Used but not defined in DON NDR
  • Defined in UBL NDR to mean “a concept”
Issue: Mixed Content
• Essential for representing semi-structured data
• But allowing it makes the NDR more complicated
• UBL NDR forbids mixed content
• DON NDR allows it, but only if defined by a namespace from a Navy-approved standard (e.g. XHTML)
  • But XHTML element and attribute names violate rule GNR1!
Quality of Design (QoD) Tool
• Contains rules based on naming and design guidelines (NDRs) from a number of sources
• Stores executable test cases written in Schematron and Java Expert System Shell (Jess)
• Executes tests against user-provided schemas and reports results
• Rules grouped into test profiles
Why QoD?
• Addresses proliferation of NDRs
• Overlapping NDR standards
• Supports reusability of rules
• Highlights ambiguous rules
• Provides an explicit structure for rules in NDRs
• Automates rule enforcement
• Enables versioning and traceability of rules
Characteristics of Rules
• Coverage: full, partial, none
• Applicability: indicates type of schema (document, low, or aggregate) the rule applies to
• Rationale: reason for rule from a list of justifications
• Requirement: text from the NDR document
• Implementation File: URI of the file containing the implementation of the rule
Example XML Description of a Rule using QoD Exchange Schema
<testProfile>
  <source id="ubl">
    <organization>OASIS</organization>
    <orgURL>http://www.oasis-open.org</orgURL>
    <title>Universal Business Language (UBL) Naming and Design Rules</title>
    <version>1.0</version>
    <date>2004-11-15</date>
    <docURL>http://docs.oasis-open.org/ubl/cd-UBL-NDR-1.0.1</docURL>
  </source>
  <ruleSet id="ELD">
    <name>Element Declaration Rules</name>
    <rule id="ELD1">
      <coverage>full</coverage>
      <schema>D</schema>
      <rationale>structural clarity</rationale>
      <requirement>Each UBL:DocumentSchema MUST identify one and ... </requirement>
      <implementation file="example.scmt#eld1" type="schematron"/>
    </rule>
    ...
</testProfile>
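A test profile in this form can be read with any XML parser. The Python sketch below loads a trimmed copy of the example and lists its rules. It is only an illustration: the sample omits most of the source metadata, keeps the requirement text truncated as in the slide, and supplies the closing ruleSet tag that the slide elides, so the exact nesting is an assumption. The function name `list_rules` is hypothetical and is not part of the QoD tool.

```python
import xml.etree.ElementTree as ET

# Trimmed sample based on the slide; the closing </ruleSet> is assumed.
PROFILE = """<testProfile>
  <source id="ubl">
    <organization>OASIS</organization>
    <version>1.0</version>
  </source>
  <ruleSet id="ELD">
    <name>Element Declaration Rules</name>
    <rule id="ELD1">
      <coverage>full</coverage>
      <schema>D</schema>
      <rationale>structural clarity</rationale>
      <requirement>Each UBL:DocumentSchema MUST identify one and ... </requirement>
      <implementation file="example.scmt#eld1" type="schematron"/>
    </rule>
  </ruleSet>
</testProfile>"""

def list_rules(profile_text):
    """Return one dict per rule with its rule set, coverage and implementation."""
    root = ET.fromstring(profile_text)
    rules = []
    for rule_set in root.iter("ruleSet"):
        for rule in rule_set.findall("rule"):
            impl = rule.find("implementation")
            rules.append({
                "id": rule.get("id"),
                "ruleSet": rule_set.get("id"),
                "coverage": rule.findtext("coverage"),
                "schema": rule.findtext("schema"),
                "file": impl.get("file") if impl is not None else None,
                "type": impl.get("type") if impl is not None else None,
            })
    return rules
```

A listing like this makes it straightforward to filter rules by coverage or by implementation type before running them.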
QoD Test Profile Exchange
Application to Developing XML Schemas
• Currently a limited set of rules are implemented
• Recently implemented subset of DON NDR in Schematron
• Tested with a small but varied set of sample schemas
  • Navy – IETM Schema Q70:IETM (Interactive Electronic Technical Manual)
  • Grants.gov
  • AEX (building and construction industry)
  • US Dept. of Defense
• Provided meaningful results to schema developers
Examples of types of warnings found in developing XML Schemas
• Global elements declared in non-desirable places
• Anonymous/local types defined in non-desirable places
• “Global” schemas that do not declare a default namespace
• Document/Transaction level schemas that define multiple global elements
• Re-declaration of elements and types (e.g. programType) in different namespaces
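Two of the warning types above lend themselves to simple mechanical checks. The Python sketch below is an illustration, not the QoD implementation. The slides do not say which places count as "non-desirable", so it flags every element that defines an anonymous type, and it treats a default namespace declared anywhere in the document as sufficient. The function name `schema_warnings` and the message wording are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

def schema_warnings(schema_text):
    """Return warnings about default namespaces and anonymous types."""
    warnings = []
    data = schema_text.encode("utf-8")
    # "start-ns" events yield (prefix, uri); the default namespace has prefix "".
    events = ET.iterparse(io.BytesIO(data), events=("start-ns",))
    prefixes = [prefix for _event, (prefix, _uri) in events]
    if "" not in prefixes:
        warnings.append("warning: no default namespace declared")
    root = ET.fromstring(data)
    for element in root.iter(f"{{{XSD}}}element"):
        for kind in ("complexType", "simpleType"):
            # A type defined as a direct child of an element has no name.
            if element.find(f"{{{XSD}}}{kind}") is not None:
                warnings.append(f"warning: element '{element.get('name')}' "
                                f"defines an anonymous {kind}")
    return warnings
```

Reporting these as warnings rather than errors mirrors the slide, which describes them as warnings found while developing schemas.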
Lessons Learned in Coding NDRs
• NDR documents need to be regarded as rigorous technical documentation
  • More review needed
  • Better authoring tools needed
• Rules that cannot be implemented are non-enforceable
• Definition of NDRs is non-trivial
  • Many rules cannot be tested
  • Many rules are more difficult to implement than thought
  • Difficult to reuse rules due to namespace definitions
  • Often rules are ambiguous or unclear
• Implementation of rules is non-trivial
  • Testing of rules is complex
  • All boundary conditions need to be thought of and covered
• Legacy data and 3rd party schemas need to be addressed in NDRs
What’s Next
• Continue to expand our NDR rule-base
• Continue to enhance software based on user requirements
• Produce a tool kit for NDR developers
  • Enhance QoD schema to represent entire NDR document
  • Provide authoring templates
• Identify collaborators for future work
  • If interested, contact me!
Summary
• A process for XML schema development is necessary
• Tools can automate the process, thereby reducing labor and deployment time
• Definition and implementation of NDRs is non-trivial but necessary to support reuse of schemas
• Enforcing NDRs will ultimately make XML schemas more interoperable