© 2006 open grid forum data format description language (dfdl) progress update mike beckerle, oco,...
TRANSCRIPT
© 2006 Open Grid Forum
Data Format Description Language (DFDL)
Progress Update
Mike Beckerle, Oco, Inc.
© 2006 Open Grid Forum 2
Admin
• DFDL WG co-chairs:• Mike Beckerle, Oco• Martin Westhead, Avaya
• Two note takers please
• Sign the attendance sheet
• Note: OGF Intellectual Property Rules apply
© 2006 Open Grid Forum 3
OGF IPR Policies Apply• “I acknowledge that participation in this meeting is subject to the OGF Intellectual Property
Policy.”• Intellectual Property Notices Note Well: All statements related to the activities of the OGF
and addressed to the OGF are subject to all provisions of Appendix B of GFD-C.1, which grants to the OGF and its participants certain licenses and rights in such statements. Such statements include verbal statements in OGF meetings, as well as written and electronic communications made at any time or place, which are addressed to:
• the OGF plenary session, • any OGF working group or portion thereof, • the OGF Board of Directors, the GFSG, or any member thereof on behalf of the OGF, • the ADCOM, or any member thereof on behalf of the ADCOM, • any OGF mailing list, including any group list, or any other list functioning under OGF auspices, • the OGF Editor or the document authoring and review process
• Statements made outside of a OGF meeting, mailing list or other function, that are clearly not intended to be input to an OGF activity, group or function, are not subject to these provisions.
• Excerpt from Appendix B of GFD-C.1: ”Where the OGF knows of rights, or claimed rights, the OGF secretariat shall attempt to obtain from the claimant of such rights, a written assurance that upon approval by the GFSG of the relevant OGF document(s), any party will be able to obtain the right to implement, use and distribute the technology or works when implementing, using or distributing technology based upon the specific specification(s) under openly specified, reasonable, non-discriminatory terms. The working group or research group proposing the use of the technology with respect to which the proprietary rights are claimed may assist the OGF secretariat in this effort. The results of this procedure shall not affect advancement of document, except that the GFSG may defer approval where a delay may facilitate the obtaining of such assurances. The results will, however, be recorded by the OGF Secretariat, and made available. The GFSG may also direct that a summary of the results be included in any GFD published containing the specification.”
• OGF Intellectual Property Policies are adapted from the IETF Intellectual Property Policies that support the Internet Standards Process.
© 2006 Open Grid Forum 4
Agenda
• What is DFDL?• Provide enough context for those
interested in getting involved but who haven't been following along
• Progress review• Latest on the draft specification
• Next steps• Participation needed to complete the
specification, initial implementations
© 2006 Open Grid Forum 5
Agenda
• What is DFDL?• Provide enough context for those
interested in getting involved but who haven't been following along
• Progress review• Latest on the draft specification, current
implementations of DFDL processors
• Next steps• Participation needed to complete the
specification, initial implementations
© 2006 Open Grid Forum 6
Data Interchange Formats
There are two kinds:
• Prescriptive standards: “Put your data in this format!”• Textual - XML• Binary – ASN.1, XDR, EBML, …• You use the defined encodings, syntax, …• Descriptions are shareable
• Others: “Choose your own format”• Textual, binary, mixture • You use your own encodings, syntax, …• No common description DFDL: a universal description for any format
© 2006 Open Grid Forum 7
What is DFDL?
1. A way of describing data…• It is NOT a data format itself!
2. That can describe any data format…• Textual and binary • Commercial record-oriented • Scientific and numeric• Modern and legacy
3. While allowing high performance…• Choose the right data format for the job
• High density, optimized I/O, random access
• No need to use DFDL libraries
© 2006 Open Grid Forum 8
Benefits of DFDL
1. Data format independent2. Prescriptive standards not always best for job3. Descriptions can be shared
4. Interoperability 5. Allows after-the-event description6. Supports move to SOA 7. Spans commercial and scientific worlds8. Supports grid computing
© 2006 Open Grid Forum 9
Why Grids and DFDL?
• Grids are about big-data and big-computation problems• Simplistic solutions like “use XML” won’t cut it!• Performance and space usage
• Grids are about universal data interchange• XML – use XML Schema• Relational – use RDBMS schema• Other – use DFDL
© 2006 Open Grid Forum 10
DFDL Goals
• Build on existing standards• Leverage XML technology and concepts
• Support very efficient parsers/formatters• Support round-tripping
• Read and write data in described format from same description
• Keep simple cases simple• Simple descriptions should be "human readable"
• To the same degree as XML Schema • Generality
• Can describe any format • Allow extensions for new data formats
© 2006 Open Grid Forum 11
XML Synergy
• Use XML Schema subset & type system to describe the logical format of the data
• Use annotations within the XSD to describe the physical representation of the data
• Use XPath when referencing fields within the data
• This approach actively used today:• IBM WebSphere Message Broker• Microsoft BizTalk flat file• Others
© 2006 Open Grid Forum 12
DFDL FeaturesRelease 1:• Subset of XML Schema• Rich textual & binary data capabilities• Scoping rules that govern how the annotations apply• Validated input and output - from XML Schema• Defaults - for missing values• ‘Null’ capability - for out-of-band values• Reference – use of a previously read value in subsequent expressions• Basic math – in expressions• Uncertainty – stratagems to resolve choices, optionality• Arrays – one dimension• Very general parsing/writing capability
Future:• Multi-layer – description of an intermediate rep not exposed in final
result• Extensibility – new type/transform specification• Arrays - multi-dimension
© 2006 Open Grid Forum 13
DFDL Schema Model
• Simple Types• Represent data values
• Complex Types• Represent structures
• Elements• Represent named fields• Can be data fields or structural fields• Can repeat (arrays)
• Sequence groups• Structures where the children occur in order
• Choice groups• Structures where just one of the children occurs (eg: C union, COBOL redefine)
• Wildcards• Points where any element can occur
Missing – multi-dimensional arrays. Deferred to a future version.
© 2006 Open Grid Forum 14
DFDL Schema Model
type
dimension
element
simpleType
simple type hierarchy on later slide
sequence choice
group
*
wildcard
*
*complexType
© 2006 Open Grid Forum 15
XML Schema Subset
• Namespace management• xs:import/xs:include file management• Local and global xs:element declarations
• Optional dimensionality via maxOccurs and minOccurs• Optional default value• Optional nillable attribute (null value support)
• Local and global xs:complexType definitions• Most built-in xs:simpleTypes – see later slide• Local and global user-defined xs:simpleTypes by derivation• Local and global xs:sequence groups (no dimensionality)• Local and global xs:choice groups (no dimensionality)• xs:any element wildcards• xs:appinfo annotations to carry the DFDL information
© 2006 Open Grid Forum 16
XML Schema – Not Required
• Local and global xs:attribute declarations• xs:attribute groups• xs:complexType derivations• xs:union and xs:list simple types• Built-in xs:simpleTypes specific to XML or otherwise superfluous• Dimensionality on non-elements (that is, model groups)• Identity Constraints • Substitution Groups • xs:redefine file management• Local and global xs:all groups
© 2006 Open Grid Forum 17
Supported Simple Types
anySimpleType
string QName NOTATION float double decimal boolean base64Binary hexBinary anyURI
normalizedString
token
language Name NMTOKEN
NMTOKENSNCName
ID IDREF ENTITY
IDREFS ENTITIES
integer
long nonPositiveInteger nonNegativeInteger
negativeInteger positiveInteger unsignedLong
unsignedInt
unsignedShort
unsignedByte
int
short
byte
date time dateTime gYear gYearMonth gMonth gMonthDay gDay duration
© 2006 Open Grid Forum 18
What DFDL is Not: FAQ
• I have a pre-defined XML Schema.• Q: Can I use DFDL to populate it from a non-
XML data file?
• A: Only partly:• DFDL is focused on data format• DFDL does not provide general data transformation• Populating a pre-defined XML Schema involves two
separate problems:1. Using DFDL to describe the data file format2. Using a transformation system to transform that to conform
to the pre-defined schema (not DFDL's job)
© 2006 Open Grid Forum 19
DFDL is only about Format !
• The structure of the DFDL schema is dictated by the logical structure of the data
• You must work bottom up. • Start from the data format, not from
what you want to turn it into.
© 2006 Open Grid Forum 20
Example 1: XML
<w>5</w>
<x>7839372</x>
<y>8.6E-200</y>
<z>-7.1E8</z>
© 2006 Open Grid Forum 21
Example 1: XSD
<xs:complexType name=“example1”>
<xs:sequence>
<xs:element name="w" type=“int"/>
<xs:element name="x" type=“int"/>
<xs:element name="y" type=“double"/>
<xs:element name="z" type=“float"/>
</xs:sequence>
</xs:complexType>
© 2006 Open Grid Forum 22
Example 1: Binary
0000 0005 0077 9e8c
169a 54dd 0a1b 4a3f
ce29 46f6
© 2006 Open Grid Forum 23
Example 1: Binary DFDL<xs:complexType name="example1"> <xs:annotation>
<xs:appinfo source="http://www.ogf.org/dfdl/v1.0"> <dfdl:format represtation="binaryInteger"
byteOrder="bigEndian” lengthKind=“fixed” length=“4”/>
</xs:appinfo></xs:annotation><xs:sequence>
<xs:element name="w" type="int"/><xs:element name="x" type="int“/><xs:element name="y" type="double“
dfdl:length=“8” /><xs:element name="z" type="float“/>
</xs:sequence></xs:complexType>
Long form
Short form
© 2006 Open Grid Forum 24
Example 1: Textual
5,7839372,baseQ=8.6E-200,irl1=-7.1E8
Separators, initiators, terminators are all
examples of markup
© 2006 Open Grid Forum 25
Example 1: Textual DFDL <xs:complexType name="example1"> <xs:annotation> <xs:appinfo source="http://www.ogf.org/dfdl/v1.0"> <dfdl:format repType=“text” encoding=“UTF-8” numberScheme=“myDecimals”
lengthKind=“delimited” /> </xs:appinfo>
</xs:annotation><xs:sequence dfdl:separator="," >
<xs:element name="w" type="int"/> <xs:element name="x" type="int"/> <xs:element name="y" type="double" dfdl:initiator="baseQ="/>
<xs:element name="z" type="float" dfdl:initiator="irl1="/>
</xs:sequence></xs:complexType>
© 2006 Open Grid Forum 26
Long Form DFDL Annotations
<xs:sequence> <xs:annotation> <xs:appinfo source="http://www.ogf.org/dfdl/v1.0"> <dfdl:format separator=","/> </xs:appinfo> </xs:annotation> … …
<xs:element name="y" type="double"> <xs:annotation> <xs:appinfo source="http://www.ogf.org/dfdl/v1.0"> <dfdl:format initiator="baseQ="/> </xs:appinfo> </xs:annotation> </xs:element> … </xs:sequence>
© 2006 Open Grid Forum 27
Short Form DFDL Annotations
<xs:sequence dfdl:separator=","> …
<xs:element name="y" type="double" dfdl:initiator="baseQ="/> … </xs:sequence>
Non-native attribute syntax,
easy for users to write
© 2006 Open Grid Forum 28
Element Form DFDL Annotations
Used for properties that themselves contain the quotation mark characters and escape characters
<xs:sequence>
<xs:annotation> <xs:appinfo source="http://www.ogf.org/dfdl/v1.0">
<dfdl:property name='separator'>,</dfdl:property> </xs:appinfo> </xs:annotation> … …
<xs:element name="y" type="double"> <xs:annotation> <xs:appinfo source="http://www.ogf.org/dfdl/v1.0">
<dfdl:property name='initiator'>baseQ=</dfdl:property> </xs:appinfo> </xs:annotation> </xs:element> … </xs:sequence>
© 2006 Open Grid Forum 29
Long Form – Construct Specific
<xs:sequence> <xs:annotation> <xs:appinfo source="http://www.ogf.org/dfdl/v1.0"> <dfdl:sequence separator=","/> </xs:appinfo> </xs:annotation>
… <xs:element name="y" type="double">
<xs:annotation> <xs:appinfo source="http://www.ogf.org/dfdl/v1.0"> <dfdl:element initiator="baseQ="/> </xs:appinfo> </xs:annotation> </xs:element> … </xs:sequence>
Annotation on xs:element can be dfdl:element.
Restricts properties to those sensible for
elements.
Annotation on xs:sequence can be dfdl:sequence.
Restricts properties to those sensible for sequences
© 2006 Open Grid Forum 30
Annotation Elements
format Defines the data format properties that apply to part of the logical data models when using long form. (Also the construct specfic variations).
defineFormat Defines a reusable data format by collecting together other annotations and associating them with a name that can be referenced from elsewhere.
assert Defines truths about a DFDL model when parsing the data.
discriminator Defines an expressions to use for predicate testing to resolve uncertainties such as choice branches
numberFormat Defines the scheme used for expressing numbers in the logical data model. This includes aspects such as radix, field separators, digit grouping separators, leading or trailing signs, etc.
escapeScheme Defines the scheme by which quotation marks and escape characters can be specified. This is for use with delimited text formats.
property Used in the syntax of dfdl:format annotations when the property value is incompatible with XML attribute content.
defineVariable Defines a variable that can be referenced elsewhere. This can be used to communicate a parameter from an earlier part of a schema to a later part.
setVariable Sets the value of a variable that has been defined earlier in the schema.
hidden Defines a hidden element that appears in the schema for use by the DFDL processor, but is not part of the logical data model described by the schema.
typeSubstitution Defines overload of a non-annotated types with annotated meanings
© 2006 Open Grid Forum 31
Agenda
• What is DFDL?• Provide enough context for those
interested in getting involved but who haven't been following along
• Progress review• Latest on the draft specification, current
implementations of DFDL processors
• Next steps• Participation needed to complete the
specification, further implementations
© 2006 Open Grid Forum 32
Current Specification • Core is currently at version 031
• Supplements for• Advanced decimals• Datetimes• Advanced delimited formats (may go away)• Bi-directional text
• May be found at• https://forge.gridforum.org/sf/go/doc15032?nav=1
© 2006 Open Grid Forum
Core Specification 031
DFDL Information Set (Infoset)
DFDL Schema Model
Syntax Basics
Syntax and Usage of DFDL Annotation Elements
Expression language
Scoping Rules
DFDL Properties Introduction: The DFDL Parser and Unparser
DFDL Data Syntax Grammar
Framing
Simple Types
Properties for Nullable Elements
Sequence Groups
Unordered Sequence Groups
Assertion and Discriminator Evaluation
Choices
Arrays and Optional Elements: Repeating and Variable-Occurrence Data Items
Calculated Value Properties
Any Element Wildcard
33
© 2006 Open Grid Forum
Highlights
• Infoset• Expression Language• Calculated Values
34
© 2006 Open Grid Forum
DFDL Infoset
• Abstract information model• Simplifies explanation of
• Parsing of an invalid instance• Unparsing of data
• That data might be constructed programatically, not necessarily from parsing data
• Clarifies data model relationship to XML data
• DFDL Schema is implied• Unlike XML Infoset, there’s no DFDL Infoset
concept without a DFDL Schema
35
© 2006 Open Grid Forum
DFDL Infoset
• Simpler than XML Infoset• Information Items
• Document• Element
36
© 2006 Open Grid Forum
DFDL Infoset
37
© 2006 Open Grid Forum 38
DFDL Expression Language
• Small Subset of XPath 2.0.• Constrained usage• XPATH 2.0 large language• Data model is DFDL Infoset
• Syntax• { expression }
© 2006 Open Grid Forum 39
Expression Language Usage
• Set properties dynamically from instance• initiator, terminator, length, separator,
occursSeparator, null
• dfdl:assert, dfdl:discriminator • inputValueCalc, outputValueCalc • dfdl:setVariable
© 2006 Open Grid Forum 40
XPath 2.0 Subset
• XPath 2.0 types• Limited path expressions
• Must return single value• Limited axis
• XPath 1.0 functions• Variables
© 2006 Open Grid Forum 41
Path Expressions
• / axis :: nodetest [ predicate ] /• Axis
• child, parent (..) , self (.)
• Nodetest• Nametest - Name or wildcard
• [ predicate ] • Must return array index
© 2006 Open Grid Forum
Calculated Values
• DFDL supports limited layering • Properties:
• inputValueCalc• outputValueCalc• outputLengthCalc• Note: inputLengthCalc is just length=“{ … }”
42
© 2006 Open Grid Forum
Calculated Values Example
• Logically, the data is a date.<element name=“d” type=“date”/>
• Physically, it is stored as 6 packed-decimal digits, unsigned• Like so:
43
Y YD DM M
© 2006 Open Grid Forum
Physical Layer’s Shape
<sequence>
<element name="mm" type="int“ />
<element name="dd" type="int“ />
<element name="yy" type="int“ />
</sequence>
44
Y YD DM M
© 2006 Open Grid Forum
Physical Layer Representation
<sequence dfdl:representation="binary"
dfdl:numberRepresentation="packedDecimal"
dfdl:decimalSigned="false“ >
<element name="mm" type="int"
dfdl:numberPattern="##“ />
<element name="dd" type="int"
dfdl:numberPattern="##“ />
<element name="yy" type="int"
dfdl:numberPattern="##“ />
</sequence>
45
Y YD DM M
© 2006 Open Grid Forum
Hide Physical Layer<sequence>
<annotation><appinfo ...>
<dfdl:hidden>
<element name="pdate"> <complexType>
<sequence dfdl:representation="binary"
dfdl:numberRepresentation="packedDecimal"
dfdl:decimalSigned="false“ >
<element name="mm" type="int"
dfdl:numberPattern="##“ />
<element name="dd" type="int"
dfdl:numberPattern="##“ />
<element name="yy" type="int"
dfdl:numberPattern="##“ />
</sequence>
</complexType>
</element>
</dfdl:hidden>
</appinfo></annotation>
<element name="d" type="date">
…
</element>
46
Y YD DM M
Logical layer
Physical layer(hidden)
© 2006 Open Grid Forum
Parsing: Calculate Logical from Physical
<sequence>
… hidden pdate here …
<element name="d" type="date">
<annotation><appinfo>
<dfdl:format>
<dfdl:property name="inputValueCalc">
{
fn:date(fn:concat(if ( ../pdate/yy > 50 ) then "19" else "20",
fn:string(../pdate/yy),
"-",
fn:string(../pdate/mm),
"-",
fn:string(../pdate/dd)))
}
</dfdl:property>
</dfdl:format>
</appinfo></annotation>
</element>
47
Y YD DM M
dfdl:property annotations to avoid quoting hell
© 2006 Open Grid Forum
Unparsing: Calculate Physical from Logical
<sequence dfdl:representation="binary"
dfdl:numberRepresentation="packedDecimal"
dfdl:decimalSigned="false"
<element name="mm" type="int"
dfdl:numberPattern="##"
dfdl:outputValueCalc="{ fn:month-from-date(../d) }" />
<element name="dd" type="int"
dfdl:numberPattern="##"
dfdl:outputValueCalc="{ fn:day-from-date(../d) }" />
<element name="yy" type="int"
dfdl:numberPattern="##"
dfdl:outputValueCalc="{ fn:year-from-date(../d) mod 100 }" />
</sequence>
48
Y YD DM M
© 2006 Open Grid Forum 49
Agenda
• What is DFDL?• Provide enough context for those
interested in getting involved but who haven't been following along
• Progress review• Latest on the draft specification, current
implementations of DFDL processors
• Next steps• Participation needed
• Readers & editors
© 2006 Open Grid Forum
Where are we on this spec?
schema components (UML – 2 layer)
regular expressions for lengths (Alan)
valueCalc examples (Mike)
property precedence (Steve)
bring supplements up-to-date (Steve)
assertions/discriminators and choice (Suman)
how speculative parsing works (combining choice and variable-occurrence - currently these are separate) (IBM WTX person)
reordering the properties discussion (move representation earlier, improve flow of topics) (Alan)
50
We're Done
Start Over
escapeSchemes (Ian P)
string xml type (Ian P)
variables (Mike)
selectors (Suman)
improvements on property descriptions (All - split tbd)
envelopes and payloads (Steve)
Official OGF “Draft”
© 2006 Open Grid Forum
Where are we on this spec?
v032
v034
51
We're Done
Start Over
v033
Official OGF “Draft”
• 30 to 60 days between versions
• Official “draft” by roughly August
© 2006 Open Grid Forum 52
Completion of Specification
• Anybody willing to join the DFDL WG in an active capacity ?
• Anybody interested in reviewing the spec ?• To validate that R1 satisfies sufficient
requirements
© 2006 Open Grid Forum 53
New Implementations
• The more the merrier !
• Anybody interested in creating or collaborating on a reference implementation?
© 2006 Open Grid Forum 54
Requirements Discussion
• Q: How many people were planning to use DFDL to convert data into XML ?
• Q: How many people were not ?
© 2006 Open Grid Forum 55
END
© 2006 Open Grid Forum 56
Full Copyright Notice
Copyright (C) Open Grid Forum (2004, 2007). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works.
The limited permissions granted above are perpetual and will not be revoked by the OGF or its successors or assignees.
© 2006 Open Grid Forum
Data Format Description Language (DFDL)
Extra Information
© 2006 Open Grid Forum 58
Standards Efforts in This Area• ESML (Earth Science Markup Language) Users can write external files
using ESML schema to describe the structure of various earth science data formats. Applications can utilize the ESML Library to parse this description file and decode the data format.
• XSIL (Extensible Scientific Interchange Language ) The Extensible Scientific Interchange Language (XSIL) is a flexible, hierarchical, extensible, transport language for scientific data objects.
• BFD (Binary Format Description) language The Binary Format Description (BFD) language is an XML dialect based on the eXtensible Scientific Interchange Language(XSIL) that supports the executable documentation of 'arbitrary' binary and ascii data sets.
• BinX (Binary XML Description Language) Another language used to describe the content, structure and physical layout (endian-ness, blocksize) of binary files. BinX is designed to enable transparent transfer of data between diverse platforms.
© 2006 Open Grid Forum 59
DFDL Data Syntax Grammar
• Data in a format describable via a DFDL schema obeys the syntax grammar
• Data divided into • Content – used to compute logical value• Framing – padding, delimiters, separators,
etc
© 2006 Open Grid Forum 60
Elements
• Document = Element
• Element = Prefix ElementContent Suffix
• ElementContent = SimpleContent | ComplexContent
• ComplexContent = SequenceGroup | ChoiceGroup
• SimpleContent = NumberContent | StringContent |CalendarContent | BinaryContent | BooleanContent
© 2006 Open Grid Forum 61
SequenceGroup
• SequenceGroup = Prefix SequenceGroupContent Suffix
• SequenceGroupContent = [ SequenceGroupPrefix [ GroupItem [ SequenceGroupSeparator GrouptItem]*] SequenceGroupSuffix ]
• GroupItem = Element | Array | ComplexContent
© 2006 Open Grid Forum 62
Choice Group
• ChoiceGroup = Prefix ChoiceGroupContent Suffix
• ChoiceGroupContent = ChoiceGroupResolvableItem | ChoiceGroupUnresolvableItem
• ChoiceGroupResolvableItem = GroupItem
• ChoiceGroupUnresolvableItem = Opaque
© 2006 Open Grid Forum 63
Array
• Array = ArrayContent
• ArrayContent = [ OccursPrefix [ Element [ OccursSeparator Element ]* ] [ OccursSeparator StopValue ] OccursSuffix ]
• StopValue = SimpleContent
© 2006 Open Grid Forum 64
Nulls, Optional, Defaults
• Unparsing does not use defaults to make invalid logical tree valid. (“system” may provide validation)
• Required parse elements• Scalar (minOccurs=maxOccurs=1 or neither is specified )• Array of fixed occurances• Everything else is optional (minOccurs, maxOccurs only used for validation)
• If a required element is not found by speculative parsing then use default value.
• If an optional element is not found by speculative parsing then we’re at the end of the variable occurrences. No defaulting occurs
• NULLS• Decoupled from defaults• useNullValueForDefault • nullValueInitiatorPolicy
© 2006 Open Grid Forum 65
Scoping
<xs:sequence dfdl:separator="," dfdl:encoding="ebcdic-cp-us"> <xs:element name="w" type=“string”/> <xs:element name="x" type=“myType”/></xs:sequence>
<xs:complexType name=“myType"> <xs:sequence> … </xs:sequence></xs:complexType>
• Are the elements of the sequence in EBCDIC, or just the separators? (“lexical scoping”)
• Are the contents of type ‘myType' affected by the sequence’s properties or not? (“referential transparency”)
© 2006 Open Grid Forum 66
Scoping - Rules
• Scope of a dfdl:format annotation is controlled by a special “applies” attribute• “hereOnly” - applies to the annotated construct only
• “toScope” - applies to the annotated construct AND to any contained or referenced constructs
• A construct can have two dfdl:format annotations, one with “hereOnly” and one with “toScope”
• Most local “toScope” annotation wins
• “Referential transparency” maintained
• One exception – global elements and groups • Can override at point of reference
© 2006 Open Grid Forum 67
Scoping - Example 1<xs:complexType name="example1"> <xs:annotation>
<xs:appinfo source="http://www.ogf.org/dfdl/v1.0"> <dfdl:format applies=“toScope”
representation="binaryInteger" byteOrder="bigEndian”
lengthKind=“fixed” length=“4”/>
</xs:appinfo></xs:annotation><xs:sequence>
<xs:element name="w" type="int"/><xs:element name="x" type="int“/><xs:element name="y" type="double“
dfdl:applies=“hereOnly” dfdl:length=“8” /><xs:element name="z" type="float“/>
</xs:sequence></xs:complexType>
© 2006 Open Grid Forum 68
Scoping - Example 2<xs:complexType name="example1"> <xs:annotation>
<xs:appinfo source="http://www.ogf.org/dfdl/v1.0"> <dfdl:format applies=“toScope”
representation="binaryInteger" byteOrder="bigEndian”
lengthKind=“fixed”length=“4”/>
</xs:appinfo></xs:annotation>
<xs:group ref=“numbers”/></xs:complexType>
<xs:group name=“numbers”><xs:sequence>
<xs:element name="w" type="int"/><xs:element name="x" type="int“/><xs:element name="y" type="double“
dfdl:applies=“hereOnly” dfdl:length=“8” /><xs:element name="z" type="float“/>
</xs:sequence></xs:group>
Referential transparency ensures that both
examples have the same semantic
© 2006 Open Grid Forum 69
Scoping – CSV Example 1<xs:complexType name=“csv“ dfdl:applies=“toScope”
dfdl:representation=“text” dfdl:encoding=“UTF-8” dfdl:lengthKind=“delimited” dfdl:separator=“%0A%0D”
dfdl:occursSeparator=“%0A%0D” dfdl:occursKind=“delimited” ><xs:sequence> <xs:element name=“row” maxOccurs=“unbounded”> <xs:complexType>
<xs:sequence dfdl:applies=“hereOnly” dfdl:separator=“,” > <xs:element name="w" type="int"/> <xs:element name="x" type="int"/> <xs:element name="y" type="double“ dfdl:applies=“hereOnly”
dfdl:initiator="baseQ="/> <xs:element name="z" type="float"
dfdl:applies=“hereOnly” dfdl:initiator="irl1="/> </xs:sequence> </xs:complexType> </xs:element>
</xs:sequence></xs:complexType>
© 2006 Open Grid Forum 70
Scoping – CSV Example 2
<xs:complexType name=“csv_type” dfdl:applies=“toScope” dfdl:representation=“text” dfdl:encoding=“UTF-8”
dfdl:lengthKind=“delimited” dfdl:separator=“%0A%0D” dfdl:occursSeparator=“%0A%0D” dfdl:occursKind=“delimited” ><xs:sequence> <xs:element name=“row” maxOccurs=“unbounded”
type=“row_type”/></xs:sequence>
</xs:complexType>
<xs:complexType name=“row_type”> <xs:sequence dfdl:applies=“hereOnly” dfdl:separator=“,” > <xs:element name="w" type="int"/> <xs:element name="x" type="int"/> <xs:element name="y" type="double” dfdl:applies=“hereOnly” dfdl:initiator="baseQ="/> <xs:element name="z" type="float" dfdl:applies=“hereOnly” dfdl:initiator="irl1="/> </xs:sequence></xs:complexType>
Referential transparency ensures that both
examples have the same semantic
© 2006 Open Grid Forum 71
Selectors
• Selectors are a method by which multiple sets of annotation elements can be specified within a single DFDL Schema. For instance you may have a binary and text representation of the same logical data format that you wish to model within a single DFDL Schema. A selector is a special pseudo-property that can be specified on any default and construct specific annotation element. For instance:• <dfdl:format selector=”A”.../>• <dfdl:element selector=”A”.../>
• The value of the selector property is chosen by the user. If there is no selector property on an annotation element it is said to have no selector (there is no default). Selectors cannot be used with the short form of properties that are specified directly on XML Schema components.
• When the DFDL Schema is used to parse a document the user must specify the selector they wish to use. Only annotation elements that have a matching selector value will be used to parse the document.