© 2006 open grid forum data format description language (dfdl) progress update mike beckerle, oco,...

71
© 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc.

Upload: emily-dobson

Post on 27-Mar-2015

215 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum

Data Format Description Language (DFDL)

Progress Update

Mike Beckerle, Oco, Inc.

Page 2: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 2

Admin

• DFDL WG co-chairs:• Mike Beckerle, Oco• Martin Westhead, Avaya

• Two note takers please

• Sign the attendance sheet

• Note: OGF Intellectual Property Rules apply

Page 3: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 3

OGF IPR Policies Apply• “I acknowledge that participation in this meeting is subject to the OGF Intellectual Property

Policy.”• Intellectual Property Notices Note Well: All statements related to the activities of the OGF

and addressed to the OGF are subject to all provisions of Appendix B of GFD-C.1, which grants to the OGF and its participants certain licenses and rights in such statements. Such statements include verbal statements in OGF meetings, as well as written and electronic communications made at any time or place, which are addressed to:

• the OGF plenary session, • any OGF working group or portion thereof, • the OGF Board of Directors, the GFSG, or any member thereof on behalf of the OGF, • the ADCOM, or any member thereof on behalf of the ADCOM, • any OGF mailing list, including any group list, or any other list functioning under OGF auspices, • the OGF Editor or the document authoring and review process

• Statements made outside of a OGF meeting, mailing list or other function, that are clearly not intended to be input to an OGF activity, group or function, are not subject to these provisions.

• Excerpt from Appendix B of GFD-C.1: ”Where the OGF knows of rights, or claimed rights, the OGF secretariat shall attempt to obtain from the claimant of such rights, a written assurance that upon approval by the GFSG of the relevant OGF document(s), any party will be able to obtain the right to implement, use and distribute the technology or works when implementing, using or distributing technology based upon the specific specification(s) under openly specified, reasonable, non-discriminatory terms. The working group or research group proposing the use of the technology with respect to which the proprietary rights are claimed may assist the OGF secretariat in this effort. The results of this procedure shall not affect advancement of document, except that the GFSG may defer approval where a delay may facilitate the obtaining of such assurances. The results will, however, be recorded by the OGF Secretariat, and made available. The GFSG may also direct that a summary of the results be included in any GFD published containing the specification.”

• OGF Intellectual Property Policies are adapted from the IETF Intellectual Property Policies that support the Internet Standards Process.

Page 4: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 4

Agenda

• What is DFDL?• Provide enough context for those

interested in getting involved but who haven't been following along

• Progress review• Latest on the draft specification

• Next steps• Participation needed to complete the

specification, initial implementations

Page 5: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 5

Agenda

• What is DFDL?• Provide enough context for those

interested in getting involved but who haven't been following along

• Progress review• Latest on the draft specification, current

implementations of DFDL processors

• Next steps• Participation needed to complete the

specification, initial implementations

Page 6: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 6

Data Interchange Formats

There are two kinds:

• Prescriptive standards: “Put your data in this format!”• Textual - XML• Binary – ASN.1, XDR, EBML, …• You use the defined encodings, syntax, …• Descriptions are shareable

• Others: “Choose your own format”• Textual, binary, mixture • You use your own encodings, syntax, …• No common description DFDL: a universal description for any format

Page 7: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 7

What is DFDL?

1. A way of describing data…• It is NOT a data format itself!

2. That can describe any data format…• Textual and binary • Commercial record-oriented • Scientific and numeric• Modern and legacy

3. While allowing high performance…• Choose the right data format for the job

• High density, optimized I/O, random access

• No need to use DFDL libraries

Page 8: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 8

Benefits of DFDL

1. Data format independent2. Prescriptive standards not always best for job3. Descriptions can be shared

4. Interoperability 5. Allows after-the-event description6. Supports move to SOA 7. Spans commercial and scientific worlds8. Supports grid computing

Page 9: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 9

Why Grids and DFDL?

• Grids are about big-data and big-computation problems• Simplistic solutions like “use XML” won’t cut it!• Performance and space usage

• Grids are about universal data interchange• XML – use XML Schema• Relational – use RDBMS schema• Other – use DFDL

Page 10: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 10

DFDL Goals

• Build on existing standards• Leverage XML technology and concepts

• Support very efficient parsers/formatters• Support round-tripping

• Read and write data in described format from same description

• Keep simple cases simple• Simple descriptions should be "human readable"

• To the same degree as XML Schema • Generality

• Can describe any format • Allow extensions for new data formats

Page 11: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 11

XML Synergy

• Use XML Schema subset & type system to describe the logical format of the data

• Use annotations within the XSD to describe the physical representation of the data

• Use XPath when referencing fields within the data

• This approach actively used today:• IBM WebSphere Message Broker• Microsoft BizTalk flat file• Others

Page 12: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 12

DFDL FeaturesRelease 1:• Subset of XML Schema• Rich textual & binary data capabilities• Scoping rules that govern how the annotations apply• Validated input and output - from XML Schema• Defaults - for missing values• ‘Null’ capability - for out-of-band values• Reference – use of a previously read value in subsequent expressions• Basic math – in expressions• Uncertainty – stratagems to resolve choices, optionality• Arrays – one dimension• Very general parsing/writing capability

Future:• Multi-layer – description of an intermediate rep not exposed in final

result• Extensibility – new type/transform specification• Arrays - multi-dimension

Page 13: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 13

DFDL Schema Model

• Simple Types• Represent data values

• Complex Types• Represent structures

• Elements• Represent named fields• Can be data fields or structural fields• Can repeat (arrays)

• Sequence groups• Structures where the children occur in order

• Choice groups• Structures where just one of the children occurs (eg: C union, COBOL redefine)

• Wildcards• Points where any element can occur

Missing – multi-dimensional arrays. Deferred to a future version.

Page 14: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 14

DFDL Schema Model

type

dimension

element

simpleType

simple type hierarchy on later slide

sequence choice

group

*

wildcard

*

*complexType

Page 15: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 15

XML Schema Subset

• Namespace management• xs:import/xs:include file management• Local and global xs:element declarations

• Optional dimensionality via maxOccurs and minOccurs• Optional default value• Optional nillable attribute (null value support)

• Local and global xs:complexType definitions• Most built-in xs:simpleTypes – see later slide• Local and global user-defined xs:simpleTypes by derivation• Local and global xs:sequence groups (no dimensionality)• Local and global xs:choice groups (no dimensionality)• xs:any element wildcards• xs:appinfo annotations to carry the DFDL information

Page 16: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 16

XML Schema – Not Required

• Local and global xs:attribute declarations• xs:attribute groups• xs:complexType derivations• xs:union and xs:list simple types• Built-in xs:simpleTypes specific to XML or otherwise superfluous• Dimensionality on non-elements (that is, model groups)• Identity Constraints • Substitution Groups • xs:redefine file management• Local and global xs:all groups

Page 17: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 17

Supported Simple Types

anySimpleType

string QName NOTATION float double decimal boolean base64Binary hexBinary anyURI

normalizedString

token

language Name NMTOKEN

NMTOKENSNCName

ID IDREF ENTITY

IDREFS ENTITIES

integer

long nonPositiveInteger nonNegativeInteger

negativeInteger positiveInteger unsignedLong

unsignedInt

unsignedShort

unsignedByte

int

short

byte

date time dateTime gYear gYearMonth gMonth gMonthDay gDay duration

Page 18: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 18

What DFDL is Not: FAQ

• I have a pre-defined XML Schema.• Q: Can I use DFDL to populate it from a non-

XML data file?

• A: Only partly:• DFDL is focused on data format• DFDL does not provide general data transformation• Populating a pre-defined XML Schema involves two

separate problems:1. Using DFDL to describe the data file format2. Using a transformation system to transform that to conform

to the pre-defined schema (not DFDL's job)

Page 19: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 19

DFDL is only about Format !

• The structure of the DFDL schema is dictated by the logical structure of the data

• You must work bottom up. • Start from the data format, not from

what you want to turn it into.

Page 20: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 20

Example 1: XML

<w>5</w>

<x>7839372</x>

<y>8.6E-200</y>

<z>-7.1E8</z>

Page 21: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 21

Example 1: XSD

<xs:complexType name=“example1”>

<xs:sequence>

<xs:element name="w" type=“int"/>

<xs:element name="x" type=“int"/>

<xs:element name="y" type=“double"/>

<xs:element name="z" type=“float"/>

</xs:sequence>

</xs:complexType>

Page 22: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 22

Example 1: Binary

0000 0005 0077 9e8c

169a 54dd 0a1b 4a3f

ce29 46f6

Page 23: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 23

Example 1: Binary DFDL<xs:complexType name="example1"> <xs:annotation>

<xs:appinfo source="http://www.ogf.org/dfdl/v1.0"> <dfdl:format represtation="binaryInteger"

byteOrder="bigEndian” lengthKind=“fixed” length=“4”/>

</xs:appinfo></xs:annotation><xs:sequence>

<xs:element name="w" type="int"/><xs:element name="x" type="int“/><xs:element name="y" type="double“

dfdl:length=“8” /><xs:element name="z" type="float“/>

</xs:sequence></xs:complexType>

Long form

Short form

Page 24: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 24

Example 1: Textual

5,7839372,baseQ=8.6E-200,irl1=-7.1E8

Separators, initiators, terminators are all

examples of markup

Page 25: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 25

Example 1: Textual DFDL <xs:complexType name="example1"> <xs:annotation> <xs:appinfo source="http://www.ogf.org/dfdl/v1.0"> <dfdl:format repType=“text” encoding=“UTF-8” numberScheme=“myDecimals”

lengthKind=“delimited” /> </xs:appinfo>

</xs:annotation><xs:sequence dfdl:separator="," >

<xs:element name="w" type="int"/> <xs:element name="x" type="int"/> <xs:element name="y" type="double" dfdl:initiator="baseQ="/>

<xs:element name="z" type="float" dfdl:initiator="irl1="/>

</xs:sequence></xs:complexType>

Page 26: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 26

Long Form DFDL Annotations

<xs:sequence> <xs:annotation> <xs:appinfo source="http://www.ogf.org/dfdl/v1.0"> <dfdl:format separator=","/> </xs:appinfo> </xs:annotation> … …

<xs:element name="y" type="double"> <xs:annotation> <xs:appinfo source="http://www.ogf.org/dfdl/v1.0"> <dfdl:format initiator="baseQ="/> </xs:appinfo> </xs:annotation> </xs:element> … </xs:sequence>

Page 27: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 27

Short Form DFDL Annotations

<xs:sequence dfdl:separator=","> …

<xs:element name="y" type="double" dfdl:initiator="baseQ="/> … </xs:sequence>

Non-native attribute syntax,

easy for users to write

Page 28: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 28

Element Form DFDL Annotations

Used for properties that themselves contain the quotation mark characters and escape characters

<xs:sequence>

<xs:annotation> <xs:appinfo source="http://www.ogf.org/dfdl/v1.0">

<dfdl:property name='separator'>,</dfdl:property> </xs:appinfo> </xs:annotation> … …

<xs:element name="y" type="double"> <xs:annotation> <xs:appinfo source="http://www.ogf.org/dfdl/v1.0">

<dfdl:property name='initiator'>baseQ=</dfdl:property> </xs:appinfo> </xs:annotation> </xs:element> … </xs:sequence>

Page 29: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 29

Long Form – Construct Specific

<xs:sequence> <xs:annotation> <xs:appinfo source="http://www.ogf.org/dfdl/v1.0"> <dfdl:sequence separator=","/> </xs:appinfo> </xs:annotation>

… <xs:element name="y" type="double">

<xs:annotation> <xs:appinfo source="http://www.ogf.org/dfdl/v1.0"> <dfdl:element initiator="baseQ="/> </xs:appinfo> </xs:annotation> </xs:element> … </xs:sequence>

Annotation on xs:element can be dfdl:element.

Restricts properties to those sensible for

elements.

Annotation on xs:sequence can be dfdl:sequence.

Restricts properties to those sensible for sequences

Page 30: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 30

Annotation Elements

format Defines the data format properties that apply to part of the logical data models when using long form. (Also the construct specfic variations).

defineFormat Defines a reusable data format by collecting together other annotations and associating them with a name that can be referenced from elsewhere.

assert Defines truths about a DFDL model when parsing the data.

discriminator Defines an expressions to use for predicate testing to resolve uncertainties such as choice branches

numberFormat Defines the scheme used for expressing numbers in the logical data model. This includes aspects such as radix, field separators, digit grouping separators, leading or trailing signs, etc.

escapeScheme Defines the scheme by which quotation marks and escape characters can be specified. This is for use with delimited text formats.

property Used in the syntax of dfdl:format annotations when the property value is incompatible with XML attribute content.

defineVariable Defines a variable that can be referenced elsewhere. This can be used to communicate a parameter from an earlier part of a schema to a later part.

setVariable Sets the value of a variable that has been defined earlier in the schema.

hidden Defines a hidden element that appears in the schema for use by the DFDL processor, but is not part of the logical data model described by the schema.

typeSubstitution Defines overload of a non-annotated types with annotated meanings

Page 31: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 31

Agenda

• What is DFDL?• Provide enough context for those

interested in getting involved but who haven't been following along

• Progress review• Latest on the draft specification, current

implementations of DFDL processors

• Next steps• Participation needed to complete the

specification, further implementations

Page 32: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 32

Current Specification • Core is currently at version 031

• Supplements for• Advanced decimals• Datetimes• Advanced delimited formats (may go away)• Bi-directional text

• May be found at• https://forge.gridforum.org/sf/go/doc15032?nav=1

Page 33: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum

Core Specification 031

DFDL Information Set (Infoset)

DFDL Schema Model

Syntax Basics

Syntax and Usage of DFDL Annotation Elements

Expression language

Scoping Rules

DFDL Properties Introduction: The DFDL Parser and Unparser

DFDL Data Syntax Grammar

Framing

Simple Types

Properties for Nullable Elements

Sequence Groups

Unordered Sequence Groups

Assertion and Discriminator Evaluation

Choices

Arrays and Optional Elements: Repeating and Variable-Occurrence Data Items

Calculated Value Properties

Any Element Wildcard

33

Page 34: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum

Highlights

• Infoset• Expression Language• Calculated Values

34

Page 35: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum

DFDL Infoset

• Abstract information model• Simplifies explanation of

• Parsing of an invalid instance• Unparsing of data

• That data might be constructed programatically, not necessarily from parsing data

• Clarifies data model relationship to XML data

• DFDL Schema is implied• Unlike XML Infoset, there’s no DFDL Infoset

concept without a DFDL Schema

35

Page 36: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum

DFDL Infoset

• Simpler than XML Infoset• Information Items

• Document• Element

36

Page 37: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum

DFDL Infoset

37

Page 38: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 38

DFDL Expression Language

• Small Subset of XPath 2.0.• Constrained usage• XPATH 2.0 large language• Data model is DFDL Infoset

• Syntax• { expression }

Page 39: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 39

Expression Language Usage

• Set properties dynamically from instance• initiator, terminator, length, separator,

occursSeparator, null

• dfdl:assert, dfdl:discriminator • inputValueCalc, outputValueCalc • dfdl:setVariable

Page 40: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 40

XPath 2.0 Subset

• XPath 2.0 types• Limited path expressions

• Must return single value• Limited axis

• XPath 1.0 functions• Variables

Page 41: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 41

Path Expressions

• / axis :: nodetest [ predicate ] /• Axis

• child, parent (..) , self (.)

• Nodetest• Nametest - Name or wildcard

• [ predicate ] • Must return array index

Page 42: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum

Calculated Values

• DFDL supports limited layering • Properties:

• inputValueCalc• outputValueCalc• outputLengthCalc• Note: inputLengthCalc is just length=“{ … }”

42

Page 43: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum

Calculated Values Example

• Logically, the data is a date.<element name=“d” type=“date”/>

• Physically, it is stored as 6 packed-decimal digits, unsigned• Like so:

43

Y YD DM M

Page 44: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum

Physical Layer’s Shape

<sequence>

<element name="mm" type="int“ />

<element name="dd" type="int“ />

<element name="yy" type="int“ />

</sequence>

44

Y YD DM M

Page 45: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum

Physical Layer Representation

<sequence dfdl:representation="binary"

dfdl:numberRepresentation="packedDecimal"

dfdl:decimalSigned="false“ >

<element name="mm" type="int"

dfdl:numberPattern="##“ />

<element name="dd" type="int"

dfdl:numberPattern="##“ />

<element name="yy" type="int"

dfdl:numberPattern="##“ />

</sequence>

45

Y YD DM M

Page 46: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum

Hide Physical Layer<sequence>

<annotation><appinfo ...>

<dfdl:hidden>

<element name="pdate"> <complexType>

<sequence dfdl:representation="binary"

dfdl:numberRepresentation="packedDecimal"

dfdl:decimalSigned="false“ >

<element name="mm" type="int"

dfdl:numberPattern="##“ />

<element name="dd" type="int"

dfdl:numberPattern="##“ />

<element name="yy" type="int"

dfdl:numberPattern="##“ />

</sequence>

</complexType>

</element>

</dfdl:hidden>

</appinfo></annotation>

<element name="d" type="date">

</element>

46

Y YD DM M

Logical layer

Physical layer(hidden)

Page 47: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum

Parsing: Calculate Logical from Physical

<sequence>

… hidden pdate here …

<element name="d" type="date">

<annotation><appinfo>

<dfdl:format>

<dfdl:property name="inputValueCalc">

{

fn:date(fn:concat(if ( ../pdate/yy > 50 ) then "19" else "20",

fn:string(../pdate/yy),

"-",

fn:string(../pdate/mm),

"-",

fn:string(../pdate/dd)))

}

</dfdl:property>

</dfdl:format>

</appinfo></annotation>

</element>

47

Y YD DM M

dfdl:property annotations to avoid quoting hell

Page 48: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum

Unparsing: Calculate Physical from Logical

<sequence dfdl:representation="binary"

dfdl:numberRepresentation="packedDecimal"

dfdl:decimalSigned="false"

<element name="mm" type="int"

dfdl:numberPattern="##"

dfdl:outputValueCalc="{ fn:month-from-date(../d) }" />

<element name="dd" type="int"

dfdl:numberPattern="##"

dfdl:outputValueCalc="{ fn:day-from-date(../d) }" />

<element name="yy" type="int"

dfdl:numberPattern="##"

dfdl:outputValueCalc="{ fn:year-from-date(../d) mod 100 }" />

</sequence>

48

Y YD DM M

Page 49: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 49

Agenda

• What is DFDL?• Provide enough context for those

interested in getting involved but who haven't been following along

• Progress review• Latest on the draft specification, current

implementations of DFDL processors

• Next steps• Participation needed

• Readers & editors

Page 50: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum

Where are we on this spec?

schema components (UML – 2 layer)

regular expressions for lengths (Alan)

valueCalc examples (Mike)

property precedence (Steve)  

bring supplements up-to-date (Steve)

assertions/discriminators and choice (Suman)

how speculative parsing works (combining choice and variable-occurrence - currently these are separate) (IBM WTX person)

reordering the properties discussion (move representation earlier, improve flow of topics) (Alan)

50

We're Done

Start Over

escapeSchemes (Ian P)

string xml type (Ian P)

variables (Mike)

selectors (Suman)

improvements on property descriptions (All - split tbd)

envelopes and payloads (Steve)

Official OGF “Draft”

Page 51: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum

Where are we on this spec?

v032 

v034

51

We're Done

Start Over

v033

Official OGF “Draft”

• 30 to 60 days between versions

• Official “draft” by roughly August

Page 52: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 52

Completion of Specification

• Anybody willing to join the DFDL WG in an active capacity ?

• Anybody interested in reviewing the spec ?• To validate that R1 satisfies sufficient

requirements

Page 53: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 53

New Implementations

• The more the merrier !

• Anybody interested in creating or collaborating on a reference implementation?

Page 54: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 54

Requirements Discussion

• Q: How many people were planning to use DFDL to convert data into XML ?

• Q: How many people were not ?

Page 55: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 55

END

Page 56: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 56

Full Copyright Notice

Copyright (C) Open Grid Forum (2004, 2007). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works.

The limited permissions granted above are perpetual and will not be revoked by the OGF or its successors or assignees.

Page 57: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum

Data Format Description Language (DFDL)

Extra Information

Page 58: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 58

Standards Efforts in This Area• ESML (Earth Science Markup Language) Users can write external files

using ESML schema to describe the structure of various earth science data formats. Applications can utilize the ESML Library to parse this description file and decode the data format.

• XSIL (Extensible Scientific Interchange Language ) The Extensible Scientific Interchange Language (XSIL) is a flexible, hierarchical, extensible, transport language for scientific data objects.

• BFD (Binary Format Description) language The Binary Format Description (BFD) language is an XML dialect based on the eXtensible Scientific Interchange Language(XSIL) that supports the executable documentation of 'arbitrary' binary and ascii data sets.

• BinX (Binary XML Description Language) Another language used to describe the content, structure and physical layout (endian-ness, blocksize) of binary files. BinX is designed to enable transparent transfer of data between diverse platforms.

Page 59: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 59

DFDL Data Syntax Grammar

• Data in a format describable via a DFDL schema obeys the syntax grammar

• Data divided into • Content – used to compute logical value• Framing – padding, delimiters, separators,

etc

Page 60: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 60

Elements

• Document = Element

• Element = Prefix ElementContent Suffix

• ElementContent = SimpleContent | ComplexContent

• ComplexContent = SequenceGroup | ChoiceGroup

• SimpleContent = NumberContent | StringContent |CalendarContent | BinaryContent | BooleanContent

Page 61: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 61

SequenceGroup

• SequenceGroup = Prefix SequenceGroupContent Suffix

• SequenceGroupContent = [ SequenceGroupPrefix [ GroupItem [ SequenceGroupSeparator GrouptItem]*] SequenceGroupSuffix ]

• GroupItem = Element | Array | ComplexContent

Page 62: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 62

Choice Group

• ChoiceGroup = Prefix ChoiceGroupContent Suffix

• ChoiceGroupContent = ChoiceGroupResolvableItem | ChoiceGroupUnresolvableItem

• ChoiceGroupResolvableItem = GroupItem

• ChoiceGroupUnresolvableItem = Opaque

Page 63: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 63

Array

• Array = ArrayContent

• ArrayContent = [ OccursPrefix [ Element [ OccursSeparator Element ]* ] [ OccursSeparator StopValue ] OccursSuffix ]

• StopValue = SimpleContent

Page 64: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 64

Nulls, Optional, Defaults

• Unparsing does not use defaults to make invalid logical tree valid. (“system” may provide validation)

• Required parse elements• Scalar (minOccurs=maxOccurs=1 or neither is specified )• Array of fixed occurances• Everything else is optional (minOccurs, maxOccurs only used for validation)

• If a required element is not found by speculative parsing then use default value.

• If an optional element is not found by speculative parsing then we’re at the end of the variable occurrences. No defaulting occurs

• NULLS• Decoupled from defaults• useNullValueForDefault • nullValueInitiatorPolicy

Page 65: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 65

Scoping

<xs:sequence dfdl:separator="," dfdl:encoding="ebcdic-cp-us"> <xs:element name="w" type=“string”/> <xs:element name="x" type=“myType”/></xs:sequence>

<xs:complexType name=“myType"> <xs:sequence> … </xs:sequence></xs:complexType>

• Are the elements of the sequence in EBCDIC, or just the separators? (“lexical scoping”)

• Are the contents of type ‘myType' affected by the sequence’s properties or not? (“referential transparency”)

Page 66: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 66

Scoping - Rules

• Scope of a dfdl:format annotation is controlled by a special “applies” attribute• “hereOnly” - applies to the annotated construct only

• “toScope” - applies to the annotated construct AND to any contained or referenced constructs

• A construct can have two dfdl:format annotations, one with “hereOnly” and one with “toScope”

• Most local “toScope” annotation wins

• “Referential transparency” maintained

• One exception – global elements and groups • Can override at point of reference

Page 67: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 67

Scoping - Example 1<xs:complexType name="example1"> <xs:annotation>

<xs:appinfo source="http://www.ogf.org/dfdl/v1.0"> <dfdl:format applies=“toScope”

representation="binaryInteger" byteOrder="bigEndian”

lengthKind=“fixed” length=“4”/>

</xs:appinfo></xs:annotation><xs:sequence>

<xs:element name="w" type="int"/><xs:element name="x" type="int“/><xs:element name="y" type="double“

dfdl:applies=“hereOnly” dfdl:length=“8” /><xs:element name="z" type="float“/>

</xs:sequence></xs:complexType>

Page 68: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 68

Scoping - Example 2<xs:complexType name="example1"> <xs:annotation>

<xs:appinfo source="http://www.ogf.org/dfdl/v1.0"> <dfdl:format applies=“toScope”

representation="binaryInteger" byteOrder="bigEndian”

lengthKind=“fixed”length=“4”/>

</xs:appinfo></xs:annotation>

<xs:group ref=“numbers”/></xs:complexType>

<xs:group name=“numbers”><xs:sequence>

<xs:element name="w" type="int"/><xs:element name="x" type="int“/><xs:element name="y" type="double“

dfdl:applies=“hereOnly” dfdl:length=“8” /><xs:element name="z" type="float“/>

</xs:sequence></xs:group>

Referential transparency ensures that both

examples have the same semantic

Page 69: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 69

Scoping – CSV Example 1<xs:complexType name=“csv“ dfdl:applies=“toScope”

dfdl:representation=“text” dfdl:encoding=“UTF-8” dfdl:lengthKind=“delimited” dfdl:separator=“%0A%0D”

dfdl:occursSeparator=“%0A%0D” dfdl:occursKind=“delimited” ><xs:sequence> <xs:element name=“row” maxOccurs=“unbounded”> <xs:complexType>

<xs:sequence dfdl:applies=“hereOnly” dfdl:separator=“,” > <xs:element name="w" type="int"/> <xs:element name="x" type="int"/> <xs:element name="y" type="double“ dfdl:applies=“hereOnly”

dfdl:initiator="baseQ="/> <xs:element name="z" type="float"

dfdl:applies=“hereOnly” dfdl:initiator="irl1="/> </xs:sequence> </xs:complexType> </xs:element>

</xs:sequence></xs:complexType>

Page 70: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 70

Scoping – CSV Example 2

<xs:complexType name=“csv_type” dfdl:applies=“toScope” dfdl:representation=“text” dfdl:encoding=“UTF-8”

dfdl:lengthKind=“delimited” dfdl:separator=“%0A%0D” dfdl:occursSeparator=“%0A%0D” dfdl:occursKind=“delimited” ><xs:sequence> <xs:element name=“row” maxOccurs=“unbounded”

type=“row_type”/></xs:sequence>

</xs:complexType>

<xs:complexType name=“row_type”> <xs:sequence dfdl:applies=“hereOnly” dfdl:separator=“,” > <xs:element name="w" type="int"/> <xs:element name="x" type="int"/> <xs:element name="y" type="double” dfdl:applies=“hereOnly” dfdl:initiator="baseQ="/> <xs:element name="z" type="float" dfdl:applies=“hereOnly” dfdl:initiator="irl1="/> </xs:sequence></xs:complexType>

Referential transparency ensures that both

examples have the same semantic

Page 71: © 2006 Open Grid Forum Data Format Description Language (DFDL) Progress Update Mike Beckerle, Oco, Inc

© 2006 Open Grid Forum 71

Selectors

• Selectors are a method by which multiple sets of annotation elements can be specified within a single DFDL Schema. For instance you may have a binary and text representation of the same logical data format that you wish to model within a single DFDL Schema. A selector is a special pseudo-property that can be specified on any default and construct specific annotation element. For instance:• <dfdl:format selector=”A”.../>• <dfdl:element selector=”A”.../>

• The value of the selector property is chosen by the user. If there is no selector property on an annotation element it is said to have no selector (there is no default). Selectors cannot be used with the short form of properties that are specified directly on XML Schema components.

• When the DFDL Schema is used to parse a document the user must specify the selector they wish to use. Only annotation elements that have a matching selector value will be used to parse the document.