
Posted on 28-Mar-2015


Page 1: Data Format Description Language (DFDL) WG Martin Westhead EPCC, University of Edinburgh M.Westhead@epcc.ed.ac.uk Alan Chappell PNNL chappella@battelle.org

Data Format Description Language (DFDL) WG

Martin Westhead
EPCC, University of Edinburgh
M.Westhead@epcc.ed.ac.uk

Alan Chappell
PNNL
chappella@battelle.org

Page 2

Agenda

• Introduction and welcome - Martin Westhead 10 mins
• Binary Format Description Language (BFD) - Alan Chappell 10 mins
• Binary XML (BinX) - Stephen Rutherford 10 mins
• DFDL - Martin Westhead 15 mins
  – Big picture
  – Structural Description Language
  – Charter

(20 mins discussion)

• Examples repository - Alan Chappell 10 mins
  – Bruce Barkstrom: Examples at NASA

(15 mins discussion)

Page 3

Motivation

• There will never be a standard data format
  – E.g. XML: verbose, tree-based, explicit structure
  – Legacy formats
  – Application-specific formats
  – One size will never fit all

• But could we provide a language for describing formats?
  – Transparency of physical representation
  – Automatic format conversion
  – Unambiguous description of data

Page 4

There’s more…

Explicit structure enables:
• Standard transformation to/from XML representation
  – Could allow applications to read/write XML
  – But provide an underlying efficient binary representation

• Data stream/file becomes a database
  – Point to parts of the structure
  – Extract parts of the structure
  – Modify parts of the structure
  – Integrate parts of different structures
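As a rough illustration of the XML point, the following Python sketch converts one fixed-layout binary record to an XML element and back. The record layout (a big-endian 32-bit integer followed by a 32-bit float) and the element names `record`, `id` and `value` are assumptions made for the example; a DFDL description would be expected to supply the layout and labels instead of hard-coding them.

```python
import struct
import xml.etree.ElementTree as ET

# Assumed layout for this sketch: big-endian int32 "id", then float32 "value".
RECORD_FORMAT = ">if"

def record_to_xml(raw: bytes) -> ET.Element:
    """Expose the compact binary record as an XML element tree."""
    ident, value = struct.unpack(RECORD_FORMAT, raw)
    rec = ET.Element("record")
    ET.SubElement(rec, "id").text = str(ident)
    ET.SubElement(rec, "value").text = repr(value)
    return rec

def xml_to_record(rec: ET.Element) -> bytes:
    """Inverse transformation: write the XML view back as binary."""
    return struct.pack(RECORD_FORMAT, int(rec.findtext("id")), float(rec.findtext("value")))

raw = struct.pack(RECORD_FORMAT, 42, 1.5)
doc = record_to_xml(raw)
print(ET.tostring(doc, encoding="unicode"))  # <record><id>42</id><value>1.5</value></record>
assert xml_to_record(doc) == raw             # the round trip preserves the bytes
```

An application could read and write the XML view while the stored form stays binary.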

Page 5

And more…

• Generic tools possible
  – Browsing
  – Conversion and transformation

• Annotation of data
  – E.g. identify the bits that depict a hurricane in an image

• Enables general semantic labels; many ontologies could be developed, e.g.:
  – SI units, SQL types, time
  – Community-specific labels, “starClass = whiteDwarf”
  – Application-specific labels, “nodeColour = green”

• Could lead to a standard transformation language

Page 6

Not fairy tales

• Based on implemented work
  – BinX (http://www.epcc.ed.ac.uk/gridserve/WP5/Binx/)
  – BFD, part of the Scientific Annotation Middleware project (http://www.scidac.org/SAM/)

• Generalized and extended a little

• Formal semantics

• Foundation for extensibility

Page 7

Approach

• Separate out structure and semantics
• General structural language
  – Repetition
  – Pointers
  – References to data
  – New structures can be built (compositionality)

• Semantics
  – Hard to express, so… we don’t
  – General labeling
  – Label semantics defined elsewhere (ontologies)
  – Labels can be added (extensibility)
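One way to picture the separation of structure and semantics is the Python sketch below: the structural step only cuts a byte sequence into groups, while a separate, extensible table (standing in for an ontology) gives each label its meaning. The table `BASIC_TYPES`, its label names and the big-endian encodings are hypothetical, chosen only for illustration.

```python
import struct

# Structure only: cut a byte sequence into fixed-size groups, with no meaning attached.
def split(raw: bytes, size: int) -> list:
    return [raw[i:i + size] for i in range(0, len(raw), size)]

# Semantics live elsewhere: a hypothetical ontology mapping labels to interpretations.
BASIC_TYPES = {
    "int32": lambda b: struct.unpack(">i", b)[0],
    "float32": lambda b: struct.unpack(">f", b)[0],
    "char4": lambda b: b.decode("ascii"),
}

def interpret(groups: list, labels: list, ontology: dict) -> list:
    """Attach one label to each group and apply the meaning the ontology gives it."""
    return [ontology[label](g) for g, label in zip(groups, labels)]

raw = struct.pack(">if", 7, 0.25) + b"ABCD"
print(interpret(split(raw, 4), ["int32", "float32", "char4"], BASIC_TYPES))  # [7, 0.25, 'ABCD']

# Extensibility: a new label is added without changing the structural code.
BASIC_TYPES["uint32"] = lambda b: struct.unpack(">I", b)[0]
```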

Page 8

Structure – arbitrary labels

[Diagram: arbitrary labels imposed on a bit sequence. A fooSet contains fooPairs; each fooPair contains foo elements; each foo contains bunchThings; each bunchThings groups a run of individual bits labelled thing (e.g. 0 1 1 0 0 1 1 1).]

Page 9

Structure – example labels

[Diagram: the same hierarchy with meaningful labels. A complexArray contains complex values; each complex contains floats; each float is made of bytes; each byte is eight bits (e.g. 0 1 1 0 0 1 1 1).]
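The labelled hierarchy in this diagram can be rebuilt in a few lines of Python. This is a sketch under stated assumptions: each float is taken to be a 4-byte big-endian IEEE 754 single, a complex is two consecutive floats, and the dictionary keys `float`, `bytes` and `byte` are illustrative names rather than DFDL syntax.

```python
import struct

def byte_node(b: int) -> dict:
    """'byte' label over eight 'bit' leaves, most significant bit first."""
    return {"byte": [(b >> (7 - i)) & 1 for i in range(8)]}

def float_node(chunk: bytes) -> dict:
    """'float' label over four bytes, read as a big-endian IEEE 754 single (assumed)."""
    return {"float": struct.unpack(">f", chunk)[0],
            "bytes": [byte_node(b) for b in chunk]}

def complex_array(raw: bytes) -> list:
    """complexArray: a run of complex values, each made of two consecutive floats."""
    if len(raw) % 8:
        raise ValueError("length is not a whole number of complex values")
    return [[float_node(raw[i:i + 4]), float_node(raw[i + 4:i + 8])]
            for i in range(0, len(raw), 8)]

tree = complex_array(struct.pack(">ff", 1.0, -2.0))
print(tree[0][0]["float"], tree[0][1]["float"])  # 1.0 -2.0
print(tree[0][0]["bytes"][0])                    # {'byte': [0, 0, 1, 1, 1, 1, 1, 1]}
```

Swapping the keys for fooPair, foo, bunchThings and thing would give the arbitrary-label version of the same tree.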

Page 10

Structural language

• Formal semantics
  – Structured binary sequence
  – Defines a hierarchical structure over an underlying sequence of binary values

• Language for describing hierarchical structure
  – Repetition
    • Explicit number of repeats
    • Termination characters
  – Data reference
    • Conditionals
    • Data size
  – Pointers
    • Scope
  – As general as possible, but must be concise and implementable

• Draft language definition on web page (www.epcc.ed.ac.uk/dfdl)
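The repetition and data-reference constructs listed above can be illustrated with a small parser-combinator sketch in Python. The names `byte`, `repeat`, `until` and `counted` are hypothetical and do not come from the draft language definition; the sketch only assumes that a data reference means a size or count read from earlier data.

```python
from typing import Callable, List, Tuple

# A parser takes (data, position) and returns (value, next_position).
Parser = Callable[[bytes, int], Tuple[object, int]]

def byte() -> Parser:
    """A single unsigned byte."""
    def parse(data: bytes, pos: int) -> Tuple[int, int]:
        if pos >= len(data):
            raise ValueError("unexpected end of data")
        return data[pos], pos + 1
    return parse

def repeat(item: Parser, count: int) -> Parser:
    """Repetition with an explicit number of repeats."""
    def parse(data: bytes, pos: int) -> Tuple[List[object], int]:
        values = []
        for _ in range(count):
            value, pos = item(data, pos)
            values.append(value)
        return values, pos
    return parse

def until(terminator: int) -> Parser:
    """Repetition ended by a termination character, which is consumed but not returned."""
    def parse(data: bytes, pos: int) -> Tuple[bytes, int]:
        end = data.index(terminator, pos)  # raises ValueError if the terminator is missing
        return data[pos:end], end + 1
    return parse

def counted(item: Parser) -> Parser:
    """Data reference: a leading byte gives the number of items that follow."""
    def parse(data: bytes, pos: int) -> Tuple[List[object], int]:
        count, pos = byte()(data, pos)
        return repeat(item, count)(data, pos)
    return parse

record = b"\x03abcname\x00"
values, pos = counted(byte())(record, 0)
print(values)     # [97, 98, 99]
name, pos = until(0)(record, pos)
print(name, pos)  # b'name' 9
```

Building new parsers out of existing ones mirrors the compositionality goal: `counted(byte())` is itself a parser that can be nested inside `repeat`.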

Page 11

CSV file example

char:=byte

data:=[(char - [',']).*]

field:=[data; [',']]

finalField:=[data; ['\n']]

row:=[field.*] :: [finalField]

table:=[row.*]
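The grammar above can be read as a recipe for a recursive-descent parser. The following Python function is a hand-written sketch that follows it, not a DFDL processor. One assumption is made explicit: `data` is taken to stop at '\n' as well as ',', so that a field cannot run across a row boundary; the `data` rule as written only excludes ','.

```python
from typing import List

def parse_table(text: str) -> List[List[str]]:
    """table := row.*   row := field.* :: finalField
    field := [data; ',']   finalField := [data; '\\n']"""
    rows = []
    pos = 0
    while pos < len(text):
        row = []
        while True:
            # data: characters up to the next ',' or '\n' (assumed terminators)
            end = pos
            while end < len(text) and text[end] not in ",\n":
                end += 1
            if end == len(text):
                raise ValueError("last field is not terminated by a newline")
            row.append(text[pos:end])
            pos = end + 1          # consume the terminator
            if text[end] == "\n":  # finalField closes the row
                break
        rows.append(row)
    return rows

print(parse_table("a,b,c\n1,2,3\n"))  # [['a', 'b', 'c'], ['1', '2', '3']]
```

The nested list that comes out keeps only the values; the commas and newlines that defined the structure are dropped.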

Page 12

Semantic labels

• Many ontologies possible
• Initial scope probably:
  – Basic types (floating point, integer, character)
  – Simple structures (structs, arrays, tables)

• Obvious extensions:
  – SQL types
  – XML Schema types

• Key WG goal:
  – Define form and requirements of new ontologies

Page 13

What is an Ontology?

• XML Schema for new types

• Structural description of new types

• Definition of core API behaviour for new types

• API extensions

• Relationships to other types

Page 14

WG goals

• Formal language for DFDL data structure

• Standard representation of this language in XML

• Requirements for DFDL ontology

• Basic types ontology

• Basic structures ontology

Page 15

Currently under discussion

• Abstraction from the underlying binary
  – Compression, encoding, encryption
  – Physical vs. conceptual binary sequence

• Abstraction of description
  – complex:=[foo; foo]
  – Instantiate “foo:= float” or “foo:= double” at use time

• Filtering of results
  – Getting to the data model and leaving the format behind
  – CSV -> [[value; value; value]; [value; value; value]]
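The abstraction of description resembles a type parameter. The Python sketch below assumes a byte-level reading of `complex:=[foo; foo]`, with `foo` bound at use time to a `struct` format code (`'f'` for a 4-byte float or `'d'` for an 8-byte double) and big-endian byte order; the function name `complex_type` is hypothetical.

```python
import struct

def complex_type(foo: str):
    """Build a parser for complex := [foo; foo], binding foo at use time.

    foo is a struct format code: 'f' (4-byte float) or 'd' (8-byte double).
    Big-endian byte order is an assumption of this sketch.
    """
    fmt = ">" + foo * 2
    size = struct.calcsize(fmt)

    def parse(data: bytes, pos: int = 0):
        real, imag = struct.unpack_from(fmt, data, pos)
        return complex(real, imag), pos + size
    return parse

complex_f = complex_type("f")  # instantiate foo := float
complex_d = complex_type("d")  # instantiate foo := double
print(complex_f(struct.pack(">ff", 1.5, 2.0)))  # ((1.5+2j), 8)
print(complex_d(struct.pack(">dd", 1.5, 2.0)))  # ((1.5+2j), 16)
```

The description of `complex` is written once; only the binding of `foo` changes the number of bytes consumed.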

Page 16

DFDL in the VO

• Generic tools

• Metadata possibilities
  – Ontologies can define relationships between types
  – E.g. polar to Cartesian
  – Standard classes over data objects
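To make the idea of an ontology-defined relationship concrete, here is a Python sketch of a polar-to-Cartesian conversion registered under a pair of type names. The `Polar` and `Cartesian` classes, the `RELATIONSHIPS` table and the `convert` helper are hypothetical; they only show how a generic tool might look up and apply such a relationship.

```python
import math
from dataclasses import dataclass

@dataclass
class Polar:
    r: float
    theta: float  # angle in radians

@dataclass
class Cartesian:
    x: float
    y: float

def polar_to_cartesian(p: Polar) -> Cartesian:
    return Cartesian(p.r * math.cos(p.theta), p.r * math.sin(p.theta))

# Hypothetical table of relationships an ontology might declare between types.
RELATIONSHIPS = {("Polar", "Cartesian"): polar_to_cartesian}

def convert(value, target: str):
    """Look up the declared relationship from value's type to target and apply it."""
    return RELATIONSHIPS[(type(value).__name__, target)](value)

print(convert(Polar(2.0, 0.0), "Cartesian"))  # Cartesian(x=2.0, y=0.0)
```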

Page 17

Getting involved

• Web page: http://www.epcc.ed.ac.uk/dfdl

• Mailing list ([email protected])

• My address: [email protected]