Data Design Implementation and Support for Build 2b
November 30, 2011
Steve Hughes
Post on 28-Dec-2015
Topics
• Overview
• Key Requirements and Drivers
• Build 2b Deliverables
• Build 2b Deployment
• Issues
• Next Steps
Data Architecture Concepts
[Diagram: relationships among Tagged Data Object (Information Object), Label Schema, Information Model, Data Object, Data Element, Class, Planetary Science Data Dictionary, and Product. Edge labels: Used to Create, Describes, Extracted/Specialized, has, Expressed As, Validates.]
DRIVERS FOR PDS4 Build 2a
RECOMMENDATION TO MC (2009) | IMPLEMENTATION
• Recommendation: Replace the PDS3 ad hoc information model with a PDS4 information model that is managed using modern tools.
  • Implementation: The PDS4 Information Model has been designed and managed using the Protégé Ontology Modeling Tool.
• Recommendation: Replace ad hoc PDS3 product definitions with PDS4 products that are defined in the model.
  • Implementation: The PDS4 Products and their components are defined using the modeling tool.
  • Implementation: The modeling tool provides rigorous definitions.
  • Implementation: The Product definition is based on the Open Archival Information System (OAIS) Reference Model, an ISO standard.
• Recommendation: Require data product formats to be derivations from a core set, and support transformation from the core set.
  • Implementation: Four fundamental data structures have been defined.
  • Implementation: Additional data structures are subclasses of the four fundamental structures.
  • Implementation: Software written for the fundamental structures is inherited by the subclasses.
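The point that software written for a fundamental structure is inherited by its subclasses can be illustrated with a minimal Python sketch. The class names `ArrayBase` and `Array2DImage` are hypothetical stand-ins for the model's Array_Base and a two-dimensional image subclass; the `axes` field and `element_count` method are assumptions made for illustration and are not taken from the PDS4 model.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ArrayBase:
    """Hypothetical stand-in for Array_Base: a homogeneous N-dimensional array."""
    axes: Tuple[int, ...]  # number of elements along each axis (assumed field)

    def element_count(self) -> int:
        # Generic software: written once, works for any number of axes.
        count = 1
        for n in self.axes:
            count *= n
        return count


@dataclass
class Array2DImage(ArrayBase):
    """Hypothetical subclass: restricts the base structure to exactly two axes."""

    def __post_init__(self) -> None:
        if len(self.axes) != 2:
            raise ValueError("a 2D image needs exactly two axes")


image = Array2DImage(axes=(1024, 512))
print(image.element_count())  # method inherited unchanged from ArrayBase
```

The subclass adds only a constraint; everything written against the base class keeps working for it.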
DRIVERS FOR PDS4 Build 2a
RECOMMENDATION TO MC (2009) | IMPLEMENTATION
• Recommendation: Replace the “homegrown” PDS data dictionary structure with an international standard.
  • Implementation: The PDS4 Data Dictionary structure is based on the ISO/IEC 11179 specification.
• Recommendation: Adopt a modern data language/grammar (XML) where possible for all tool implementations.
  • Implementation: The PDS4 Information Model is implemented in XML.
DRIVERS FOR PDS4 Build 2a
REQUIREMENT | IMPLEMENTATION
• Requirement 1.3.X: Provide Data Dictionary.
  • Implementation: The PDS4 data dictionary database was developed and is compliant with the ISO/IEC 11179 specification.
  • Implementation: It is used to produce both data dictionary documents and data dictionary products for the registry and data dictionary service.
• Requirement 1.4.1: PDS will define a standard for organizing, formatting, and documenting planetary science data.
  • Implementation: The PDS4 Information Model defines the archive organization, data formats, and product labeling standards.
  • Implementation: The PDS4 Standards Reference documents additional requirements.
• Requirement 1.4.2: PDS will maintain a dictionary of terms, values, and relationships for standardized description of planetary science data.
  • Implementation: The PDS4 Data Dictionary defines the attributes, classes, and relationships for describing planetary science data.
• Requirement 1.4.3: PDS will define a standard grammar for describing planetary science data.
  • Implementation: XML and XML Schema 1.1 have been adopted for the PDS4 implementation.
DRIVERS FOR PDS4 Build 2a
REQUIREMENT | IMPLEMENTATION
• Requirement 1.4.4: PDS will establish minimum content requirements for a data set (primary and ancillary data).
  • Implementation: The PDS4 Information Model defines observational and ancillary product types. These products are collected into PDS4 Collections and Archive Bundles.
• Requirement 1.4.5: PDS will, for each mission or other major data provider, produce a list of the minimum components required for archival data.
  • Implementation: The PDS4 Information Model defines the archive bundle and its product collections. The archive bundle and its collections are customized for each mission.
• Requirement 3.1.2: PDS will develop and maintain online interfaces for discipline-specific searching.
  • Implementation: The PDS4 Information Model and Data Dictionary define the information that is needed for search.
• Requirement 2.3.1: PDS will develop and publish procedures for determining syntactic and semantic compliance with its standards.
  • Implementation: The adoption of XML and XML Schema 1.1 provides syntactic and semantic standards.
  • Implementation: They provide utilities and tools for validation.
Build 2a Scope
• Begin supporting PDS4 label design for LADEE and MAVEN; begin planning/testing migration
• Support the Policy on Acceptable PDS4 Data Formats
• Support transition of the central catalog to the registry infrastructure
• Deploy early PDS4 software tools and services
Build 2a Deliverables
Document/Artifact Processes
1 Introduction Data Provider
2 Concepts Document Standards Development
3 Glossary
4 Jumpstart Guide
5 Data Provider’s Handbook
6 Standards Reference
7 Data Dictionary
8 Example Products
10 Generic Schemas
11 Information Model
PDS4 Documents in Context
[Diagram: PDS4 documents and artifacts and how they relate. Nodes: Introduction to PDS4 Documentation; Jumpstart; Data Dictionary Tutorial; Concepts Document (Big Picture); Standards Reference (Requirements, User Friendly); Data Provider’s Handbook (Cookbook); PDS4 Information Model Specification (Requirements, Engineering Specification); Data Dictionary (Definitions); XML Schemas (Blueprints); PDS4 Product Labels (Deliverables); Registry Configuration File (Object Descriptions); Registry (Product Tracking and Cataloging). Edge labels: derive, generates, references, creates/validates, instruct, configures. Other labels: Informative. Legend: Complete; Some TBD.]
Data Format Deliverables vis-à-vis Policy
Policy | Deliverable
PDS shall accept the following PDS4 data formats:
• Policy: Fixed-width binary and ASCII tables that are composed of identically structured records
  • Deliverable: Table_Base - The Table_Base class defines a heterogeneous repeating record of scalars.
  • Deliverable: Table_Character and Table_Binary are defined as types of Table_Base.
• Policy: N-dimensional arrays of homogeneous binary elements (N<=16)
  • Deliverable: Array_Base - The Array_Base class defines a homogeneous N-dimensional array of scalars.
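To make "identically structured records" concrete, here is a minimal Python sketch of reading a fixed-width ASCII table. It is not PDS4 software: the `Field` tuple, the 0-based offsets, and the sample data are all assumptions chosen for illustration.

```python
from typing import Dict, List, NamedTuple


class Field(NamedTuple):
    """Hypothetical field description for one column of a fixed-width record."""
    name: str
    start: int   # 0-based character offset within the record (assumed convention)
    length: int  # field width in characters


def read_fixed_width(text: str, record_length: int,
                     fields: List[Field]) -> List[Dict[str, str]]:
    """Split text into equal-length records and slice each field out by position."""
    if len(text) % record_length != 0:
        raise ValueError("data is not a whole number of records")
    rows = []
    for offset in range(0, len(text), record_length):
        record = text[offset:offset + record_length]
        rows.append({f.name: record[f.start:f.start + f.length].strip()
                     for f in fields})
    return rows


# Made-up sample: two 12-character records (newline included in the record length).
fields = [Field("id", 0, 4), Field("value", 4, 7)]
data = "   1   3.50\n   2  12.25\n"
rows = read_fixed_width(data, record_length=12, fields=fields)
print(rows)
```

Because every record has the same layout, one field list describes the whole table, which is what lets generic table software serve every subclass.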
Data Format Deliverables vis-à-vis Policy
Policy | Deliverable
• Policy: Variable-width character 'spreadsheets' that are composed of repeating, M-field, stream-delimited records where the fields themselves are (separately) delimited and may have variable widths (M>0)
  • Deliverable: Delimited_Table - The Delimited_Table class defines a simple table (spreadsheet) with delimited fields and records.
  • Deliverable: It is defined as a type of Parsable_Byte_Stream.
• Policy: NAIF/SPICE files
  • Deliverable: The SPICE_Kernel_Binary and SPICE_Kernel_Text classes describe SPICE files.
• Policy: PDS shall accept ASCII text and PDF/A formats for PDS4 documentation. PDS shall accept JPEG, GIF, and TIFF images for figures accompanying documents. PDS shall accept any of the approved structures and formats for browse products.
  • Deliverable: Product_Document - A Product_Document is a product consisting of a single logical document composed of one or more document formats.
  • Deliverable: ASCII Text and PDF/A are currently allowed as document formats.
  • Deliverable: JPEG, GIF, TIFF, and PNG are allowed as non-science image formats.
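The Delimited_Table description on this slide (M-field records with delimited, variable-width fields) can be sketched with Python's standard `csv` module. The function name `read_delimited_table`, the comma delimiter, and the sample records are assumptions for illustration; the slide does not specify them.

```python
import csv
import io
from typing import List


def read_delimited_table(text: str, field_count: int,
                         delimiter: str = ",") -> List[List[str]]:
    """Parse delimited records and check that each one has exactly M fields."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    rows = []
    for record_number, row in enumerate(reader, start=1):
        if len(row) != field_count:
            raise ValueError(
                f"record {record_number} has {len(row)} fields, "
                f"expected {field_count}"
            )
        rows.append(row)
    return rows


# Made-up sample: two records of three variable-width fields each.
sample = "1,alpha,0.5\r\n2,beta,12.25\r\n"
table = read_delimited_table(sample, field_count=3)
print(table)
```

Unlike the fixed-width case, field boundaries come from the delimiters, so the only structural check needed is the field count per record.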
PDS4 Observational Product
[Diagram: class structure of a PDS4 Observational Product. Classes shown: Identification_Area, Cross_Reference_Area, Observation_Area, File_Area, Digital_Object, Subject_Area, Bibliographic_Reference, Mission_Area, Node_Area, Observing_System, Reference_Entry, Data_Standards. Associations carry cardinalities of [0..1], [1], [1..*], or [0..*].]
Data Standards Development Process
[Diagram labels: Domain Knowledge; Information Modeling Tool; PDS4 Information Model; Filter and Translator; XML Schema (Generic), shown four times.]
• Domain expertise was captured in the PDS4 Information Model as an ontology.
• The model represents a consensus of the domain experts.
• The model is the single source for the PDS4 Data Standards, for example the generated XML Schemas.
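The "single source" idea, where schemas are generated from the model rather than written by hand, can be illustrated with a small Python sketch. This is not the actual filter and translator: the `model_class` dictionary is a hypothetical, heavily simplified model entry, and its attribute names are invented for illustration.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical, simplified model entry; attribute names are illustrative only.
model_class = {
    "name": "Identification_Area",
    "attributes": [
        {"name": "example_required", "type": "xs:string", "min": 1, "max": 1},
        {"name": "example_repeating", "type": "xs:string", "min": 0, "max": "unbounded"},
    ],
}


def class_to_complex_type(cls: dict) -> ET.Element:
    """Translate one model class into an xs:complexType element."""
    ET.register_namespace("xs", XS)
    complex_type = ET.Element(f"{{{XS}}}complexType", {"name": cls["name"]})
    sequence = ET.SubElement(complex_type, f"{{{XS}}}sequence")
    for attr in cls["attributes"]:
        ET.SubElement(sequence, f"{{{XS}}}element", {
            "name": attr["name"],
            "type": attr["type"],
            "minOccurs": str(attr["min"]),
            "maxOccurs": str(attr["max"]),
        })
    return complex_type


schema_fragment = class_to_complex_type(model_class)
print(ET.tostring(schema_fragment, encoding="unicode"))
```

Because the schema fragment is derived mechanically, a change to the model entry propagates to every generated artifact, which is the consistency property the slide describes.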
Build 2b Deployment
• Resolve build 2a liens (to be discussed) and generate a build 2b deployment
• Generate a release of the information model, companion documents and supporting tutorial material
• Generate new schemas
• Generate registry configuration information
• Post key documents to PDS website
Build 2a Identified Liens
Lien | Brief Explanation
• Lien: Need to finalize and freeze the information model for Build 2b, incorporating high priority changes identified in Build 2a.
  • Explanation: Address issues found with the information model, focusing primarily on the core components of the product labels and the aggregate products, collections and bundles.
• Lien: Need capabilities to support local data dictionary validation and the creation of schema and human-readable definition lists.
  • Explanation: There is a lack of instructions for creating, validating, and using local keywords and classes (this includes lack of support for generating human-readable definition lists for peer review).
Build 2a Identified Liens
Lien | Brief Explanation
• Lien: Need to baseline the current documentation; need to provide additional information/changes.
  • Explanation: Documents are still overlapping, not up to date, inconsistent in areas, and have gaps.
• Lien: Need to finalize and freeze the XML Schema for Build 2b, incorporating the extension schemas currently under testing by the DDWG.
  • Explanation: Newer “extension” style schemas are not yet mature enough to be used by an external data provider. They seem to be preferred over the older but stable “flat” schemas that were available for the node exercises. Both are currently produced and produce similar labels.
Build 2b Actions – Jan ‘12
• Finalize and freeze the information model for Build 2b, incorporating high priority changes identified in Build 2a.
• Use existing capabilities to support local data dictionary validation and the creation of schema and human-readable definition lists.
• Baseline the current documentation.
  • Add any additional information/changes to an online resource (e.g., wiki).
• Finalize and freeze the XML Schema for Build 2b, incorporating the extension schemas currently under testing by the DDWG.
Conclusion
• The PDS4 Information Model represents the DDWG consensus.
  • A large number of decisions resulting from much discussion were captured in the model.
  • All had a say, not everyone always got their way.
• On the scheduled date the model will be frozen and the PDS4 Data Standards will be generated and deployed.
  • The schemas, the dictionary, and all other generated artifacts will be consistent with the model.
  • The current consensus, as reflected in the model, will be operational.
Acknowledgements*
Ed Bell, Richard Chen, Dan Crichton, Amy Culver, Patty Garcia, Ed Grayzeck, Ed Guinness, Mitch Gordon, Sean Hardman, Lyle Huber, Steve Hughes, Chris Isbell, Steve Joy, Ronald Joyner, Debra Kazden, Todd King, Joe Mafi, Mike Martin, Thomas Morgan, Lynn Neakrase, Paul Ramirez, Anne Raugh, Mark Rose, Elizabeth Rye, Boris Semenov, Dick Simpson, Susie Slavney
Peter Allan, David Heather, Michel Gangloff, Santa Martinez, Thomas Roatsch, Alain Sarkissian
* Anyone who sat through a DDWG 2-hour telecon or provided useful input.
Too Many {objects, classes, schemas, …}
• Abstract (vacuous) classes are used for organizational purposes.
  • These are not included in the schemas and many are being deleted.
• Subclasses of the four fundamental structures are used to partition the set of allowed structures, for example the Array_2D_Image subclass of Array_Base.
  • Open question: does the PDS want to provide software specific to Array_2D_Image?
  • All Array_Base software works for any Array_2D_Image.
Too Many {objects, classes, schemas, …}
• Subclasses of a product component are used to provide specificity, for example, the subclass Bundle_Member_Entry.
  • There are three methods: change the name, change the namespace (new file), or use optional attributes.
• Some specific subclasses are used for special purposes, for example Table_Field_Checksum in an Inventory.
  • Consider using Schematron Assert statements to validate.
Too Many {objects, classes, schemas, …}
• Some classes result from the process of normalization, for example array_axis and array_element.
  • Emperor Joseph II: …And there are simply too many notes, that's all. Just cut a few and it will be perfect.
    Mozart: Which few did you have in mind, Majesty?
By the numbers
• Fundamental Data Structures – 4
• Lines of Schema Code
  • Flat 18K
  • Master 4k-6k
• Classes dropped (Master) – nn
• SimpleTypes dropped (Master) – 200
• Actionable items closed – 1.5K
• Actionable items open – < 50
• Issues from reviews – 1k+
Totals
Category        Internal  IPDA  External  Readiness  Total
Narrative             11     4        18         15     48
Documentation        143   152       250         87    632
Actionable             1    15        16         31     63
Discussion            13    76        42         43    174
Research               8     5        33         44     90
Kudo                  34    24        29          1     88
System/Tools           4     6         3         22     35
Discipline             1     4        14         10     29
Process                0    13         0          1     14
Total                215   299       405        254   1173
Post Build 2b – Summer ‘12
• Develop discipline level classes for the next phase of data set migration
• Refine the document suite and its organization.
• Support development of tools scheduled for the next build.
• Support development of data dictionary and local data dictionary services.