- A View from the Field -
The Next Generation Data Standards For the PDS
- PDS4 -
ESIP Federation Meeting
July 8, 2009
J. Steven Hughes, JPL
Copyright 2009 California Institute of Technology
Government sponsorship acknowledged
PDS Mission and Vision Statement
• Mission: The mission of the Planetary Data System is to facilitate achievement of NASA’s planetary science goals by efficiently collecting, archiving, and making accessible digital data produced by or relevant to NASA’s planetary missions, research programs, and data analysis programs
• Vision:
  • To gather and preserve the data obtained from exploration of the Solar System by the U.S. and other nations
  • To facilitate new and exciting discoveries by providing access to and ensuring usability of those data to the worldwide community
  • To inspire the public through availability and distribution of the body of knowledge reflected in the PDS data collection
• PDS is a federation of heterogeneous nodes including science and support nodes
About the Planetary Data System
• NASA’s official archive for planetary science data
• Missions are required to archive data with PDS
• Data is peer reviewed as part of the archiving process
• PDS works to support planetary science R&A
• Federation of nodes
  • Science discipline nodes provide scientific and data management expertise
  • Central engineering (JPL) addresses PDS-wide software and standards
  • Management node (GSFC) provides management support to PDS
  • Governed by the PDS Management Council, which is formed from the node managers
• Standard archiving processes and tools
  • PDS supports a diversity of data types but promotes a homogeneous architecture for archives
PDS Engineering Challenges
• All data online and distributed across the PDS enterprise
  • Single point of access to all data through an integrated infrastructure
• Substantial increases in data volumes
• International coordination and interoperability
  • Shared standards
  • Sharing of science products
  • Satisfying US data sharing regulations
• Supporting diverse user needs
• Replacing aging technology, tools and processes
Archive Volume Growth
[Figure: cumulative PDS archive volume (TB) by year, 1990 to 2008]
Why is PDS4 Necessary?
• The PDS data standards were developed in the late 1980s to define the concepts and terms needed for archiving science data in the planetary science domain.
• Even though the data standards were innovative for their time, ambiguity and many assumptions have crept in over almost two decades of use.
• This situation has caused significant problems for PDS operations, data providers, and end-users.
• In 2006 the International Planetary Data Alliance (IPDA) was formed.
• The IPDA charter stated that the PDS data standards, as the de facto planetary standards, should be reviewed and their core elements adopted.
• The PDS4 task was initiated to formalize the PDS data standards and address the issues.
PDS4 Key Goals
• Enable a stable and usable long-term archive.
• Enable more efficient archive preparation for data providers.
• Enable services for the data consumer to find the specific data they need and provide the formats they require.
PDS4 Design Principles
• The data model:
  • is defined in a formal language
  • is independent of implementation
  • defines a few fundamental data structures that do not evolve over time
  • is extensible, enabling it to handle more complex data formats
• The archive data formats are being designed independently of data provider and data consumer formats.
• The data architecture shall include a standard data dictionary model.
PDS 2010 Architecture
PDS4 Data Product Model Components
[Figure: component diagram. Labels: Registry Object & Web Resource; Product; Description Combinations; Data Object Description; Structure; Data; Classification]
Data Product Concept Map
[Figure: concept map showing the Web & Registry View, Basic I/O View, Programmer View, and User/Designers View]
The Model Design Process
• Master model constrains data design and defines validation.
• Master model and data dictionary are tightly coupled.
• Document writer translates modeling information into target languages based on grammar and munging rules.
• Updates to master model are reflected quickly in the documents.
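The model-driven process above can be illustrated with a small sketch, assuming a toy master model in Python rather than the actual PDS modeling tool or document writer. The names `Attribute`, `ModelClass`, `write_xml_schema`, and `write_pvl_template` are hypothetical; the point is only that a single class definition is rendered into two target languages, so an update to the model reaches both outputs on the next run.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    """One attribute of a model class (hypothetical structure)."""
    name: str
    type_name: str  # XML Schema type used when writing the schema


@dataclass
class ModelClass:
    """A class in a toy 'master model' (hypothetical, not the PDS4 tool)."""
    name: str
    attributes: list[Attribute] = field(default_factory=list)


def write_xml_schema(cls: ModelClass) -> str:
    """Render the class as an XML Schema complexType with a sequence of elements."""
    lines = [f'<xs:complexType name="{cls.name}_Type">', "  <xs:sequence>"]
    for attr in cls.attributes:
        lines.append(f'    <xs:element name="{attr.name}" type="{attr.type_name}"/>')
    lines += ["  </xs:sequence>", "</xs:complexType>"]
    return "\n".join(lines)


def write_pvl_template(cls: ModelClass) -> str:
    """Render the same class as a PVL-style label template with ${...} placeholders."""
    lines = [f"Object = {cls.name};"]
    for attr in cls.attributes:
        lines.append(f"  {attr.name} = ${{{attr.name}}};")
    lines.append(f"End_Object = {cls.name};")
    return "\n".join(lines)


# One model definition drives both generated documents.
image = ModelClass("Image_Grayscale", [
    Attribute("local_identifier", "xs:string"),
    Attribute("axes_order", "axes_order_Type"),
])
print(write_xml_schema(image))
print(write_pvl_template(image))
```

Regenerating both outputs from one definition is what keeps the schema and the label template from drifting apart; a real document writer would also apply the grammar and munging rules described above.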
PDS4 Data Standards Deliverables
• PDS4 Information Model
  • The Information Model defines PDS object classes. These include data structures, formats, and products, as well as data sets, documents, and missions.
• PDS4 Data Dictionary Model
  • The Data Dictionary Model provides the schema for the PDS data dictionary. The data dictionary documents the data elements used in the PDS4 Information Model.
• PDS Standards Reference V4.0
  • The PDS Standards Reference V4.0 will be written in the format and tone of a standards reference document.
• Grammar Options
  • The grammar is used to capture PDS archive metadata for product labels.
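The relationship between the data dictionary and product labels can be sketched as a simple validation pass. This is a minimal illustration under stated assumptions: the `DataElement` fields, the element names, and the permissible values `ORDER_A`/`ORDER_B` are hypothetical and are not taken from the PDS4 Data Dictionary Model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataElement:
    """One data dictionary entry; the fields are hypothetical, for illustration."""
    name: str
    value_type: type
    permissible_values: Optional[frozenset] = None


# A toy dictionary; element names and permissible values are assumptions.
DICTIONARY = {
    e.name: e
    for e in (
        DataElement("local_identifier", str),
        DataElement("axes", int),
        DataElement("axes_order", str, frozenset({"ORDER_A", "ORDER_B"})),
    )
}


def validate_label(label: dict) -> list[str]:
    """Check each label keyword against the dictionary; return error messages."""
    errors = []
    for key, value in label.items():
        element = DICTIONARY.get(key)
        if element is None:
            errors.append(f"{key}: not defined in data dictionary")
        elif not isinstance(value, element.value_type):
            errors.append(f"{key}: expected {element.value_type.__name__}")
        elif element.permissible_values is not None and value not in element.permissible_values:
            errors.append(f"{key}: {value!r} is not a permissible value")
    return errors


good = {"local_identifier": "img001", "axes": 2, "axes_order": "ORDER_A"}
bad = {"local_identifier": "img001", "axes": "two", "exposure": 1.0}
print(validate_label(good))  # []
print(validate_label(bad))
```

Keeping the checks data-driven means a new or changed dictionary entry changes validation without touching the validator code.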
Benefits of the PDS4 Data Model
• The data model is managed in a data modeling tool.
  • The model is formally defined.
  • The model can be validated and tested.
• Defines a few simple fundamental data structures.
  • Fundamental data structures may be extended and combined to form more complex data formats.
• The overall architecture is model driven.
  • Disentangles the model from its implementation.
  • Model can evolve over time as the research domain changes.
  • Drives the generation of documentation, label schema, and other model-dependent artifacts.
• The data dictionary uses a standard data dictionary model.
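The idea of a few fundamental structures that are extended rather than redefined can be sketched with class inheritance. In this hypothetical Python sketch, `ArrayBase` stands in for Array_Base, `Array2D` adds a rank constraint, and `ImageGrayscale` adds a lines/samples interpretation; the storage order (last index varies fastest) and the lines/samples naming are assumptions, not PDS4 rules.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class ArrayBase:
    """Homogeneous N-dimensional array of scalars (a sketch of Array_Base)."""
    axis_sizes: tuple[int, ...]
    element_bytes: int  # bytes per scalar, e.g. 1 for 8-bit pixels

    def __post_init__(self):
        # Validation hook; subclasses extend it with their own constraints.
        if not self.axis_sizes or any(n <= 0 for n in self.axis_sizes):
            raise ValueError("every axis needs a positive size")

    def element_count(self) -> int:
        return prod(self.axis_sizes)

    def byte_offset(self, index: tuple[int, ...]) -> int:
        """Byte offset of one element, ASSUMING the last index varies fastest."""
        if len(index) != len(self.axis_sizes):
            raise ValueError("index rank does not match array rank")
        offset = 0
        for i, size in zip(index, self.axis_sizes):
            if not 0 <= i < size:
                raise IndexError(f"index {i} out of range for axis of size {size}")
            offset = offset * size + i
        return offset * self.element_bytes


class Array2D(ArrayBase):
    """Extension: restricts Array_Base to exactly two axes."""
    def __post_init__(self):
        super().__post_init__()
        if len(self.axis_sizes) != 2:
            raise ValueError("Array_2D requires exactly two axes")


class ImageGrayscale(Array2D):
    """Further extension: a 2-D array read as lines x samples of gray levels."""
    @property
    def lines(self) -> int:
        return self.axis_sizes[0]

    @property
    def samples(self) -> int:
        return self.axis_sizes[1]


img = ImageGrayscale((512, 1024), element_bytes=1)
print(img.lines, img.samples, img.element_count())  # 512 1024 524288
print(img.byte_offset((2, 3)))                      # 2051
```

Each extension only adds constraints or interpretation; the offset logic lives once in the base structure, which is what lets the fundamental structures stay fixed while new formats are layered on top.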
Backup
PDS4 Product Label Creation
[Figure: label-creation workflow diagram. Elements: Model; Generic Label Schema; Specific Label Schema; Mission Specific Data Dictionary; Product Labels; Modeling Tool A; Modeling Tool B; Data Engineer; DDWG. Steps: Design, Populate, Extract, Ingest/Update, Ingest Label, Load, Edit, Validate, Generate, Ingest. Legend: Current using editor; Current using Oxygen / Future Design Tool; Proposed.]
Topics
• The View from the Field
  • What are other disciplines and agencies doing for preservation/stewardship?
  • How do they deal with databases, collections of files, physical objects, and ad hoc services such as workflows?
  • How do they deal with provenance?
  • Are there any lessons to be learned and incorporated into earth science practice?
PDS4 Data Design Accomplishments - General
• Project and Project Members Defined
• Principles and Drivers Updated
• General Data Model (Draft)
• Product Data Model (Draft)
• Data Dictionary Model (Draft)
• Grammar Options
• PDS Standards Reference (Outline)
• Data Dictionary Model (Final)
• Grammar Decision
• PDS Standards Reference (Draft)
• PDS and Community-Wide Review
• General Data Model (Final)
• Product Data Model (Final)
• PDS Standards Reference (Final)
[Status legend: Done; Substantial Progress; Next 3 Months; Due 9/31/10]
Proposed IPDA Data Standards Project
• Identify the core elements of the PDS4 data standards
• Develop a process for maintaining alignment between the IPDA and the PDS
[Figure label: Unique Namespace]
Charter for the IPDA Data Architecture Standards
“The data standards within the IPDA, including the data models and derived dictionaries, are based on the NASA Planetary Data System (PDS) standard that is the de-facto standard for all planetary data at the time of the IPDA founding”. Charter of the International Planetary Data Alliance, 3rd Draft, May 24, 2007
PDS4 Data Design Accomplishments - Details
• Four basic data structures
  • Homogeneous N-dimensional array of scalars – Array_Base
  • Heterogeneous repeating record structure of scalars – Table_Base
  • Unencoded byte stream
  • Encoded byte stream
• Identifiable
  • Digital Product
    • Data Product
      • Product_Image_Grayscale
      • …Image_3D
      • …Movie
      • …Table_Character
      • …Table_Binary
      • …Table_Binary_Grouped
    • Document Set
    • Software_Set
  • Non-Digital Product
    • Mission
    • Instrument
    • Resource
• Data Type
  • Binary Data Type
    • Decimal Integer
    • …
  • Character Literals
    • Character Integer
    • …
• Array_Base
  • Array_2D
    • Image_Grayscale
    • Spectrum_2D
  • Array_3D
    • Image_3D
    • Movie
• Table_Base
  • Table_Character
  • Table_Binary
  • Table_Binary_Grouped
• Draft PDS Data Dictionary Model
• Grammar – ODL+, PVL, and XML Labels
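The Table_Base structure, a heterogeneous repeating record structure of scalars, can be read with fixed-length record decoding. The sketch below uses Python's standard `struct` module; the field layout (`sclk`, `temperature`, `target`), the big-endian byte order, and the `TableBinary`/`Field` names are assumptions for illustration only.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    fmt: str  # struct format code: "i" int32, "d" float64, "8s" 8-byte string


@dataclass(frozen=True)
class TableBinary:
    """Heterogeneous repeating record structure of scalars (a Table_Binary sketch)."""
    fields: tuple[Field, ...]
    byte_order: str = ">"  # ASSUMED big-endian, standard sizes, no padding

    @property
    def record_format(self) -> str:
        return self.byte_order + "".join(f.fmt for f in self.fields)

    @property
    def record_bytes(self) -> int:
        return struct.calcsize(self.record_format)

    def read_records(self, data: bytes) -> list[dict]:
        """Split a byte stream into fixed-length records and decode each field."""
        if len(data) % self.record_bytes:
            raise ValueError("data length is not a whole number of records")
        names = [f.name for f in self.fields]
        return [dict(zip(names, values))
                for values in struct.iter_unpack(self.record_format, data)]


# Hypothetical record layout: spacecraft clock, temperature, target name.
table = TableBinary((Field("sclk", "i"), Field("temperature", "d"), Field("target", "8s")))
data = (struct.pack(table.record_format, 100, 21.5, b"MARS")
        + struct.pack(table.record_format, 101, 22.0, b"PHOBOS"))
rows = table.read_records(data)
print(table.record_bytes)  # 20
print(rows[1])
```

Because every record has the same layout, the table is described once and the byte stream is simply a repetition of that description.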
Example Results
Image_Grayscale
[Figure panels: Concept Map; UML Class Diagram; XML Schema; PVL Label Template; Class Definition Table]
<!-- PDS4 XML/Schema for Product_Image_Grayscale -->
<xs:complexType name="Image_Grayscale_Type">
  <xs:sequence>
    <xs:element name="axes_order" type="axes_order_Type"/>
    <!-- … (remaining elements not shown) -->
  </xs:sequence>
</xs:complexType>
/* ******* Label Template - Product_Image_Grayscale ******* */
Object = Product_Image_Grayscale;
  Object = Image_Grayscale;
    local_identifier = ${local_identifier};
    /* … (remaining statements not shown) */
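The `${local_identifier}` placeholder in the label template uses the same syntax as Python's `string.Template`, so the populate step can be sketched with the standard library. This is an assumption about how population might work, not the PDS tooling; the `End_Object` closing statements and the value `image_0001` are hypothetical.

```python
from string import Template

# Template text modeled on the label template fragment; the End_Object lines
# are an assumption about how the full template closes each object.
LABEL_TEMPLATE = Template(
    "Object = Product_Image_Grayscale;\n"
    "  Object = Image_Grayscale;\n"
    "    local_identifier = ${local_identifier};\n"
    "  End_Object = Image_Grayscale;\n"
    "End_Object = Product_Image_Grayscale;\n"
)


def populate_label(values: dict) -> str:
    """Fill every ${...} placeholder; substitute() raises KeyError if one is missing."""
    return LABEL_TEMPLATE.substitute(values)


label = populate_label({"local_identifier": "image_0001"})
print(label)
```

Using `substitute()` rather than `safe_substitute()` makes a missing value an error instead of leaving an unfilled `${...}` in the product label.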
Query Model
• A formal model consisting of classes, attributes, and relations that are appropriate for use as search constraints
• Types of query models include:
  • Data Set
  • Product
    • Data Product
    • Document Product
    • Software Product
  • Any PDS4 Identifiable
• The query models are subsets of the archive model
• Can be augmented with external metadata (e.g., LDD)
• Example query constraints:
  • general parameters: time, target, mission, instrument host, instrument
  • any metadata defined in the archive model
  • any association between two classes, e.g., documents associated with data sets
  • class type and hierarchy
  • geometry: (lat, lon), (az, el), (ra, dec), (x, y, z)
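A query model used as a set of search constraints can be sketched as a filter over product metadata. In this hypothetical example the `ProductSummary` fields, the mission name `MISSION_A`, and the overlap rule for time constraints are assumptions; geometry and association constraints are left out to keep it short.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ProductSummary:
    """Hypothetical subset of archive-model attributes exposed for searching."""
    identifier: str
    product_type: str  # e.g. "Data Product", "Document Product"
    mission: str
    target: str
    start_time: datetime
    stop_time: datetime


@dataclass(frozen=True)
class QueryConstraints:
    """Optional constraints; a None field places no restriction."""
    product_type: Optional[str] = None
    mission: Optional[str] = None
    target: Optional[str] = None
    time_from: Optional[datetime] = None
    time_to: Optional[datetime] = None


def matches(p: ProductSummary, q: QueryConstraints) -> bool:
    for name in ("product_type", "mission", "target"):
        wanted = getattr(q, name)
        if wanted is not None and getattr(p, name) != wanted:
            return False
    # Time constraint: keep products whose interval overlaps the requested window.
    if q.time_from is not None and p.stop_time < q.time_from:
        return False
    if q.time_to is not None and p.start_time > q.time_to:
        return False
    return True


def search(products: list[ProductSummary], q: QueryConstraints) -> list[str]:
    return [p.identifier for p in products if matches(p, q)]


products = [
    ProductSummary("img-001", "Data Product", "MISSION_A", "MARS",
                   datetime(2008, 1, 1), datetime(2008, 1, 2)),
    ProductSummary("doc-002", "Document Product", "MISSION_A", "PHOBOS",
                   datetime(2008, 6, 1), datetime(2008, 6, 3)),
]
print(search(products, QueryConstraints(target="MARS")))                  # ['img-001']
print(search(products, QueryConstraints(time_from=datetime(2008, 5, 1))))  # ['doc-002']
```

Because the query model is a subset of the archive model, the constraint fields can be drawn directly from archive-model attributes rather than defined separately.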