page 1 object oriented data technology for space science data archiving and retrieval: potential...
TRANSCRIPT
Page 1
Page 1
Object Oriented Data Technology for Space Science Data Archiving and Retrieval: Potential Applications in Biomedical Research
March 21, 2001
Dan CrichtonSteve Hughes
Jet Propulsion LabsCalifornia Institute of Technology
National Aeronautics and Space Administration
Page 2
Page 2
JPL is a federally funded research and development center (FFRDC) run by Caltech for NASA
JPL is NASA’s lead center for robotic exploration of the universe
JPL has an enormous amount of data that it needs to manage from scientific data, to engineering, to institutional
We represent several efforts in both the research and enterprise side of JPL that is addressing enterprise architectures for integrating data at both JPL and NASA. Such efforts include
Planetary Data System - NASA Planetary Archive
Enterprise Data Architecture and Applications
JPL IT Data Infrastructure
Knowledge Management - JPL/NASA KM Initiative
Object Oriented Data Technology - I.T. Data System Research
Who are we?
Page 3
Page 3
Challenges to Data ManagementArchiving, Search, Retrieval and Integration
Space scientists cannot easily locate or use data across the hundreds if not thousands of autonomous, heterogeneous, and distributed data systems currently in the Space Science community.
Heterogeneous Systems Data Management - RDBMS, ODBMS, HomeGrownDBMS, BinaryFiles Platforms - UNIX, LINUX, WIN3.x/9x/NT, Mac, VMS, … Interfaces - Web, Windows, Command Line Data Formats - HDF, CDF, NetCDF, PDS, FITS, VICR, ASCII, ... Data Volume - KiloBytes to TeraBytes
Heterogeneous Disciplines Moving targets and stationary targets Multiple coordinate systems Multiple data object types (images, cubes, time series, spectrum, tables,
binary, document) Multiple interpretations of single object types Multiple software solutions to same problem Incompatible and/or missing metadata
Page 4
Page 4
Evolution of Data Systems(Trying to make order out of entropy)
Data System Maturity
Basic Data Infrastructure - Data Acquisition - Databases - Data Analysis Tools - Homogeneous Computing
Data Archiving
- Catalog Systems - Data sets - Data Products - Metadata
Data Location
- Metadata - Distributed Data sets - Distributed Services
Data Product Exchange/Interoperability - Heterogeneous Servers - Data Interchange - Data Sharing - Distributed Architectures
Centralized Data Distributed Data
Page 5
Page 5
Independent Access to PDS Archives
MARIE EDR,PDS PPI
GRS RDRPDS GEO
THEMIS EDRASU
DOCUMENTS& ANCILLARYFILES
GRS EDRUofA
MGSMOLA
MGSTES
MGSMOC
Viking MDIM
VikingEDR
Mars Odyssey
Mars GlobalSurveyor Viking
AncillaryFiles
MOC Interface
TES Interface
MOLAInterface
Viking Interface
Odyssey Interfaces
Page 6
Page 6
PDS Nodes and Institutions (Silos)
NAIF/JPL
Small Bodies/UMD
Atmospheres/New Mexico State
Geosciences/Washington University
Planetary Plasma/UCLA
Rings/Ames
Radio Science/Stanford
Central Node/JPL
Imaging/USGS
Imaging/JPL
Page 7
Page 7
Managing Software Interfaces(# of Systems vs # of Interfaces)
N Interfaces
0
5000
10000
15000
20000
25000
30000
35000
40000
45000
50000
10 20 50 100 200 300
N Interfaces
# of interfaces = (n2 - n) / 2 where the number of systems is n
# of systems
# of interfaces
Page 8
Page 8
DIS Approach (circa 1998)
• PDS Distributed Inventory System (DIS)
• Identified all resources available across the PDS• Used Object Description Language (keyword/value)
labels to describe data and non-data resources• Included URL/FTP links to all online resources• Used cgi-script engine to search label database• Displayed results in an HTML page
Page 9
Page 9
Product Label
OBJECT = DATA_SET ; DATA_SET_NAME = VO1/VO2 MARS VISUAL IMAGING SUBSYSTEM DIGITAL IMAGING MODEL; DATA_SET_ID = VO1/VO2-M-VIS-5-DIM-V1.0; DATA_SET_DESC = http://pds.jpl.nasa.gov/cgi-bin/remote.pl?ds:VO1/VO2%2BMARS%…; DATA_SET_TERSE_DESC = http://pds.jpl.nasa.gov/cgi-bin/remote.pl?ds:VO1/VO2%2BM…; DATA_SET_RELEASE_DATE = 1991; LINK = http://www-pdsimage.wr.usgs.gov/PDS/public/mapmaker/mapmkr.htm; ARCHIVE_STATUS = ARCHIVED; ARCHIVE_STATUS_NOTE = Passed peer review with all liens resolved. Available through PDS and NSSDC; ARCHIVE_SCHEDULE_NOTE = 1999-08-18:ARCHIVED; DATA_OBJECT_TYPE = IMAGE; START_TIME = N/A; STOP_TIME = N/A; NODE_NAME = IMAGING; CURATING_NODE_ID = IMAGING; DATA_ENGINEER_ID = N/A; TARGET_NAME = MARS; TARGET_TYPE = PLANET; MISSION_NAME = VIKING; INSTRUMENT_HOST_NAME = VIKING ORBITER 1, VIKING ORBITER 2; INSTRUMENT_HOST_TYPE = SPACECRAFT; INSTRUMENT_NAME = VISUAL IMAGING SUBSYSTEM - CAMERA A, VISUAL IMAGING SUBSYSTEM - CAMERA B; INSTRUMENT_TYPE = VIDICON CAMERA; NSSDC_DATA_SET_ID = 75-075A-01F, 75-083A-01C; VOLUME_ID = VO_2001, VO_2002, …; VOLUME_LINK = ftp://pdsimage.wr.usgs.gov/cdroms/vo_2001/, ftp://pdsimage.wr.usgs.gov/cdroms/vo_2002/, … KEYVALUES = CAMERA, DIGITAL, DIM, IMAGING, MARS, MODEL, ORBITER, PLANET, SPACECRAFT, SUBSYSTEM,
TABLE, VIDICON, VIKING, VIS, VISUAL, VO1, VO2, VO_2001, VO_2002, …; LABEL_REVISION_NOTE = 2000-09-19 C:CN S:CN LN:AY:DN;END_OBJECT;
Page 10
Page 10
DIS Links to Mars Archives
MARIE EDR,PDS PPI
GRS RDRPDS GEO
THEMIS EDRASU
DOCUMENTS& ANCILLARYFILES
GRS EDRUofA
MGSMOLA
MGSTES
MGSMOC
Viking MDIM
VikingEDR
Mars Odyssey
Mars GlobalSurveyor
Viking AncillaryFiles
DIS
DataMiningData
Catalogs
Distributed Archives
Page 11
Page 11
Lessons Learned
• Identified metadata as key to the solution• Identifies and describes data and resources• Provides search attributes• Helps determine whether query can be resolved by
resource
• However• ODL is not a universal interchange language• Resource heterogeneity is still visible• Navigation and access is limited to http links• System interoperability is not supported
Page 12
Page 12
OODT, The Next Generation XML/JAVA/CORBA
• Started in 1998 as a research and development task funded at JPL by the Office of Space Science to address
• Application of Information Technology to Space Science• Research methods for interoperability, knowledge
management and knowledge discovery• Develop software frameworks for data management to reuse
software, manage risk, reduce cost and leverage IT experience
• OODT Initial focus• Data archiving – Manage heterogeneous data products and
resources in a distributed, metadata-driven environment• Data location – Locate data products across multiple
archives, catalogs and data systems• Data retrieval – Retrieve diverse data products from
distributed data sources and integrate
Page 13
Page 13
OODT System Design Goals
Encapsulate individual data systems to hide uniqueness
Provide data system location independence Require that communication between distributed
systems use metadata Define a standard data dictionary structure and
approach for describing systems and resources Provide a scalable and extensible solution Provide a mechanism for data product exchange Allow systems using different data dictionaries and
metadata implementations to be integrated Define an architecture that can leverage off of open
standard approaches
Page 14
Page 14
OODT Technology Focus
Focus on building middleware components Focus on creating metadata “profiles” about data system resources Provide sufficient layers of abstraction in the architecture to isolate
technology choices from architecture choices XML (Extensible Markup Language) for the data content CORBA (Common Object Request Broker Architecture) for the data
transport Research technologies for implementing a distributed data architecture
Distributed Object Computing (CORBA, DCOM, etc) Database Technology (RDBMS, ODBMS) Data Access Technologies (O/JDBC, XML, etc) Directory Implementations (LDAP) Data Interchange (XML) Communication Technologies (Web/HTTP, MOM, RPC, etc)
Page 15
Page 15
Motivation for Middleware
Middleware allows for the encapsulation of individual data systems
Hide uniqueness by introducing the data architecture layer
Allows for data abstraction Provide common client interfaces to heterogeneous systems Manage risk associated with technical decisions. Systems evolve
independent of the clients.
Enable reuse and promote standards
Allow for incompatible systems to be tied together by introducing a middleware layer
Page 16
Page 16
Middleware Data Encapsulation
Middleware can tie application, data, and user interfaces together and hide the unique interfaces
Applications User Interface
Data
Middleware
Page 17
Page 17
OODT Component Framework
Java based software middleware component architecture that provides a software framework for archiving, search and retrieval, and data product exchange
Archive Component Provides centralized data archiving and cataloging of data products Distributed
A Search and Retrieval Component Manage metadata associated with resources Locate resources across geographically distributed data systems Distributed
Data Product Exchange Component Support interchange (data sharing) of data products Support heterogeneous implementations and systems Distributed
Query Service Component Ties search and product exchange services together Distributed
Page 18
Page 18
Component Framework for OODT
OODT/Science Web Tools
ArchiveClient
OBJECT ORIENTED DATA TECHNOLOGY FRAMEWORK
ProfileXMLData
NavigationService
Data System
2
Data System
2
Other Service 1
Other Service 2
QueryService
ProductService
ProfileService
ArchiveService
Bridge to External Services
Page 19
Page 19
Data Archiving Goals
Provide basic functions Transport and management of data sets and products Identification of products using metadata Event driven processing associated with data sets Ability to add, get and delete products from the archive Management of large, complex datasets and products
Provide extensible data management approach Database is dynamically generated and extended based on
metadata content Separate the metadata and the archive. Metadata can be exported
to other components of the OODT architecture.
Provide a n-tier, client/server architecture for archiving at remote locations
Build a service that is accessible via clients using common programming languages (Java, C++, etc)
Page 20
Page 20
Archive Management
Page 21
Page 21
OODT Metadata Development
Develop methods for managing the semantics of data that are shared within and between domains
Terminology Base – Domain specific name space Data Dictionary – Inventory of domain terms with definitions and
other distinguishing attributes. Ontology – A set of concepts, their relationships and constraints, all
within the scope of a domain. XML for metadata registry and communication
Several I.T. efforts have shown the criticality of metadata in enabling data sharing and system interoperability
Use standards where appropriate ISO/IEC 11179 – A framework for the Specification and
Standardization of Data Elements Dublin Core – A metadata element set intended to facilitate discovery
of electronic resources.
Page 22
Page 22
Why XML?
XML doesn’t provide a “silver bullet,” but it does allow us to refocus the problem on metadata Metadata is a key to interoperability (http://www.cio.gov/docs/metadata.htm)
XML is language neutral
Allows the designer to separate the data and the transport (re: CORBA vs XML-over-CORBA) Transport mechanism and data are not tied together
Could be XML/HTTP
Simpler deployments
Simpler interfaces
Allows technologies to grow and change independently
Real value of XML is the process of describing the data
Page 23
Page 23
What is a profile?
A profile is a set of metadata resource definitions implemented in XML for data products residing in one or more distributed systems
Profile servers are CORBA servers that manage XML profile definitions
Profile servers communicate via XML-over-CORBA
Developed Java classes that map XML profiles to a Java object
Profile Distributed Node Architecture
XML/CORBA XML/CORBA
XML/CORBA XML/CORBA
XML/CORBAXML/CORBA
Page 24
Page 24
Solutions to Data Search
Build metadata “profiles” that describe data system resources Encapsulate individual data systems resources (Hide uniqueness)
Communicate using metadata (Provide metadata with data)
Enable interoperability based on metadata compatibility
Refocus problem on metadata development
Provide a core framework of software components to interconnect distributed data systems
Define profiles using standard industry approaches Use XML to describe profiles ISO/IEC 11179 – A framework for the Specification and Standardization
of Data Elements Dublin Core – A metadata element set intended to facilitate discovery of
electronic resources.
Page 25
Page 25
Resource Profile Classifications
profileserver
granule
datasetanalysistool
productserver
catalog
interface
datasetcollection
datadictionary
SpaceScience
Astrophysics SpacePhysics
PlanetaryScience
NationalInstitute ofHealth
EarlyDetectionNetwork
NationalCancerInstitute
Resource Classes
MetadataDataApplicationSystem
Resource Context(Discipline )
Medicine Space Science
Page 26
Page 26
Profile DTD
<!ELEMENT profiles (profile+)>
<!ELEMENT profile (profAttributes, resAttributes, profElement*)>
<!ELEMENT profAttributes (profId, profVersion*, profTitle*, profDesc*, profType*, profStatusId*, profSecurityType*, profParentId*, profChildId*, profRegAuthority*, profRevisionNote*, profDataDictId*)>
<!ELEMENT resAttributes (Identifier, Title*, Format*, Description*, Creator*, Subject*, Publisher*, Contributor*, Date*, Type*, Source*, Language*, Relation*, Coverage*, Rights*, resContext*, resAggregation*, resClass*, resLocation*)>
<!ELEMENT profElement (elemId*, elemName, elemDesc*, elemType*, elemUnit*, elemEnumFlag*, (elemValue | (elemMinValue, elemMaxValue))*, elemSynonym*, elemObligation*, elemMaxOccurrence*, elemComment*)>
Page 27
Page 27
XML Profile Example (1 of 2)
<profile> <profAttributes> <profId>OODT_PDS_DATA_SET_INV_82</profId><profDataDictId>OODT_PDS_DATA_SET_DD_V1.0</profDataDictId> </profAttributes> <resAttributes> <Identifier>VO1/VO2-M-VIS-5-DIM-V1.0</Identifier> <Title>VO1/VO2 MARS VISUAL IMAGING SUBSYSTEM DIGITAL …</Title> <Format>text/html</Format> <Language>en</Language> <resContext>PDS</resContext> <resAggregation>dataSet</resAggregation> <resClass>data.dataSet</resClass> <resLocation>http://pds.jpl.nasa.gov/cgi-bin/pdsserv.pl?…</resLocation> </resAttributes>
Page 28
Page 28
XML Profile Example (2 of 2)
<profElement> <elemId>ARCHIVE_STATUS</elemId> <elemName>ARCHIVE_STATUS</elemName> <elemType>ENUMERATION</elemType> <elemEnumFlag>T</elemEnumFlag> <elemValue>ARCHIVED</elemValue> </profElement> <profElement> <elemId>TARGET_NAME</elemId> <elemName>TARGET_NAME</elemName> <elemType>ENUMERATION</elemType> <elemEnumFlag>T</elemEnumFlag> <elemValue>MARS</elemValue> </profElement></profile>
Page 29
Page 29
OODT Profile Service Component
Profiles are managed by profile servers
Profile servers are written in Java
OODT currently has three different registry methods for managing profiles which are configurable at run time Flat File
RDBMS via JDBC (Oracle)
LDAP (OpenLDAP)
Page 30
Page 30
Data Product Exchange
Exchanging products requires access to each data system (RDBMS/OODBMS, Flat file, etc) which is difficult Different vendor products
Non-standard interfaces
Different implementations (data model, home grown, COTS,etc)
Representations of data are different
Heterogeneous Platforms
Heterogeneous O/S
etc
Page 31
Page 31
Solutions to Data Product Exchange
Extend framework to support common access to distributed data systems by creating a “Product Service Component”
Product Servers - Middleware that negotiates the interfaces between the data system implementations
Design the component to leverage off of Consistent metadata and data dictionary Consistent data interchange methods and protocols
Provide data abstraction Data and information hiding Location hiding and independence
Provide a standard language for communication Use the OODT XML Query language for data interchange
Support “rich” query description including data elements and constraints Support “rich” query results that include results in many different formats
Page 32
Page 32
OODT Product Server Component
The Product Server plugs into the OODT framework and manages the “handshake” between the data system and the OODT system.
Extensible by dynamically loading objects at runtime which are specific to the data system model
Queries and results are passed using an OODT XML Query structure Encapsulates one or more data sources for standardized access
Product Server
Flat file accessor
Other accessor
Query: XML
Result: XML
Generic Server Implementation Class
FileSys
Database
Page 33
Page 33
OODT Query Service Component
Manages all queries for the identification and retrieval of data products
All components are identified by a unique name and managed in a CORBA name server
Queries to multiple profile or product servers occur concurrently
Queries are described using the OODT XML Query structure
Ties together the profile and product server components for the OODT framework
Page 34
Page 34
XML Query Example (1 of 2)
<query> <queryAttributes> <queryId>OODT_XML_QUERY_V0.1</queryId> <queryTitle>OODT_XML_QUERY - PDS DIS Query Example</queryTitle> <queryDesc>PDS DIS Query for TARGET_NAME = MARS</queryDesc> <queryType>QUERY</queryType> <queryStatusId>ACTIVE</queryStatusId> <querySecurityType>UNKNOWN</querySecurityType> <queryRevisionNote>2000-05-12 JSH V1.2 Updated for new prof.dtd</queryRevisionNote> <queryDataDictId>OODT_PDS_DATA_SET_DD_V1.0</queryDataDictId> </queryAttributes> <queryResultModeId>ATTRIBUTE</queryResultModeId> <queryPropogationType>BROADCAST</queryPropogationType> <queryPropogationLevels>N/A</queryPropogationLevels> <queryMaxResults>100</queryMaxResults><queryResults>0</queryResults> <queryKWQString>TARGET_NAME = MARS</queryKWQString>
Page 35
Page 35
XML Query Example (2 of 2)
<querySelectSet></querySelectSet> <queryFromSet></queryFromSet> <queryWhereSet> <queryElement> <tokenRole>elemName</tokenRole> <tokenValue>TARGET_NAME</tokenValue> </queryElement> <queryElement> <tokenRole>LITERAL</tokenRole> <tokenValue>MARS</tokenValue> </queryElement> <queryElement> <tokenRole>RELOP</tokenRole> <tokenValue>EQ</tokenValue> </queryElement> </queryWhereSet> <queryResultSet></queryResultSet></query>
Page 36
Page 36
Insertion of OODT into PDS
MARIE EDR,PDS PPI
GRS RDRPDS GEO
THEMIS EDRASU
DOCUMENTS& ANCILLARYFILES
GRS EDRUofA
MGSMOLA
MGSTES
MGSMOC
Viking MDIM
VikingEDR
BrowserInterfacesBrowser
Interfaces
DataVisualizationData Mining &
Visualization
DataMiningData
Catalogs
OODT MiddlewareDataMiningProduct
Servers
DataMiningProduct
Servers
Distributed Archives
Page 37
Page 37
OODT Query Flow Example
Query Server
Profile Serverjpl
Product Serverjpl.pti
Product Serverjpl.pds
Product Serverjpl.pds.mola
Userquery
XSL(profiles ordata productsformatted)
XMLQuery/IIOP(no results)
XMLQuery/IIOP(profiles ordata resultsas requested)
XMLQuery/IIOP(no results)
XMLQuery/IIOP(profiles of resources to handle query)
XM
LQ
ue
ry/I
IOP
(d
ata
re
sults
)
XM
LQ
ue
ry/I
IOP
(p
rod
uct
se
arc
h)
Search Web Page
Profile DB
PTI Repository
PDS DVD Jukebox
PDS MOLA Oracle DB
Que
ryC
lien
t
Web
se
rver
sear
ch.js
p
Page 38
Page 38
MGS Coverage Plot
Page 39
Page 39
Query Results
Page 40
Page 40
Efforts in Bioinformatics
September 2000, the Office of Science Policy at NIH and JPL entered into a joint inter-agency agreement to
Explore knowledge environments to support the management and exchange of biomarkers and biomakers databases
Insert technologies developed to support interoperability between biomedical databases
Develop methodologies to are mutually beneficial to both biomedical and space science data systems
A series of meetings lead to an initial effort that focused on interoperability within the Early Detection Research Network, a network within the National Cancer Institute (NCI) consisting of 18 laboratories and hospitals
Two initial epidemiological centers were selected for locating cancer specimens
Moffitt Cancer Research Center University of San Antonio, Texas
A “mock up” data architecture for EDRN was presented on January 21st at the 3rd EDRN Steering Committee meeting in San Antonio, Texas
Page 41
Page 41
EDRN Mock Query Interface
Page 42
Page 42
QueryManager
EDRN Knowledge ArchitectureMockup Implementation at JPL
San Antonio
Moffitt
MetadataProfiles
EDRN Mock Databases Hosted at JPL
San Antonio ProductExchange Server
Moffitt ProductExchange Server
In:QueryOut::Identified Resources
In:QueryOut::Data Products
In:QueryOut::Data Products
OODTMiddleware: Hosted at JPL
EDRN “Mock” Query Interface
In:QueryOut::Data Products
Page 43
Page 43
Component Based Mars Distributed Archive Architecture 2001 Mars Odyssey Orbiter Example
VolumeServer
Data ProductServer(s)
PI TeamClient Interface
QueryManager
MARIE EDR,RDR ArchivePDS PPI
GRS RDRArchivePDS GEO
THEMIS EDR,RDR ArchiveASU
Documents andAncillary FilesPDS CN
GRS EDRArchiveUofA
Code-S ScientistsClient Interface
General PublicClient Interface
In:QueryOut::Data Products,Volumes
In:QueryOut::Identified Resources
In:QueryOut::Data Products Data Products
Virtual Volume
OODTMiddleware Virtual Volume
Physical Volume
Data Productsand Descriptions
Data Productand ResourceDescriptions
DVDVolume
SupportingDocuments
Data Productsand Descriptions
Data Productsand Descriptions
Data Productsand Descriptions
In:QueryOut::Data Products,Volumes
In:QueryOut::Data Products,Volumes
Page 44
Page 44
More Information and References
OODT Papers (http://oodt.jpl.nasa.gov/doc/papers) “Science Search and Retrieval using XML” by OODT Team. Presented at Second National Conference
on Scientific and Technical Data, National Academy of Sciences, Washington D.C. March 2000. “A Distributed Component Framework for Science Data Product Interoperability” by OODT Team.
Presented at the 17th Annual International CODATA conference. Baveno, Italy. October 2000.
OODT Presentations (http://oodt.jpl.nasa.gov/doc/presentations)
JPL Enterprise Data Architecture White Paper (e-mail: [email protected])
Planetary Data System
http://pds.jpl.nasa.gov
Dublin Core
http://purl.oclc.org/dc
Extensible Markup Language
http://www.w3c.org/XML
ISO/IEC 11179: Specification and Standardization of Data Elements
Federal CIO Statement on Metadata
http://www.cio.gov/docs/metadata.htm
Page 45
Page 45
Backup Slides
Page 46
Page 46
OODT Prototype Environment
- XML parser: Apache/IBM Xerces 1.0.3
http://xml.apache.org
- XSLT: Apache Xalan 1.0.0
http://xml.apache.org
- CORBA: Orbacus 4.0.3
http://www.ooc.com
- Database: Oracle 8.1.5
http://www.oracle.com
- LDAP server: OpenLDAP 1.2.11
http://www.openldap.org
- Development language: Java 1.2
http://java.sun.com
- Web server: iPlanet Fasttrack 4.1
http://www.iplanet.com
- Server operating system: RedHat Linux 6.2
http://www.redhat.com
- Version control system: CVS 1.10.5
http://www.cvshome.org
Page 47
Page 47
Profile Server Node
Page 48
Page 48
The Use of Metadata in the PDS
Mission
Data Set Collection
Data SetInstrument
Spacecraft
Target
Document Image Time Series Spectrum
Planetary ScienceModel
Locate and Use Data- Use context to find data- Use context to understand data
System Interoperability- Use context to share data
Correlative Science- Use context to find new relationships between data
ModelAttributes
Label
Data