16-17 Oct 2003, IVOA Data Access Layer, Strasbourg 2003
IVOA Data Access Layer (DAL) Working Group
Doug Tody, National Radio Astronomy Observatory
International VIRTUAL OBSERVATORY ALLIANCE
IVOA Data Access Layer (DAL)
• DAL Working Group Priorities
  – Update simple image access (SIA) to V1.1
  – Introduce simple spectral access (SSA) V1.0
  – Introduce web services versions of DAL services
  – Drive VO technology development as required for DAL (e.g., dataset identifiers, data models, VOTable)
Simple Spectral Access (SSA)
• Goals
  – provide uniform access to both 1D spectra and SEDs
  – simplify interface for both data providers and client applications
  – powerful "multiwavelength" spectral analysis capability
• Spectral survey
  – use-cases to drive interface design
  – identify early data providers and application developers
• Current issues
  – spectral data model
  – interface design issues
  – spectral dataset representation
Cambridge SIA V1.1 Priorities
• Essential
  – Registry integration
  – Pixflags support for lossy compressed data (e.g., HCOMPRESS)
• Image Characterization
  – Image provenance and identification (collection ID, dataset ID, virtual data provenance, replica support)
  – Spectral bandpass (already present; may need tweaking for consistency)
  – Time of observation
  – Spatial resolution
  – Limiting flux (harder; may not make V1.1)
• Other
  – VO technology integration (normalize UCDs, data models, etc.)
  – Use of image attributes to refine query (e.g., band)
  – Default for case where there are multiple versions of same dataset
  – Spatial bandpass - 3
  – Image type (future - v2)
  – Logical hierarchies to describe complex metadata (as in IDHA – v2)
DAL Interface Issues
• Next version of SIA requires progress in the following areas:
  – dataset identifiers
  – component data models, dataset characterization
  – data model, dataset representation
• These are actually required for all DAL services, not just SIA
Global Dataset Identifiers
• Required to identify data returned by DAL services
  – Images: data collection ID, dataset ID
  – Catalogs: catalog ID, record ID
• Will enable
  – replica management and selection
  – virtual data management and characterization
• Discussion being led by Registries group
Data Model Representation
• All DAL data access is data model based
• Must be able to represent data models unambiguously in VOTable
• VOTable UTYPE proposal to provide "pointer into data model"
• Discussion will be in VOTable group
Agenda for DAL Working Group, Strasbourg, October 2003
• DAL Recap
  – Service class hierarchy
  – Concept of different views of same data
• SIA V1.1 / DAL Interface Issues
  – image identifiers, virtual data
  – component data models (UTYPE, UCD normalization)
  – getImage acref templating (Francois)
• SSA Straw man
  – SSA overview / interface (Doug)
  – SED introduction (Markus)
  – 1D spectral data model (Jonathan)
  – discussion of SSA issues
• Review process for development of SSA specification
• Update DAL Priorities and Schedule
After a brief review of the services architecture, most of the discussion in this WG meeting focused on enhancements to SIA and the general DAL infrastructure, and on the scope and design of the simple spectral access (SSA) interface.
DAL Scope: Types of Data (Cambridge)
[Figure: DAL dataset types. Nodes: Dataset, Time Series, Catalog, Source Catalog, Event List, Visibility Data, Image, NDImage, 1D Spectrum, SED. Label: "Primary DAL Services".]
Concept of DAL service architecture from Cambridge. Reviewed and reaffirmed without objection.
SIA V1.1 / DAL Infrastructure
• Some key issues
  – Registry integration, service metadata
  – Image identifiers (data collection ID, dataset ID)
  – Data characterization (coverage, bandpass, resolution, etc.)
  – Data provenance, virtual data characterization
  – Data model representation, UCD normalization
  – Templating the URL access reference
• For the most part these issues actually affect all DAL/VO data access and are not specific to SIA.
Some hot topics affecting SIA and all DAL services were discussed.
Dataset Identifiers
• Globally unique dataset (resource) identifier
  – ivoa://<authority-ID>/<resource-ID>#<dataset-ID>
    • naming authority (namespace): authority-ID
    • data collection: resource-ID
    • dataset or record: dataset-ID
  – Images: data collection, dataset
  – Catalogs: table, record ID
• Key points
  – data tagged by a unique global identifier
  – global identifiers may exist independently of any specific registry
  – identifiers of published data are persistent
  – authority IDs are globally unique, globally allocated
  – each authority controls name allocation within their namespace
  – caveat: this only works in a simple way for physical datasets
Required for many aspects of data access: publication, data provenance, replication, virtual data.
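The identifier form ivoa://<authority-ID>/<resource-ID>#<dataset-ID> can be split into its three parts with a standard URI parser. The following is only a minimal sketch: the names DatasetId and parse_dataset_id are hypothetical, the example identifier is made up, and the proposal itself does not define any parsing rules beyond the form shown on the slide.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class DatasetId(NamedTuple):
    """The three parts of a global dataset identifier (hypothetical type)."""
    authority_id: str  # naming authority (namespace)
    resource_id: str   # data collection
    dataset_id: str    # dataset or record


def parse_dataset_id(identifier: str) -> DatasetId:
    """Split ivoa://<authority-ID>/<resource-ID>#<dataset-ID> into its parts."""
    parts = urlsplit(identifier)
    if parts.scheme != "ivoa":
        raise ValueError(f"not an ivoa:// identifier: {identifier!r}")
    resource_id = parts.path.lstrip("/")
    # All three components are assumed to be mandatory in this sketch.
    if not parts.netloc or not resource_id or not parts.fragment:
        raise ValueError(f"incomplete identifier: {identifier!r}")
    return DatasetId(parts.netloc, resource_id, parts.fragment)


# Example with a made-up authority, collection, and dataset name.
example = parse_dataset_id("ivoa://example.org/survey#image-0001")
```

Keeping the authority as a separate field reflects the key point that each authority controls name allocation within its own namespace.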
Replica management
• Data Replication
  – data replication required for efficient access, data backup
  – replica management and selection of datasets is enabled by dataset IDs
• How it works:
  – replica manager service can harvest individual registries and build a replica catalog
  – query replica manager service to discover replicas
  – query individual service to confirm existence, get metadata, get data
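The first two steps of "How it works" can be sketched as a simple in-memory catalog keyed by dataset ID. This is an illustration only: the function names, the shape of the harvested records (pairs of dataset ID and service URL), and the example URLs are all assumptions, and no actual registry or replica manager interface is implied.

```python
from collections import defaultdict


def build_replica_catalog(registries):
    """Harvest (dataset_id, service_url) records from each registry.

    `registries` is assumed to be an iterable of record lists; a real
    harvester would query registry services instead.
    """
    catalog = defaultdict(set)
    for records in registries:
        for dataset_id, service_url in records:
            catalog[dataset_id].add(service_url)
    return catalog


def find_replicas(catalog, dataset_id):
    """Return the services known to hold a copy of the dataset."""
    return sorted(catalog.get(dataset_id, ()))


# Two made-up registries that both list the same dataset.
registry_a = [("ivoa://example.org/survey#image-0001", "http://a.example.org/sia")]
registry_b = [
    ("ivoa://example.org/survey#image-0001", "http://b.example.org/sia"),
    ("ivoa://example.org/survey#image-0002", "http://b.example.org/sia"),
]
catalog = build_replica_catalog([registry_a, registry_b])
replicas = find_replicas(catalog, "ivoa://example.org/survey#image-0001")
# The third step (confirming existence with each service) is not modeled here.
```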
Virtual data
• Data access layer
  – most data is virtual data derived from external data sources
  – data access services subset, transform, or otherwise generate data
• Dataset IDs
  – will allow data provenance to be specified
  – dataset A derived from datasets B, C by operation P
• This is an essential step to allow us to describe virtual data, but how do we do so? [TBD]
• Current "acref" URL is a kind of virtual data reference
  – e.g., "http://archive.nrao.edu/sia/nvss?POS=12.32,-11.2&SIZE=0.1&..."
  – acref implicitly specifies data provenance
  – may also be unstable, contain irrelevant access-specific details
• Use of a getData method instead of an explicit acref URL might allow virtual data generation to be standardized for a given access protocol
Dataset identifiers will provide the basis for describing virtual data, but how we do so is still TBD. Most likely doing so will involve defining the generation operation and inputs.
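One way to picture "dataset A derived from datasets B, C by operation P" is a small provenance record, filled in here from the parameters that an acref URL already carries implicitly. This is a sketch under stated assumptions: the Provenance type, the choice of the URL path as the operation name, and the input identifier are hypothetical, since how virtual data will be described is still TBD. Only the POS and SIZE parameters visible in the slide's example are used.

```python
from dataclasses import dataclass
from urllib.parse import parse_qsl, urlsplit


@dataclass(frozen=True)
class Provenance:
    """Derived dataset = operation applied to inputs (hypothetical record)."""
    operation: str
    inputs: tuple
    parameters: tuple = ()


def provenance_from_acref(acref: str, source_id: str) -> Provenance:
    """Make the provenance implied by an acref URL explicit."""
    parts = urlsplit(acref)
    # Sort the query parameters so equivalent URLs give equal records.
    params = tuple(sorted(parse_qsl(parts.query)))
    return Provenance(operation=parts.path, inputs=(source_id,), parameters=params)


record = provenance_from_acref(
    "http://archive.nrao.edu/sia/nvss?POS=12.32,-11.2&SIZE=0.1",
    "ivoa://example.org/nvss#field-1",  # made-up input dataset ID
)
```

The host name is deliberately left out of the record, in line with the remark that an acref may contain unstable, access-specific details.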
Component data models
• Required in DAL to
  – characterize complex objects
  – represent data for transport and analysis
• Modeling complex objects
  – Standard and custom component data models are aggregated to model more complex objects, e.g., datasets
Providing a means to determine the ‘quality’ of data will be essential to enable automated data analysis via the VO. Dataset characterization via component data models will provide this capability.
Sample Component Data Models
• Observation metadata
  – observatory, instrument, project, observer, etc.
• Standard 'coverage' metadata
  – sky, time, bandpass, etc.
• Dataset characterization
  – time of observation (lo, high, refvalue)
  – spectral bandpass (lo, high, refvalue, ID)
  – spatial bandpass (lo, high; resolution?)
  – sensitivity or limiting flux (flux 'bandpass'?)
  – observable
• World coordinate systems
• Storage models
The data models WG is actively working to define these component data models.
UTYPE, UCD normalization
• Proposed VOTable FIELD, PARAM, GROUP Attributes:
    name="application-name"    -- A name freely defined by an application
    id="ID-name"               -- An XML identifier unique within a document
    ref="ID-ref"               -- Reference to an ID elsewhere in document
    ucd="ucd-name"             -- The Unified Content Descriptor ("fuzzy")
    utype="ns:datamodel-name"  -- The uniform attribute type related to a data-model; "ns" represents an optional namespace attribute.
• A possible alternative would be to use a namespace within UCD, but this would overload UCD and interfere with its current usage.
UTYPE or something like it is required to represent data models rigorously in VOTable for data analysis in the VO.
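A client could use the proposed utype attribute as the "pointer into data model" by indexing table columns on it. The sketch below assumes a simplified XML fragment rather than a complete VOTable document; the column names and the "bp" namespace prefix are invented for illustration, while the UCD values are taken from the spectral bandpass example.

```python
import xml.etree.ElementTree as ET

# Simplified fragment, not a full VOTable document. The "bp:" prefix and
# the column names are assumptions made for this example.
SAMPLE = """<TABLE>
  <FIELD name="wl_min" ucd="INST_FILTER_MIN" utype="bp:LoLimit"/>
  <FIELD name="wl_max" ucd="INST_FILTER_MAX" utype="bp:HiLimit"/>
  <FIELD name="notes"/>
</TABLE>"""


def fields_by_utype(xml_text):
    """Map each utype to the name of the FIELD that carries it."""
    root = ET.fromstring(xml_text)
    return {
        field.get("utype"): field.get("name")
        for field in root.iter("FIELD")
        if field.get("utype")  # skip columns with no data model role
    }


index = fields_by_utype(SAMPLE)
```

Because the lookup key is the utype rather than the application-defined name, the same client code would work for any service that labels its columns against the same data model.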
Sample Data Model: Spectral Bandpass
UTYPE     UCD               Name          ID
ID        INST_FILTER_CODE  user-defined  none
Unit      UNITS             user-defined  none
RefValue  INST_FILTER_REF   user-defined  none
HiLimit   INST_FILTER_MAX   user-defined  none
LoLimit   INST_FILTER_MIN   user-defined  none
Response  DATA_LINK         user-defined  none
Access reference templating
• Motivation
  – SIA query response table can get very large if there is a matrix of options for each possible output image.
• Some Possible Solutions
  – ACREF template
  – getData method
• Thoughts
  – templating the acref is a form of getData method
  – should we just add a getData method instead?
  – but what is settable may be dataset dependent
  – metadata can flag attributes which can be set in template
  – acref would be template string
  – hence can collapse what could be P1*P2*PN redundant entries
Access reference templating
• Motivation
  – SIA query response table can get large if there is a matrix of options for each possible output image.
  – Should be easier to recognize simple variations on the same image.
  – A simple one step ‘getImage’ method could be useful.
• Proposals
  – Parameter substitution on acref template (F. Bonnarel)
    • e.g., image format, compression, image generation parameters
  – Formal getData method
No clear consensus at this point. Further discussion is needed.
Some form of templating could be good so long as it does not complicate the interface for the client. Some felt that the current approach is ok. XPATH or similar technology should be investigated for implementation.
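Parameter substitution on an acref template can be illustrated with ordinary string templating. This is a sketch only: the ${...} placeholder syntax, the parameter names FORMAT and COMPRESS, and the URL are assumptions, since no template syntax was agreed. It shows how a single template row stands in for the full product of option values.

```python
from itertools import product
from string import Template


def expand_acref(template: str, **values) -> str:
    """Fill in one concrete access reference from the template."""
    return Template(template).substitute(**values)


def all_variants(template: str, options: dict) -> list:
    """Enumerate every access reference the template stands for."""
    names = sorted(options)
    return [
        expand_acref(template, **dict(zip(names, combo)))
        for combo in product(*(options[name] for name in names))
    ]


# Made-up template and option lists; a service would publish these as metadata.
ACREF = "http://archive.example.org/sia/image?FORMAT=${format}&COMPRESS=${compress}"
OPTIONS = {"format": ["fits", "jpeg"], "compress": ["none", "hcompress"]}

one = expand_acref(ACREF, format="fits", compress="none")
variants = all_variants(ACREF, OPTIONS)
```

With two options of two values each, the one template row replaces four rows in the query response; the client expands only the variant it actually wants.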
Simple Spectral Access (SSA)
• SSA Overview (Doug)
  – Goals, Interface, Data Formats
• SED Introduction (Markus)
• Spectral data model (Jonathan)
• SSA Issues (all)
General Agreements on SSA
  – Provide uniform interface for both 1D spectra and SEDs
  – Develop uniform data model for both 1D spectra and SEDs
  – Service interface will be similar to SIA, CS (query/response, getData)
  – Data output formats will include at least text, VOTable, FITS, graphics
SSA will provide an opportunity to learn how to 1) map VO data models into multiple external representations, and 2) package actual datasets in XML/VOTable, including representing data models in XML.
SSA Interface Issues
• Registry integration
• Query
• Query response
• Dataset retrieval
• Data model
• Data representation
SSA Interface Issues
• Registry integration
  – Service metadata query
  – SSA service metadata
• SSA service verifier
  – Verify service is correct
  – Read service metadata, enter into a registry
Agreed without objection. Service verification and registration of service metadata should be provided for all DAL services.
SSA Interface Issues
• Query
  – Query attributes
    • pos, size, spectral resolution, bandpass, time
    • velocity, redshift, spectral class, object name, etc.
    • spatial resolution
    • Other?
  – Query interface
    • Simple keyword queries (now)
    • Query language (ADQL) queries (later)
General agreement that the query is an important aspect of SSA. Spectra are generally more highly processed than, e.g., images, and may have attributes such as velocity, redshift, etc., which one would like to query on.
Implementation of a general query mechanism for SSA may require something like ADQL.
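A simple keyword query of the kind listed above could be assembled as a URL in the style of the SIA POS/SIZE parameters. The sketch below is an assumption throughout: the function name, the service URL, and any keyword other than POS and SIZE (REDSHIFT here) are hypothetical, as the SSA parameter names have not been defined.

```python
from urllib.parse import urlencode


def build_query_url(base_url, pos, size, **constraints):
    """Build a keyword query URL from a position, a size, and extra constraints."""
    params = {"POS": f"{pos[0]},{pos[1]}", "SIZE": str(size)}
    # Extra attributes (e.g. redshift) become upper-case keywords.
    for key, value in sorted(constraints.items()):
        params[key.upper()] = str(value)
    # Keep the comma in POS readable instead of percent-encoding it.
    return f"{base_url}?{urlencode(params, safe=',')}"


url = build_query_url("http://archive.example.org/ssa", (12.32, -11.2), 0.1, redshift=0.5)
```

Constraints that need ranges or boolean combinations do not fit this flat keyword form well, which is the motivation for moving to a query language later.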
SSA Interface Issues
• Query Response
  – form VOTable as in SIA
  – this is a flat summary table for simplicity
  – alternative would be sequence of structured objects
Not discussed due to lack of time.
Unless a reason is found to deviate, the expectation is that the query response will be a flat VOTable, as with SIA.
SSA Interface Issues
• Dataset Retrieval
  – one getData method per spectrum/SED
  – data format options: text, xml, votable, fits, graphics, html, ...
Agreed that spectra output formats should include at least text, VOTable, FITS, and graphics. How data is represented in each format is a different issue.
SSA Interface Issues
• Data model
  – Uniform model for 1D spectra and SED
  – As simple as it can be while solving this problem
  – Range of observables
  – …
The general spectral data model as presented by Jonathan was well received by the WG and will serve as the basis for further development of SSA via a subgroup with members from both DAL and DM.
SSA Interface Issues
• Dataset representation
  – text (# keyword = value, records)
  – votable (how do we represent dataset in XML?)
  – fits (table or image?)
Most of the discussion here was of what FITS format to use. It was agreed that a FITS table was the most general, but would be harder for existing applications to use and would duplicate what VOTable will already provide. Use of a simple linearized 1D spectrum represented as a FITS image will be investigated.
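The text representation ("# keyword = value" header lines followed by data records) is simple enough to sketch as a writer and reader pair. The details below are assumptions rather than an agreed format: whitespace-separated numeric columns, one record per line, and the example keywords and values are all invented.

```python
def write_text_spectrum(metadata, records):
    """Write '# keyword = value' header lines followed by data records."""
    lines = [f"# {key} = {value}" for key, value in metadata.items()]
    lines += [" ".join(str(value) for value in row) for row in records]
    return "\n".join(lines) + "\n"


def read_text_spectrum(text):
    """Parse the text form back into (metadata, records)."""
    metadata, records = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header line: split on the first '=' only.
            key, _, value = line[1:].partition("=")
            metadata[key.strip()] = value.strip()
        else:
            records.append(tuple(float(value) for value in line.split()))
    return metadata, records


# Made-up spectrum: two (wavelength, flux) records.
meta = {"object": "example", "unit": "angstrom"}
rows = [(4000.0, 1.5), (4001.0, 1.7)]
text = write_text_spectrum(meta, rows)
parsed_meta, parsed_rows = read_text_spectrum(text)
```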
Process
• Process for development of SSA spec
  – Spectral survey
  – Discuss SSA design issues (this meeting)
  – Initial draft specification
  – Discuss, revise draft specification
  – Initial implementations
Priorities and Schedule
• SSA V1.0
  – Initial specification
  – Initial implementations
• DAL Technology
  – Component data models
  – Data model representation
• SIA V1.1
• Web service implementations