NVO Summer School, Aspen 9-Sep-2005 1
Data Access Layer
Doug Tody (NRAO)
US NATIONAL VIRTUAL OBSERVATORY
Data Access Layer
• What does it do?
  – Provides access to data
    • data discovery
    • mediation to a standard model
    • data retrieval
    • on-demand data generation
    • server-side computation (subsetting, filtering)
• What is it for?
  – Supports client data analysis
    • distributed, multiwavelength
• How does it work?
  – Object (dataset) oriented
    • catalog, image, spectrum, time series, SED, etc.
  – Services
    • cone search (also SkyNode), SIA, SSA
Cone Search
• Provides basic catalog access
  – Query by position and aperture (cone in space)
  – Query consists of base-URL (service endpoint) plus parameters
    • e.g., http://base-url?RA=12.0&DEC=0.0&SR=1.0
  – Catalog returned as a VOTable
• Advantages
  – Simple but powerful, provides standard interface
  – Easy to implement and use
• Limitations
  – Catalog metadata is not defined
  – No data model support
• Future
  – Supplanted by basic SkyNode (Greene, Saturday)
  – SkyNode supports metadata discovery, SQL-like syntactical queries
  – We will continue to support the basic cone search query, however!
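A minimal Python sketch of the cone search query described above. The endpoint http://example.org/cone is a hypothetical placeholder, and the range checks are an illustrative assumption rather than something stated on the slide; the sketch only shows how the base URL and the RA, DEC, and SR parameters combine into a query URL.

```python
from urllib.parse import urlencode


def cone_search_url(base_url, ra, dec, sr):
    """Build a cone search query URL; all angles are in decimal degrees."""
    # Assumed sanity checks (not taken from the slide).
    if not 0.0 <= ra <= 360.0:
        raise ValueError("RA must lie in [0, 360] degrees")
    if not -90.0 <= dec <= 90.0:
        raise ValueError("DEC must lie in [-90, 90] degrees")
    if sr < 0.0:
        raise ValueError("SR (search radius) must be non-negative")

    # Pick the separator so the result stays a valid URL whether or not
    # the base URL already carries a query string.
    if base_url.endswith(("?", "&")):
        sep = ""
    elif "?" in base_url:
        sep = "&"
    else:
        sep = "?"
    return base_url + sep + urlencode({"RA": ra, "DEC": dec, "SR": sr})


# Hypothetical endpoint, used only for illustration.
url = cone_search_url("http://example.org/cone", 12.0, 0.0, 1.0)
print(url)
```

The returned URL can be fetched with any HTTP client; the response would be the catalog as a VOTable.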
Simple Image Access (SIA)
• Basic Usage, Highest Level
  – Client queries Registry to find interesting services
  – Each service is queried (in turn or simultaneously) for data
  – Client collates and analyzes results
  – Selected datasets are retrieved
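The "query each service, then collate" step can be sketched in Python as follows. This is only an illustration: the three service functions are hypothetical stand-ins for real SIA requests, and the row keys (`title`, `acref`) are made-up names. It shows one way to query services simultaneously while tolerating a service that fails.

```python
from concurrent.futures import ThreadPoolExecutor


def query_all(services, query):
    """Run `query` against every service concurrently and collate the rows.

    `services` maps a service name to a callable that takes the query dict
    and returns a list of row dicts (a stand-in for a real SIA request).
    """
    collated = []
    with ThreadPoolExecutor(max_workers=max(1, len(services))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in services.items()}
        for name, future in futures.items():
            try:
                rows = future.result()
            except Exception as exc:
                # One failing service should not abort the whole search.
                print(f"{name}: query failed ({exc})")
                continue
            # Remember which service each candidate dataset came from.
            collated.extend(dict(row, service=name) for row in rows)
    return collated


# Hypothetical services returning canned rows.
def survey_a(query):
    return [{"title": "A-1", "acref": "http://example.org/a/1.fits"}]


def survey_b(query):
    raise RuntimeError("service unavailable")


def survey_c(query):
    return [
        {"title": "C-1", "acref": "http://example.org/c/1.fits"},
        {"title": "C-2", "acref": "http://example.org/c/2.fits"},
    ]


results = query_all(
    {"survey_a": survey_a, "survey_b": survey_b, "survey_c": survey_c},
    {"POS": "12.0,0.0", "SIZE": "0.2"},
)
print([row["title"] for row in results])
```

Collating in submission order keeps the output deterministic; a client that cares more about latency could process results as they complete instead.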
NVO Summer School, Aspen 9-Sep-2005 7
Simple Image Access (SIA)
• Basic Usage, Single Service
  – Query
    • find data of interest from a single service
    • http://base-url?POS=12.0,0.0&SIZE=0.2&FORMAT=image/fits
  – Query response
    • VOTable, one row per candidate dataset
    • "access reference" (a URL) points to data
  – Data selection
    • Performed by the client using query response metadata
  – Dataset retrieval
    • Retrieve actual datasets, if any
Service Capabilities
• Types of Services
  – Atlas    Precomputed survey image (entire image)
  – Pointed  Image from pointed observation (entire image)
  – Cutout   Cutout of existing image (pixels unchanged)
  – Mosaic   Reprojected image (pixels resampled)
• Virtual Data
  – Data model mediation
  – Subsetting, filtering, etc. on the fly
  – Possible to view same data in different ways
• Interface
  – RESTful interface currently (HTTP GET)
  – Document oriented (VOTable, FITS, JPEG, etc.)
Data Model
• SIA data model is the familiar "astronomical image"
  – Generally this means a 2D sky projection
  – Data array is logically a regular grid of pixels
  – Encoded as a FITS image, GIF/JPEG, etc.
• Standardized dataset metadata
  – Provenance
  – Image geometry
  – Scale
  – Format
  – Position, WCS
  – Time of observation
  – Spectral bandpass
  – Access information
Input Parameters
• Required parameters
  – POS     center of ROI (ra, dec in decimal degrees, ICRS)
  – SIZE    width; or width, height
  – FORMAT  ALL, GRAPHIC, image/fits, image/jpeg, text/html, …
• Optional parameters
  – INTERSECT  values: covers, enclosed, center, overlaps
  – VERB       table verbosity
• Service-defined parameters
  – used to further refine queries, but not yet standardized
    • e.g., BAND, SURVEY, etc.
• Image generation parameters
  – NAXIS, CFRAME, EQUINOX, CRPIX, CRVAL, CDELT, ROTANG, PROJ
    • used for cutout/mosaic services to specify image to be generated
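A short Python sketch of how the parameters listed above might be assembled into an SIA query URL. The endpoint is a hypothetical placeholder, the function name and its validation rules are assumptions made for illustration, and service-defined parameters such as BAND are simply passed through unchecked because the slide notes they are not standardized.

```python
from urllib.parse import urlencode

# INTERSECT values as listed on the slide.
INTERSECT_VALUES = {"covers", "enclosed", "center", "overlaps"}


def sia_query_url(base_url, ra, dec, size, fmt="image/fits",
                  intersect=None, verb=None, **service_params):
    """Build an SIA query URL from required and optional parameters.

    `size` is either a single width or a (width, height) pair, in degrees.
    """
    if isinstance(size, (int, float)):
        size = (size,)
    if len(size) not in (1, 2):
        raise ValueError("SIZE takes a width, or a width and a height")

    params = {
        "POS": f"{ra},{dec}",
        "SIZE": ",".join(str(s) for s in size),
        "FORMAT": fmt,
    }
    if intersect is not None:
        if intersect.lower() not in INTERSECT_VALUES:
            raise ValueError(f"unknown INTERSECT value: {intersect!r}")
        params["INTERSECT"] = intersect.lower()
    if verb is not None:
        params["VERB"] = int(verb)
    # Service-defined parameters (e.g. BAND, SURVEY) are passed through as-is.
    params.update(service_params)

    sep = "&" if "?" in base_url else "?"
    # Keep ',' and '/' readable, matching the example URL style on the slides.
    return base_url + sep + urlencode(params, safe=",/")


# Hypothetical endpoint, used only for illustration.
basic = sia_query_url("http://example.org/sia", 12.0, 0.0, 0.2)
refined = sia_query_url("http://example.org/sia", 12.0, 0.0, (0.2, 0.1),
                        intersect="covers", BAND="optical")
print(basic)
print(refined)
```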
Query Response
• Output is a VOTable
  – Must contain a RESOURCE element with type="results", containing the results of the query.
• The ‘results’ resource contains a single table
  – Each row of the table describes a single data object which can be retrieved.
• The fields of the table describe the attributes of the dataset
  – These are the attributes of the SIA data model
  – In SIA 1.0, the UCD is used to identify the data model attribute
    • e.g., POS_EQ_RA_MAIN, VOX:Image_Scale, etc.
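To make the response structure concrete, here is a Python sketch that reads a query response and keys each row by UCD. The embedded VOTable is a made-up, heavily trimmed sample (real responses carry many more fields and usually an XML namespace), and the sketch assumes the results resource is marked with type="results". Only the standard library is used.

```python
import xml.etree.ElementTree as ET

# Made-up sample response for illustration; not output from a real service.
SAMPLE = """<VOTABLE version="1.0">
  <RESOURCE type="results">
    <TABLE>
      <FIELD name="title" datatype="char" arraysize="*" ucd="VOX:Image_Title"/>
      <FIELD name="ra" datatype="double" ucd="POS_EQ_RA_MAIN"/>
      <FIELD name="dec" datatype="double" ucd="POS_EQ_DEC_MAIN"/>
      <FIELD name="scale" datatype="double" ucd="VOX:Image_Scale"/>
      <DATA><TABLEDATA>
        <TR><TD>Field 1</TD><TD>12.0</TD><TD>0.0</TD><TD>0.0003</TD></TR>
        <TR><TD>Field 2</TD><TD>12.1</TD><TD>0.1</TD><TD>0.0003</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def local(tag):
    """Strip an XML namespace prefix: '{uri}FIELD' becomes 'FIELD'."""
    return tag.rsplit("}", 1)[-1]


def rows_by_ucd(votable_text):
    """Return the rows of the results table as dicts keyed by field UCD."""
    root = ET.fromstring(votable_text)
    for resource in root.iter():
        if local(resource.tag) == "RESOURCE" and resource.get("type") == "results":
            break
    else:
        raise ValueError("no RESOURCE element with type='results'")

    # Field order defines the column order of every table row.
    ucds = [el.get("ucd") for el in resource.iter() if local(el.tag) == "FIELD"]
    rows = []
    for tr in resource.iter():
        if local(tr.tag) == "TR":
            cells = [td.text for td in tr if local(td.tag) == "TD"]
            rows.append(dict(zip(ucds, cells)))
    return rows


rows = rows_by_ucd(SAMPLE)
for row in rows:
    print(row["VOX:Image_Title"], row["POS_EQ_RA_MAIN"], row["POS_EQ_DEC_MAIN"])
```

Looking columns up by UCD rather than by field name is what lets one client work against many services, since field names are chosen by each provider.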
Query Response
• Image metadata
  – Describes the image object (required)
• Coordinate system metadata
  – Image WCS
• Spectral bandpass metadata
  – Prototype data model describing spectral bandpass of image
• Processing metadata
  – Tells whether the service modified the image data
• Access metadata
  – Tells client how to access the dataset (required)
• Resource-specific metadata
  – Additional optional service-defined metadata describing image
Image Metadata
VOX:Image_Title      Brief description of image
POS_EQ_RA_MAIN       RA (ICRS)
POS_EQ_DEC_MAIN      Dec (ICRS)
INST_ID              Instrument name
VOX:Image_MJDateObs  MJD of observation
VOX:Image_Naxes      Number of image axes
VOX:Image_Naxis      Length of each axis
VOX:Image_Scale      Image scale, deg/pix
VOX:Image_Format     Image file format
Image Retrieval
• Completely optional
  – Typically only a fraction of the available images are retrieved
• Query response
  – If an access reference is provided, the data can be retrieved
  – SIAP can also be used to describe data which is not online
  – The same data may be available in multiple formats
• Image retrieval
  – Very simple; access reference is a URL
  – Standard tools can be used to fetch the data
    • (browser, wget, curl, i/o library, etc.)
  – Data is often computed on-the-fly
  – All retrieval is synchronous (currently)
  – No provision for restricting access (currently)
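Since the access reference is just a URL, retrieval needs nothing beyond a standard I/O library. The Python sketch below selects rows by format and downloads them; the row keys `acref` and `format` are hypothetical names for the access reference and image format columns, and a local file:// URL stands in for a real service so the example runs without a network.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def retrieve(rows, wanted_format, out_dir):
    """Download every row that has an access reference in the wanted format."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, row in enumerate(rows):
        acref = row.get("acref")
        if not acref:
            continue  # dataset is described but not available online
        if row.get("format") != wanted_format:
            continue  # same data may be offered in other formats; skip those
        target = out_dir / f"image_{i}{Path(acref).suffix}"
        # Synchronous fetch: the call returns once the data has arrived.
        with urllib.request.urlopen(acref) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)
        saved.append(target)
    return saved


with tempfile.TemporaryDirectory() as tmp:
    # A tiny local file plays the role of the remote image.
    source = Path(tmp) / "demo.fits"
    source.write_bytes(b"SIMPLE  =                    T")
    rows = [
        {"acref": source.as_uri(), "format": "image/fits"},
        {"acref": None, "format": "image/fits"},
        {"acref": source.as_uri(), "format": "image/jpeg"},
    ]
    saved = retrieve(rows, "image/fits", Path(tmp) / "out")
    print([path.name for path in saved])
```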
Service Registration
Future Development
• SIA V1.1
  – Based on work done on SSA
  – Expanded query interface
    • no longer limited to positional queries
  – Much richer query response
    • generic dataset identification, characterization, etc.
    • metadata extension mechanism
  – Selected features
    • VOTable 1.1 with UCD 1+, GROUP, UTYPE
    • query response can be ordered by "score"
    • logical groupings of related query records
    • compression support
  – Versioning
    • required to make protocol upgrades manageable
Future Development
• Service verification
  – for testing at development time
  – when registered; level of compliance metric
• Grid capabilities
  – Data staging
    • asynchronous image generation (long running jobs)
    • batch generation of images (multiple images)
  – Data management
    • support for single sign-on authentication, authorization
    • network data caching, third party delivery (VOStore etc.)
  – Web service interface
    • resource metadata
    • service availability (etc.)
• ADQL integration
  – Capability to use query language for queries
Simple Spectral Access (SSA)
• What is it?
  – Provides access to 1D spectra, time series, SEDs
  – Tabular spectrophotometric data (photometry points)
  – Represents second generation, data model-based DAL interfaces
• Status
  – Draft V0.9 query interface reviewed in Kyoto (May 05)
  – Revisions in progress; draft PR targeted for Madrid (Oct 05)
  – Much work done on data models; however, they are still being revised
  – Some initial prototypes already exist (services, client apps)
    • IVOA/Madrid discussions will be held immediately after ADASS and are open to all
Basic Usage
• SSA specification may be complex, but basic usage is simple
• Simple query
  – POS, SIZE, FORMAT - like cone search, SIA
  – Possibly refined by spectral or time bandpass, etc.
  – Most metadata in query response is optional
• Data retrieval
  – Simple retrieval is again URL-based
  – Get back a dataset "document" (VOTable, FITS, JPEG, etc.)
  – In simplest case could be wavelength, flux as text (for Spectrum)
  – Pass-through of external data is permitted
• Data Analysis
  – Standard data model isolates application from quirks of external project data
Concepts - Dataset-oriented
• Data object type
  – Spectrum, TimeSeries, SED
• Dataset creation type
  – Atlas      Whole datasets, uniform survey data
  – Pointed    Whole datasets, variable instrumental data
  – Cutout     Subset, data samples are not modified
  – Resampled  Subset, data samples computed by service
• Dataset derivation
  – Observed   An observation
  – Composite  Combination of several observations
  – Simulated  Simulated observation made from real data
  – Synthetic  Data from a theoretical model
Data Models
• Data models used in SSA
  – Spectral data     Spectrum, TimeSeries, SED
  – Dataset           Generic dataset descriptor
  – Target            Astronomical target observed
  – Curation          Origin of data
  – Characterization  Physical characteristics of data
  – Provenance        Instrument which generated the data
• User defined data models
  – Metadata extension mechanisms
    • additional data model attributes (table fields)
    • additional resources in VOTable, linked back to main table
  – Provide a mechanism to "subclass" dataset to tailor it for a given data collection
Spectral Data (SED)
[Figure labels: spectrum segment; photometry point]
Spectral/SED Data Model
Query Interface
• Mandatory query parameters
  – POS     RA, DEC (ICRS)
  – SIZE    diameter (decimal degrees)
  – TIME    date1,date2 (epoch in decimal years UTC)
  – BAND    wave1,wave2 (meters in vacuum; source or observer)
  – FORMAT  VOTable, fits, xml, text, graphics, html, external
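As an illustration of the mandatory parameters above, the following Python sketch assembles a draft-style SSA query URL. It assumes the parameter names and units exactly as listed on the slide (the V0.9 draft was still under revision), uses a hypothetical endpoint, and the ordering check on the ranges is an added assumption.

```python
from urllib.parse import urlencode


def ssa_query_url(base_url, ra, dec, size, time_range, band_range,
                  fmt="VOTable"):
    """Build an SSA query URL from the mandatory parameters.

    POS and SIZE are in decimal degrees, TIME is a (date1, date2) pair in
    decimal years, and BAND is a (wave1, wave2) pair in meters.
    """
    date1, date2 = time_range
    wave1, wave2 = band_range
    # Assumed convention: each range is given as low,high.
    if date1 > date2 or wave1 > wave2:
        raise ValueError("range limits must be ordered low,high")

    params = {
        "POS": f"{ra},{dec}",
        "SIZE": size,
        "TIME": f"{date1},{date2}",
        "BAND": f"{wave1},{wave2}",
        "FORMAT": fmt,
    }
    sep = "&" if "?" in base_url else "?"
    # Keep the comma separators readable in the generated URL.
    return base_url + sep + urlencode(params, safe=",")


# Hypothetical endpoint; optical band from 400 nm to 700 nm, years 2000-2005.5.
url = ssa_query_url("http://example.org/ssa", 180.0, -30.0, 0.1,
                    time_range=(2000.0, 2005.5), band_range=(4e-7, 7e-7))
print(url)
```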
NVO Summer School, Aspen 9-Sep-2005 30
Query Interface
• Recommended query parameters
  – APERTURE    approx spatial resolution (decimal degrees)
  – SPECRES     spectral resolution (meters)
  – TOP         number of top-ranked records to return
  – OBJTYPE     mandatory if service returns multiple object types
  – COLLECTION  data collection identifier
NVO Summer School, Aspen 9-Sep-2005 31
Query Interface
• Optional parameters
  – CREATORID    creator-assigned dataset identifier (at most 1)
  – PUBID        publisher-assigned dataset identifier (at most N)
  – COMPRESS     enable compression (for both data _and_ queries?)
  – SNR          signal-to-noise ratio
  – REDSHIFT     redshift range (dlambda/lambda)
  – TARGETCLASS  star, galaxy, pulsar, PN, QSO, AGN, etc.
NVO Summer School, Aspen 9-Sep-2005 32
Query Response
• Classes of query metadata
  – Query metadata       Describes the query itself
  – Dataset metadata     Describes data object; object-specific
  – Target metadata      Astronomical target
  – Curation metadata    External identification of dataset
  – Characterization     Coverage, Accuracy, Frame, etc.
  – Instrument metadata  Service-defined; hard to standardize
  – Access metadata      Describes how to access the dataset
NVO Summer School, Aspen 9-Sep-2005 33
Query Response
• Query Metadata
  – Query.Score     How well object matches query
  – Query.LName     Logical name (identifier)
  – Query.LNameKey  Logical name key (id-ref)
    • Example: LName="MyObj123" LNameKey="server,format"
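A small Python sketch of how a client might use this query metadata: ranking records by Query.Score and grouping records that share a Query.LName. The records are made-up examples, and the sketch assumes that a higher score means a better match, which the slide implies but does not state.

```python
from collections import defaultdict


def top_ranked(records, top=None):
    """Sort records by Query.Score, best first, and keep at most `top`."""
    # Assumption: a higher score means a better match to the query.
    ranked = sorted(records, key=lambda rec: rec["Query.Score"], reverse=True)
    return ranked if top is None else ranked[:top]


def group_by_lname(records):
    """Collect records that share a Query.LName into one logical group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["Query.LName"]].append(rec)
    return dict(groups)


# Made-up records: the same logical object offered in two formats, plus another.
records = [
    {"Query.Score": 0.4, "Query.LName": "MyObj123", "Access.Format": "application/fits"},
    {"Query.Score": 0.9, "Query.LName": "MyObj123", "Access.Format": "text/xml"},
    {"Query.Score": 0.7, "Query.LName": "MyObj456", "Access.Format": "text/xml"},
]

best = top_ranked(records, top=2)
groups = group_by_lname(records)
print([rec["Query.Score"] for rec in best])
print({name: len(members) for name, members in groups.items()})
```

Grouping by logical name lets a client present one entry per object and then choose among the available servers or formats within the group.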
NVO Summer School, Aspen 9-Sep-2005 34
Query Response
• Dataset Metadata
  – Dataset.Type              Spectrum, TimeSeries, SED, etc.
  – Dataset.DataModel         DM name, e.g., "SSA-V0.90"
  – Dataset.Title             Brief descriptive title of dataset
  – Dataset.SSA.NSamples      Total samples in dataset
  – Dataset.SSA.Aperture      Characteristic aperture diameter
  – Dataset.SSA.TimeAxis      TimeCoord axis (external data)
  – Dataset.SSA.SpectralAxis  SpectralCoord axis (external data)
  – Dataset.SSA.FluxAxis      Flux axis (external data)
  – Dataset.CreationType      atlas, pointed, cutout, resampled
  – Dataset.Derivation        observed, composite, simulated, synthetic
NVO Summer School, Aspen 9-Sep-2005 35
Query Response
• Target Metadata
  – Target.Name           Name of astronomical object
  – Target.Class          Target class (star, galaxy, QSO, etc.)
  – Target.SpectralClass  Spectral class (e.g., 'O', 'B', etc.)
  – Target.Redshift       Nominal redshift for object
  – Derived.VarAmpl       Variability amplitude (fraction 0-1)
  – Derived.SNR           Observed signal to noise ratio
NVO Summer School, Aspen 9-Sep-2005 36
Query Response
• Curation Metadata
  – Curation.Collection   Data collection name (identifier)
  – Curation.Creator      Creator identity (identifier)
  – Curation.CreatorID    Creator-assigned dataset identifier
  – Curation.PublisherID  Publisher-assigned dataset identifier
  – Curation.Date         Dataset creation date (ISO date string)
  – Curation.Version      Dataset version (within same ID)
NVO Summer School, Aspen 9-Sep-2005 37
Query Response
• Characterization 1 - Coverage
  – .Location.Spatial          Position (e.g., RA, DEC)
  – .Location.Time             Observation time characteristic value
  – .Location.Spectral         Spectral bandpass characteristic value
  – .Location.Spectral.BandID  Bandpass ID (band or filter name)
  – .Bounds.Spatial            Aperture footprint (polygon on sky)
  – .Bounds.Time               Low/High time values
  – .Bounds.Spectral           Low/High spectral values
  – .Bounds.Flux               Limiting flux, saturation limit (Jansky)
  – .Fill.Spatial              Spatial sampling filling factor (0-1)
  – .Fill.Time                 Time sampling filling factor (0-1)
  – .Fill.Spectral             Spectral sampling filling factor (0-1)
NVO Summer School, Aspen 9-Sep-2005 38
Query Response
• Characterization 2 - Accuracy
  – Accuracy.*.Calibrated  uncalibrated, relative, absolute
  – Accuracy.*.Resolution  Resolution of measured signal
  – Accuracy.*.StatErr     Statistical error (measured)
  – Accuracy.*.SysErr      Systematic error (estimated)
('*' = Spatial, Time, Spectral, Flux)
NVO Summer School, Aspen 9-Sep-2005 39
Query Response
• Characterization 3 - Reference Frames
  – Frame.Spatial.Type     Coordinate frame (default ICRS)
  – Frame.Spatial.Equinox  Coordinate system equinox (J2000)
  – Frame.Time.System      Timescale (TT)
  – Frame.Time.SIDim       SI factor and dimension
  – Frame.Spectral.SIDim   SI factor and dimension
  – Frame.Flux.SIDim       SI factor and dimension
  – Frame.Flux.UCD         UCD of flux value (flux type)
(These apply only to the query response)
(SIDim metadata still under construction)
NVO Summer School, Aspen 9-Sep-2005 40
Query Response
• Instrument Metadata
  – Instrument.Name      Instrument name (identifier)
  – Instrument.Exposure  Total exposure time (seconds)
  – Instrument.<other>   Service-defined
• Notes
  – Optional; provided for instrumental data collections
  – In general, Collection, Bounds.Time, etc. are preferred
  – In general, Instrument metadata is service-defined
  – Use Observation model as a starting point
NVO Summer School, Aspen 9-Sep-2005 41
Query Response
• Access Metadata
  – Access.Reference  Data access URL
  – Access.Format     MIME type of returned dataset
  – Access.Size       Approximate dataset size (bytes)
  – Access.Server     Server endpoint URL
• Staging support goes here in the future
  – e.g., will dataset access require asynchronous staging?
  – estimated cost to construct dataset
NVO Summer School, Aspen 9-Sep-2005 42
Service Metadata
• Usage
  – Describe service type and capabilities
  – Characterize service (data resources served, coverage, etc.)
  – Describe interface (optional query parameters)
• Interface
  – Requires new service metadata query method
  – Returns resource metadata descriptor (XML)
• Format
  – Registry resource descriptor (XML)
NVO Summer School, Aspen 9-Sep-2005 43
Data Retrieval
• Based on GET as with SIA
  – Variety of formats available
  – Compression supported
• Data representation
  – Data model defines logical content of data
  – The same data object may be represented in various formats
  – Hence we need to specify both the data model, and the file format
NVO Summer School, Aspen 9-Sep-2005 44
Data Retrieval
• Data models
  – SSA data model for fully-compliant data
  – Provider-defined data model for external data
• Data formats
  – VOTable (a container), native XML (direct serialization)
  – FITS binary table (another container; uses FITS spectral WCS)
  – Text, e.g., CSV
  – Graphics (JPEG etc.)
  – text/html (rendered into browser page)
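For the simplest text case (wavelength, flux as CSV), reading a retrieved spectrum can be sketched in Python as below. The column order, the '#' comment convention, and the sample values are assumptions made for illustration; an actual service would document its own text layout.

```python
import csv
import io

# Made-up sample of a spectrum delivered as text (wavelength in meters).
SAMPLE = """# wavelength (m), flux (arbitrary units)
4.0e-7, 1.25
5.0e-7, 1.50

6.0e-7, 1.10
"""


def read_text_spectrum(text):
    """Parse 'wavelength, flux' lines into two parallel lists of floats."""
    wavelengths, fluxes = [], []
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines and '#' comment lines (an assumed convention).
        if not row or row[0].lstrip().startswith("#"):
            continue
        wavelengths.append(float(row[0]))
        fluxes.append(float(row[1]))
    return wavelengths, fluxes


wavelengths, fluxes = read_text_spectrum(SAMPLE)
print(len(wavelengths), "samples")
print("peak flux", max(fluxes), "at", wavelengths[fluxes.index(max(fluxes))], "m")
```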